Opened 13 years ago
Closed 13 years ago
#17668 closed Bug (fixed)
prefetch_related does not work in in_bulk
Reported by: | Owned by: | Luke Plant | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4-alpha-1 |
Severity: | Release blocker | Keywords: | prefetch_related, in_bulk |
Cc: | anssi.kaariainen@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
When using the in_bulk - prefetch_related does not work.
For example: in_bulk uses django-haystack. load data from its storage (solr, xapian..) and search objects in django db backend.
qs = Book.objects.all().prefetch_related('authors')
books = qs.in_bulk([1,2,3])
books at this time without prefetched authors.
Attachments (1)
Change History (11)
comment:1 by , 13 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 13 years ago
Ok, I have a patch for this. It is now an error to use .iterator() together with .prefetch_related(). in_bulk does not use the iterator any more. It first forces the query into a list.
by , 13 years ago
Attachment: | 17668.diff added |
---|
comment:3 by , 13 years ago
Severity: | Normal → Release blocker |
---|
Thank you both for the report and patch. Marking this ticket as release blocker since this is a limitation/bug in a new feature.
comment:5 by , 13 years ago
I'm not sure about this solution.
I agree that in_bulk does not need to use iterator() - it doesn't gain much by using it, since it immediately consumes the whole generator, it only avoids the creation of a lists. This part of the patch has a serious bug though - it should be list(qs)
not list(self)
. In fact it can be simplified to just: return dict([(obj._get_pk_val(), obj) for obj in qs])
- no need to create a list we will throw away.
However, making iterator() blow up if prefetch_related() is used is more dubious. If you have a 3rd party library, or just some generic code, that uses your model (but is agnostic to your model), and uses iterator()
, this patch will cause it to blow up if you used prefetch_related() on a default manager, for example. This patch effectively means that many instances of iterator() would need patching to be safe in the general case. I don't think people who use iterator() shouldn't have to worry about clearing prefetch_related() first - that's a nasty little gotcha that will clutter people's code.
So at the moment I think this should be a documentation fix for iterator(). The note should probably go on both iterator() and prefetch_related() docs.
comment:6 by , 13 years ago
There are three possible choices for the iterator() + prefetch_related():
- Raise error.
- Favor prefetch: when iterator is used, fetch all the objects and related objects in one go, then yield them one at a time.
- Favor iterator: throw out the defined prefetch, and do the iterator properly. This is what is currently done.
I suggested option 1 already, but that wasn't favored. From the rest option 2 is in my opinion better than the currently used option 3. Just because the potential performance loss is less in that case. You would get the reduced memory usage given by .iterator() when it is possible to do that. If there is prefetches, they are done correctly, but you lose the memory efficiency of iterator(). For example, in_bulk would have worked perfectly without any changes to it with option 2.
Option 2 should be easy to implement, where the error is raised currently, do:
for obj in list(self): yield obj
I haven't actually tested this though...
Anyways, which option is chosen isn't that important. I could live with any of them.
comment:7 by , 13 years ago
Owner: | changed from | to
---|
I thought I had replied here, but obviously not. Thanks for setting this down clearly.
I favour option 3 for the following reasons:
iterator()
by definition always comes last, and therefore is always in the code that actually generates the query. It is confusing that your iterator() call gets effectively ignored because of something that got passed in to the code in front of you, over which you might have little control, and I think more confusing than the your prefetch_related()
getting ignored because of something that comes later. There are many ways in which QuerySet
calls can get ignored by something that comes later e.g. order_by()
always works this way, prefetch_related()
itself works this way if you do prefetch_related(None)
.
I also don't think that we can correctly judge the potential performance loss - there will almost certainly be cases which contradict our intuition. We should therefore just go on what is the most obvious API, rather than trying to second-guess the developer or the situation.
comment:8 by , 13 years ago
I am OK with the above.
Should there be a way to see if there are prefetches? It would be cheap to add a way to see the current prefetches, this would give a chance to skip the .iterator() call if there are prefetches defined.
Or maybe just document the current behavior, fix in_bulk and be done with it. If somebody complains, add a way to check the prefetches then. I have a feeling I have wasted enough everybody's time on this issue already...
comment:9 by , 13 years ago
Type: | Uncategorized → Bug |
---|
@akaariai: You haven't wasted anyone's time - your input is really appreciated!
I think we should probably wait until someone asks for a way to check for prefetches. I guess the whole QuerySet
API is a bit lacking in this regard - I don't think we have a public API for checking other things like select_related and order_by etc. Perhaps there would be an API that covers all of those e.g. get_query_options()
that returns a dictionary of things that have been set.
The problem is anything using the .iterator() of qs will not do the prefetching. This should be documented, as there is no possibility to both yield object at a time, and do prefetching. Maybe also an assert in the iterator about this would be warranted. Another option is to force the whole query to be iterated over and then doing the prefetch when calling .iterator() and using .prefetch_related() in the same query.
Marking as accepted. Maybe this should be a release blocker?