#26211 closed New feature (wontfix)
prefetch_related with Prefetch with queryset with explicit Ordering Ignored
Reported by: | Alex Rothberg | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Given these models:
class Parent(models.Model): pass class Child(models.Model): saved_dt = models.DateTimeField(auto_now_add=True) parent = models.ForeignKey(Parent)
I am executing this query (for each Parent
I want to get the child_set
in sorted order):
[m.child_set.order_by('saved_dt') for m in Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).all()]
I would expect that to be two SQL queries, but in reality I see N+2
where N
is the number of Child
s:
SELECT "prefetch_parent"."id" FROM "prefetch_parent"; args=() SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" IN (1, 2) ORDER BY "prefetch_child"."saved_dt" ASC; args=(1, 2) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 1 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(1,) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 2 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(2,)
Change History (8)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Good, that's what I was going suggest. I think we can close this as expected behavior.
comment:3 by , 9 years ago
Woah! I was not suggesting closing this! I don't think behavior is as expected. The ORM should realize the prefetch has already been done.
The ORM should perform the partition operation in memory rather than going back to the DB. In fact I assume it already does that. For some reason the request for sorting seems to break the use of the cached query results.
comment:4 by , 9 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Type: | Uncategorized → New feature |
I disagree -- enabling QuerySet.order_by()
to sometimes do operations in memory could be quite confusing. Also, it's impossible to replicate the databases implementation of ordering in Python. As the documentation notes, "Each order_by()
call will clear any previous ordering" (it returns a new QuerySet
). Anyway, this isn't my area of expertise so I'll leave it open for a second opinion.
comment:5 by , 9 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I'm also inclined to close this ticket as expected behavior.
The prefetch_related
method stores the prefetched objects on the instance it belongs to. As soon as you call a queryset method on the related manager holding the cached instances (child_set.order_by()
) a new queryset instance is returned which doesn't hold any reference to the previously cached results.
This is same as expecting a call to a queryset method with cached results to perform its operation in memory instead of going back to the db.
e.g.
children = Child.objects.all() list(children) # Will issue a query and cache the results on the children objects. list(children) # Will reuse the cached results. children.order_by('saved_dt') # Will perform a new database query even if children has cached results.
If we performed the operation in memory this could lead to catastrophic performance given a large enough result set. This would be backward incompatible and kind of a footgun API.
I think the Queryset
documentation makes it clear that calling a method will create and return a new instance with no cached results.
comment:6 by , 9 years ago
Description: | modified (diff) |
---|
@charettes I actually think that the example you provided indicates what I would consider a deficiency of the ORM: coalescing multiple calls to order_by
with the same argument (I have modified the original queryset in your example):
>>> children = Child.objects.order_by('saved_dt').all() >>> list(children) [<Child: Child object>, <Child: Child object>] (0.001) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC; args=() >>> list(children) [<Child: Child object>, <Child: Child object>] >>> children.order_by('saved_dt') [<Child: Child object>, <Child: Child object>] (0.000) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=()
I would hope that the second call to children.order_by('saved_dt')
can return self since the queryset is already sorted by the desired key.
comment:7 by , 9 years ago
Description: | modified (diff) |
---|
comment:8 by , 9 years ago
Here is one more example of what I would consider a bug and it does not involve multiple calls to order_by
:
>>> Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).first().child_set.all().ordered False
If I remove the
order_by
on them.child_set
, then I get the expected two queries: