#26211 closed New feature (wontfix)
prefetch_related with Prefetch with queryset with explicit Ordering Ignored
Reported by: | Alex Rothberg | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Given these models:
class Parent(models.Model): pass class Child(models.Model): saved_dt = models.DateTimeField(auto_now_add=True) parent = models.ForeignKey(Parent)
I am executing this query (for each Parent
I want to get the child_set
in sorted order):
[m.child_set.order_by('saved_dt') for m in Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).all()]
I would expect that to be two SQL queries, but in reality I see N+2
where N
is the number of Child
s:
SELECT "prefetch_parent"."id" FROM "prefetch_parent"; args=() SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" IN (1, 2) ORDER BY "prefetch_child"."saved_dt" ASC; args=(1, 2) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 1 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(1,) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 2 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(2,)
Change History (8)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Good, that's what I was going suggest. I think we can close this as expected behavior.
comment:3 by , 9 years ago
Woah! I was not suggesting closing this! I don't think behavior is as expected. The ORM should realize the prefetch has already been done.
comment:4 by , 9 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Type: | Uncategorized → New feature |
I disagree -- enabling QuerySet.order_by()
to sometimes do operations in memory could be quite confusing. Also, it's impossible to replicate the databases implementation of ordering in Python. As the documentation notes, "Each order_by()
call will clear any previous ordering" (it returns a new QuerySet
). Anyway, this isn't my area of expertise so I'll leave it open for a second opinion.
comment:5 by , 9 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I'm also inclined to close this ticket as expected behavior.
The prefetch_related
method stores the prefetched objects on the instance it belongs to. As soon as you call a queryset method on the related manager holding the cached instances (child_set.order_by()
) a new queryset instance is returned which doesn't hold any reference to the previously cached results.
This is same as expecting a call to a queryset method with cached results to perform its operation in memory instead of going back to the db.
e.g.
children = Child.objects.all() list(children) # Will issue a query and cache the results on the children objects. list(children) # Will reuse the cached results. children.order_by('saved_dt') # Will perform a new database query even if children has cached results.
If we performed the operation in memory this could lead to catastrophic performance given a large enough result set. This would be backward incompatible and kind of a footgun API.
I think the Queryset
documentation makes it clear that calling a method will create and return a new instance with no cached results.
comment:6 by , 9 years ago
Description: | modified (diff) |
---|
@charettes I actually think that the example you provided indicates what I would consider a deficiency of the ORM: coalescing multiple calls to order_by
with the same argument (I have modified the original queryset in your example):
>>> children = Child.objects.order_by('saved_dt').all() >>> list(children) [<Child: Child object>, <Child: Child object>] (0.001) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC; args=() >>> list(children) [<Child: Child object>, <Child: Child object>] >>> children.order_by('saved_dt') [<Child: Child object>, <Child: Child object>] (0.000) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=()
I would hope that the second call to children.order_by('saved_dt')
can return self since the queryset is already sorted by the desired key.
comment:7 by , 9 years ago
Description: | modified (diff) |
---|
comment:8 by , 9 years ago
Here is one more example of what I would consider a bug and it does not involve multiple calls to order_by
:
>>> Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).first().child_set.all().ordered False
If I remove the
order_by
on them.child_set
, then I get the expected two queries: