Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#26211 closed New feature (wontfix)

prefetch_related with Prefetch with queryset with explicit Ordering Ignored

Reported by: Alex Rothberg Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Alex Rothberg)

Given these models:

class Parent(models.Model):
    pass

class Child(models.Model):
    saved_dt = models.DateTimeField(auto_now_add=True)
    parent = models.ForeignKey(Parent)

I am executing this query (for each Parent I want to get the child_set in sorted order):

[m.child_set.order_by('saved_dt') for m in Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).all()]

I would expect that to be two SQL queries, but in reality I see N+2 where N is the number of Childs:

SELECT "prefetch_parent"."id" FROM "prefetch_parent"; args=()
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" IN (1, 2) ORDER BY "prefetch_child"."saved_dt" ASC; args=(1, 2)
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 1 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(1,)
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 2 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(2,)

Change History (8)

comment:1 by Alex Rothberg, 8 years ago

If I remove the order_by on the m.child_set, then I get the expected two queries:

[m.child_set.all()  for m in Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).all()]

SELECT "prefetch_parent"."id" FROM "prefetch_parent"; args=()
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" IN (1, 2) ORDER BY "prefetch_child"."saved_dt" ASC; args=(1, 2)

comment:2 by Tim Graham, 8 years ago

Good, that's what I was going suggest. I think we can close this as expected behavior.

comment:3 by Alex Rothberg, 8 years ago

Woah! I was not suggesting closing this! I don't think behavior is as expected. The ORM should realize the prefetch has already been done.

Version 0, edited 8 years ago by Alex Rothberg (next)

comment:4 by Tim Graham, 8 years ago

Component: UncategorizedDatabase layer (models, ORM)
Type: UncategorizedNew feature

I disagree -- enabling QuerySet.order_by() to sometimes do operations in memory could be quite confusing. Also, it's impossible to replicate the databases implementation of ordering in Python. As the documentation notes, "Each order_by() call will clear any previous ordering" (it returns a new QuerySet). Anyway, this isn't my area of expertise so I'll leave it open for a second opinion.

Last edited 8 years ago by Tim Graham (previous) (diff)

comment:5 by Simon Charette, 8 years ago

Resolution: wontfix
Status: newclosed

I'm also inclined to close this ticket as expected behavior.

The prefetch_related method stores the prefetched objects on the instance it belongs to. As soon as you call a queryset method on the related manager holding the cached instances (child_set.order_by()) a new queryset instance is returned which doesn't hold any reference to the previously cached results.

This is same as expecting a call to a queryset method with cached results to perform its operation in memory instead of going back to the db.

e.g.

children = Child.objects.all()
list(children)  # Will issue a query and cache the results on the children objects.
list(children)  # Will reuse the cached results.
children.order_by('saved_dt')  # Will perform a new database query even if children has cached results.

If we performed the operation in memory this could lead to catastrophic performance given a large enough result set. This would be backward incompatible and kind of a footgun API.

I think the Queryset documentation makes it clear that calling a method will create and return a new instance with no cached results.

comment:6 by Alex Rothberg, 8 years ago

Description: modified (diff)

@charettes I actually think that the example you provided indicates what I would consider a deficiency of the ORM: coalescing multiple calls to order_by with the same argument (I have modified the original queryset in your example):

>>> children = Child.objects.order_by('saved_dt').all()
>>> list(children)
[<Child: Child object>, <Child: Child object>]
(0.001) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC; args=()
>>> list(children)
[<Child: Child object>, <Child: Child object>]
>>> children.order_by('saved_dt') 
[<Child: Child object>, <Child: Child object>]
(0.000) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=()

I would hope that the second call to children.order_by('saved_dt') can return self since the queryset is already sorted by the desired key.

Last edited 8 years ago by Alex Rothberg (previous) (diff)

comment:7 by Alex Rothberg, 8 years ago

Description: modified (diff)

comment:8 by Alex Rothberg, 8 years ago

Here is one more example of what I would consider a bug and it does not involve multiple calls to order_by:

>>> Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).first().child_set.all().ordered
False
Note: See TracTickets for help on using tickets.
Back to Top