Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#26211 closed New feature (wontfix)

prefetch_related with Prefetch with queryset with explicit Ordering Ignored — at Version 7

Reported by: Alex Rothberg Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Alex Rothberg)

Given these models:

class Parent(models.Model):
    pass

class Child(models.Model):
    saved_dt = models.DateTimeField(auto_now_add=True)
    parent = models.ForeignKey(Parent)

I am executing this query (for each Parent I want to get the child_set in sorted order):

[m.child_set.order_by('saved_dt') for m in Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).all()]

I would expect that to be two SQL queries, but in reality I see N+2 where N is the number of Childs:

SELECT "prefetch_parent"."id" FROM "prefetch_parent"; args=()
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" IN (1, 2) ORDER BY "prefetch_child"."saved_dt" ASC; args=(1, 2)
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 1 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(1,)
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" = 2 ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=(2,)

Change History (7)

comment:1 by Alex Rothberg, 9 years ago

If I remove the order_by on the m.child_set, then I get the expected two queries:

[m.child_set.all()  for m in Parent.objects.prefetch_related(Prefetch('child_set', Child.objects.order_by('saved_dt'))).all()]

SELECT "prefetch_parent"."id" FROM "prefetch_parent"; args=()
SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" WHERE "prefetch_child"."parent_id" IN (1, 2) ORDER BY "prefetch_child"."saved_dt" ASC; args=(1, 2)

comment:2 by Tim Graham, 9 years ago

Good, that's what I was going suggest. I think we can close this as expected behavior.

comment:3 by Alex Rothberg, 9 years ago

Woah! I was not suggesting closing this! I don't think behavior is as expected. The ORM should realize the prefetch has already been done.

The ORM should perform the partition operation in memory rather than going back to the DB. In fact I assume it already does that. For some reason the request for sorting seems to break the use of the cached query results.

Last edited 9 years ago by Alex Rothberg (previous) (diff)

comment:4 by Tim Graham, 9 years ago

Component: UncategorizedDatabase layer (models, ORM)
Type: UncategorizedNew feature

I disagree -- enabling QuerySet.order_by() to sometimes do operations in memory could be quite confusing. Also, it's impossible to replicate the databases implementation of ordering in Python. As the documentation notes, "Each order_by() call will clear any previous ordering" (it returns a new QuerySet). Anyway, this isn't my area of expertise so I'll leave it open for a second opinion.

Last edited 9 years ago by Tim Graham (previous) (diff)

comment:5 by Simon Charette, 9 years ago

Resolution: wontfix
Status: newclosed

I'm also inclined to close this ticket as expected behavior.

The prefetch_related method stores the prefetched objects on the instance it belongs to. As soon as you call a queryset method on the related manager holding the cached instances (child_set.order_by()) a new queryset instance is returned which doesn't hold any reference to the previously cached results.

This is same as expecting a call to a queryset method with cached results to perform its operation in memory instead of going back to the db.

e.g.

children = Child.objects.all()
list(children)  # Will issue a query and cache the results on the children objects.
list(children)  # Will reuse the cached results.
children.order_by('saved_dt')  # Will perform a new database query even if children has cached results.

If we performed the operation in memory this could lead to catastrophic performance given a large enough result set. This would be backward incompatible and kind of a footgun API.

I think the Queryset documentation makes it clear that calling a method will create and return a new instance with no cached results.

comment:6 by Alex Rothberg, 9 years ago

Description: modified (diff)

@charettes I actually think that the example you provided indicates what I would consider a deficiency of the ORM: coalescing multiple calls to order_by with the same argument (I have modified the original queryset in your example):

>>> children = Child.objects.order_by('saved_dt').all()
>>> list(children)
[<Child: Child object>, <Child: Child object>]
(0.001) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC; args=()
>>> list(children)
[<Child: Child object>, <Child: Child object>]
>>> children.order_by('saved_dt') 
[<Child: Child object>, <Child: Child object>]
(0.000) SELECT "prefetch_child"."id", "prefetch_child"."saved_dt", "prefetch_child"."parent_id" FROM "prefetch_child" ORDER BY "prefetch_child"."saved_dt" ASC LIMIT 21; args=()

I would hope that the second call to children.order_by('saved_dt') can return self since the queryset is already sorted by the desired key.

Last edited 9 years ago by Alex Rothberg (previous) (diff)

comment:7 by Alex Rothberg, 9 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top