Code

Opened 3 years ago

Closed 5 months ago

Last modified 7 weeks ago

#17001 closed New feature (fixed)

Allow usage of custom querysets in prefetch_related

Reported by: akaariai Owned by: loic84
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: prefetch_related
Cc: marc.tamlyn@…, AndrewIngram, k.sikora@…, ethan.jucovy@…, as@…, mszamot@…, jonatan.lundin@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

#16973 introduced prefetch_related. I would like to extend it so that one can use custom querysets for the fetching. This also requires the ability to define the attribute name to use for the prefetched objects, as it is not valid to return filtered objects from related_manager.all(). There is also the possibility that one wants to fetch multiple objects of the same type with prefetch_related (young authors to one attribute, old authors to another).

The proposed API is following:

# The normal usage does not change
SomeModel.objects.prefetch_related('rel_model', 'rel_model__another_rel')
# You can pass R objects to prefetch_related
# The syntax for R is
# R(lookup_path, to_attr=None, qs=None)
# Thus:
SomeModel.objects.prefetch_related(R('rel_model'), R('rel_model__another_rel'))
# is equivalent to the first call.
# One can use to_attr alone:
SomeModel.objects.prefetch_related(R('rel_model', to_attr='foo_lst'), ...)
# custom queryset
SomeModel.objects.prefetch_related(
    R('rel_model', to_attr='foo_lst', qs=RelModel.objects.order_by('foo')),
    ...
)
# If qs is defined, to_attr must be defined also:
SomeModel.objects.prefetch_related(R('rel_model', qs=RelModel.objects.order_by('-pk')))
-> Error: Thou shalt not use qs without to_attr
# One can use the to_attr in nested calls:
qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)
# This will result in every object O in foo_lst to have attribute anot_lst containing
# related objects to O, fetched from the above given qs.
# Going still deeper would need to use syntax:
qs = qs.prefetch_related(
   R('foo_lst__anot_lst__third_model', qs=ThirdModel.objects.order_by('-pk'), to_attr='bar_lst'),
)

Attachments (2)

prefetch_extensions.diff (35.3 KB) - added by akaariai 3 years ago.
Now with settings.DEBUG removed
#17001-prefetch_extensions.diff (25.5 KB) - added by Kronuz 11 months ago.
Fixed issue with multiple level prefetch

Download all attachments as: .zip

Change History (49)

comment:1 Changed 3 years ago by akaariai

  • Has patch set
  • Needs documentation set
  • Needs tests unset
  • Patch needs improvement unset

Ok, I think I have a ready patch for this. There are some TODO comments still there (a part of them are personal TODOs, some are real TODOs). And this still needs documentation.

About the documentation, I don't know if we should mention that it is possible to chain prefetches in this manner:

qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)

The problem is that it is really, really, easy to do this instead:

qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('ref_model__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)

Note that the lookup path in the second R-object isn't referring to foo_lst, but instead ref_model, so we will fetch the ref_models another time. I would even go so far as to explicitly prohibit the foo_lst__another_model usage, as it is confusing as hell.

Chaining prefetches can be done by using custom querysets which have prefetches. The only problem is that if you get a queryset which already has prefetches, you can't chain to the existing ones. One idea is to have a 'name' and 'chain_to' kwargs for R objects, and then you could chain by the name.

My suggestion is that lets leave this for 1.5. I would explicitly prohibit the foo_lst__another_model way of chaining. It can be resurrected if need be. Then there is still the usual naming issues.

Changed 3 years ago by akaariai

Now with settings.DEBUG removed

comment:2 Changed 2 years ago by lukeplant

  • Triage Stage changed from Unreviewed to Accepted

@akaariai:

Accepted on the basis that the feature in general would be nice.

However, I haven't looked at the code at all. I doubt that I will have time to look at this before 1.4 is released. Part of me would actually like to leave the current version of prefetch_related() to get some testing in the wild before we build on top of it.

comment:3 Changed 22 months ago by Kronuz

I added an update for django 1.4 (tests still need to be upgraded) ...however, the question I still have is, the added attribute (to_attr) is a list, and I suppose it should be/act as a QuerySet instead, in my most humble opinion.

Version 0, edited 22 months ago by Kronuz (next)

comment:4 Changed 19 months ago by iu_long

What is the reasoning behind this in the R object's init ?

if qs is not None and not to_attr:
      raise ValueError('When custom qs is defined, to_attr '
              'must also be defined')

In my use case, I really need the custom prefetch to go on the same attribute and this prevents me from doing so.
Does not having a 'to_attr' with queryset break something or it is just for safety ( preventing accidents )?

comment:5 Changed 19 months ago by akaariai

If I recall correctly it is there so you can't shadow an existing queryset. This patch will fetch the results as a list, not as a queryset when you use a custom qs. So, if you use custom qs to fetch authors for book, then book.authors.all() wouldn't work, the items would be in book.authors, and book.authors would be a list.

In general, we want this feature in Django, but the current API isn't something wanted in Django...

comment:6 Changed 15 months ago by mjtamlyn

  • Cc marc.tamlyn@… added
  • Needs tests set
  • Patch needs improvement set

I would love this feature!

comment:7 Changed 15 months ago by anonymous

Pretty neat stuff!

comment:8 Changed 14 months ago by ebertti@…

Great feature!!! I will love this!!!

Changed 11 months ago by Kronuz

Fixed issue with multiple level prefetch

comment:9 Changed 11 months ago by AndrewIngram

  • Cc AndrewIngram added

comment:10 Changed 10 months ago by naktinis@…

Would be great to have this! What is the status of this feature request currently?

comment:11 Changed 10 months ago by anonymous

It would be a life saver for me! What is the status of this feature?

comment:12 Changed 10 months ago by aaugustin

The current status of this feature is fully and accurately reflected in the description above.

There's a patch, but it needs improvements, tests, and documentation.

Since it's a fairly large and complex feature, gaining the support of a core developer will be critical to complete the patch and have it committed.

comment:13 Changed 9 months ago by elektrrrus

  • Cc k.sikora@… added

comment:14 Changed 9 months ago by ethan.jucovy@…

  • Cc ethan.jucovy@… added

comment:15 Changed 8 months ago by chokosabe@…

The status of this ticket highlights everything that is wrong with how Django is being developed currently. 22 months! and still no movement on this!!!

comment:16 Changed 8 months ago by aaugustin

The tone of your comment highlight everything that is wrong with the consumer approach to open-source :)

If this issue bothers you so much, why don't you put your energy into resolving it rather than angry comments?

Seriously — your comment makes me *less* likely to contribute to this issue; I don't feel like helping someone who shouts on volunteers.

comment:17 Changed 8 months ago by akaariai

Note that this ticket is totally dependant on finding a good API. The functionality was already implemented using R() objects, the problem is that the API wasn't usable enough for merge. So, find a nice API and it is very likely that this feature will be merged.

comment:19 follow-up: Changed 8 months ago by akaariai

Lets accept the proposed API. There weren't any objections, and at least two core developers (me + mjtamlyn) support this API. The proposed API feels more natural than the R() thing, and it should be powerful enough.

The accepted API is to add an overloaded form of .prefetch_related(). The current way works as always, that is .prefetch_related('somemodel', 'somemodel__someothermodel') will continue to work. The new alternative is .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age')).

Chaining to prefetched custom queryset isn't allowed, that is in the above example you can't do .prefetch_related('authors_by_age__books'). If you need books prefetched to authors_by_age, then doing .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age').prefetch_related('books')) should work.

The prefetch happens directly into a list. That is, obj.authors_by_age.filter(...) doesn't work. The obj.authors_by_age attribute is a list instance, containing the related objects. Not creating a QueyrSet for each obj is a big performance optimization (IIRC around 5x performance difference when fetching small amount of related objects in a large queryset). Fetching directly into a list should also make the implementation cleaner.

For the first implementation making sure fetching basic querysets works well should be enough. For example, if it turns out it will be hard to support prefetching values() querysets or generic relations, then throwing an error and documenting what doesn't work seems good enough.

comment:20 in reply to: ↑ 19 Changed 8 months ago by mjtamlyn

Replying to akaariai:

For the first implementation making sure fetching basic querysets works well should be enough. For example, if it turns out it will be hard to support prefetching values() querysets or generic relations, then throwing an error and documenting what doesn't work seems good enough.

Agreed. If it's trickier than just getting this patch in needs to be we can always delay it for a future patch.

comment:21 Changed 8 months ago by alexpirine

I am interested in this feature, so please don't take the following as a criticism, I am just trying to add a feedback from a Django user perspective.

I am trying to understand, what is wrong with the initial implementation using the R objects?

We already have “F” and “Q” objects for “advanced” queries, so why not “R” objects for some other “advanced” stuff?

The code looks less clear when you begin to mix object names and various parameters in prefetch_related(), and .prefetch_related('authors', to_attr='authors_by_age', queryset=Author.objects.order_by('age').prefetch_related('books')) seems really weird and inconsistent with the current possibility of prefetch_related('authors_by_age__books').

Just to take the original example for comparison:

qs = SomeModel.objects.prefetch_related(
   R('rel_model', qs=RelModel.objects.order_by('-pk'), to_attr='foo_lst'),
   R('foo_lst__another_model', qs=AnotModel.objects.order_by('-pk'), to_attr='anot_lst')
)

How does it translates in the new API since chaining is not allowed?

comment:22 follow-up: Changed 7 months ago by FunkyBob

This follows some discussion on IRC with loic and akaariai...

Perhaps a more keywordy syntax?

To express that in the comment above:

qs = SomeModel.objects.prefetch_related(
    foo_list=Prefetch('rel_model', RelModel.objects.order_by('-pk')),
    anot_lst=Prefetch('foo_lst__another_model', AnotModel.objects.order_by('-pk')),
)

This also means you can avoid the repeated QS-copy cost for adding multiple prefetches.

And, as loic pointed out, you can get the "terse" syntax simply using:

from .... import Prefetch as R

comment:23 in reply to: ↑ 22 Changed 7 months ago by alexpirine

  • Cc as@… added

Replying to FunkyBob:

To express that in the comment above:

qs = SomeModel.objects.prefetch_related(
    foo_list=Prefetch('rel_model', RelModel.objects.order_by('-pk')),
    anot_lst=Prefetch('foo_lst__another_model', AnotModel.objects.order_by('-pk')),
)

This syntax seems very nice, I like how it is even more concise without to_attr.

comment:24 Changed 7 months ago by akaariai

I also like the kwargs based syntax. Maybe we should change the API to that?

One thing to consider is that kwargs aren't ordered. So, the API must not assume ordering there.

comment:25 Changed 7 months ago by chokosabe

I am more than happy to help with testing on this (or anything else that might need doing). There is no shortage of ideas on this; some(one/people) that can make a decision on this just need to do so....

Hate to point this out; can't imagine this (ongoing) discussion happening with a rails feature.... something would have been added already (it may well not be perfect or even right)...

comment:26 Changed 7 months ago by akaariai

I have discussed this issue with multiple people on #django-dev. It seems kwargs based API is favoured. Lets do that.

The syntax is:

qs = SomeModel.objects.prefetch_related(
    rel_pk_ordered=Prefetch('rel_model', RelModel.objects.order_by('-pk')),
)

This means that the objects fetched will have rel_pk_ordered each, containing objects through relation 'rel_model' using given queryset. It is not possible to chain prefetches to custom prefetches, instead one should do:

qs = SomeModel.objects.prefetch_related(
    rel_pk_ordered=Prefetch('rel_model', RelModel.objects.order_by('-pk').prefetch_related(...)),
)

that is, issue the prefetch in the given custom queryset.

comment:27 Changed 7 months ago by mjtamlyn

+1 to that syntax, seems a good middle ground.

comment:28 Changed 7 months ago by ryanhiebert

  • Cc ryanhiebert added

comment:29 Changed 7 months ago by ryanhiebert

  • Cc ryanhiebert removed

comment:30 Changed 7 months ago by mszamot@…

  • Cc mszamot@… added

comment:31 Changed 6 months ago by jonatan.lundin@…

  • Cc jonatan.lundin@… added

comment:32 Changed 6 months ago by loic84

  • Owner changed from nobody to loic84
  • Status changed from new to assigned
  • Version changed from 1.3 to master

comment:33 Changed 6 months ago by loic84

WIP: https://github.com/loic/django/compare/prefetch_related.

It's rather experimental and still needs tons of work with tests and docs, but if anyone want to give it a try, that would help.

Prefetch takes two parameters: lookup and an optional queryset.

  1. Manager backed lookups with *args:
    # Traditional string lookup:
    Restaurant.objects.prefetch_related('pizzas__toppings')

    # Equivalent with Prefetch:
    Restaurant.objects.prefetch_related(Prefetch('pizzas__toppings'))

    # Lookup with custom queryset:
    Restaurant.objects.prefetch_related(
        Prefetch('pizzas__toppings', queryset=Toppings.object.filter(veggie=True)))
  1. list backed lookups with **kwargs:
    # Traditional lookup stored in a list (significantly faster, see #20577 and link below)
    Restaurant.objects.prefetch_related(pizza_list=Prefetch('pizzas'))

    # With custom QuerySet.
    Pizza.objects.prefetch_related(
        toppings_list=Prefetch('toppings', queryset=Toppings.object.filter(veggie=True))

    # Multiple level using nested prefetch_related:
    pizza_queryset = Pizza.objects.prefetch_related(
        Prefetch('toppings', queryset=Toppings.object.filter(veggie=True)))
    Restaurant.objects.prefetch_related(
        pizza_list=Prefetch('pizzas', queryset=pizza_queryset))

    # Alternatively, multiple level using kwarg lookups (nested prefetch_related actually rewrite to that):
    Restaurant.objects.prefetch_related(
        pizza_list=Prefetch('pizzas'),
        pizza_list__toppings_list=Prefetch('pizza_list__toppings', queryset=Toppings.object.filter(veggie=True))
  1. Mixing both approaches:
    Restaurant.objects.prefetch_related(
        'pizza_list__toppings_list',
         pizza_list=Prefetch('pizza'), 
    )

  1. Flattening is not supported, it doesn't seem trivial to implement, maybe for another ticket?
Restaurant.objects.prefetch_related(
    toppings_list=Prefetch('pizzas__toppings')
)

Additionally, here is a quick and dirty benchmark that compares lookups backed by list vs Manager: https://gist.github.com/loic/7198193.

Last edited 6 months ago by loic84 (previous) (diff)

comment:34 Changed 6 months ago by loic84

As @akaariai pointed out in #comment:24, kwargs are not ordered; yet ordering matters a fair bit.

For example my example from the previous comment:

    Restaurant.objects.prefetch_related(
        'pizza_list__toppings_list',
         pizza_list=Prefetch('pizza'), 
    )

This wouldn't work without some sorting in the implementation.

I then found out that *args ordering is something we actually test for, even though it doesn't seem documented. This implicit ordering allows for @properties to masquerade as managers.

Sorting by lookup like in my patch, would obviously mess that up.

We could:

  1. Do the sorting based on the lookup path and add an after argument to the Prefetch object, so you could do prefetch_related(Prefetch('primary_house__occupants', after='houses__rooms'), 'houses__rooms'). This would actually be backward incompatible for whomever already rely on the implicit ordering of *args, since you need to opt-in for a custom after.
  1. Require multiple calls to prefetch_related() when ordering matters: prefetch_related(pizza_list=Prefetch('pizza')).prefetch_related('pizza_list__toppings_list'). IMO this adds yet another construct to an already complicated API.
  1. Go back to having to_attr as a Prefetch argument rather than a kwarg, and finally document that ordering of *args actually matters. This is backward compatible, and it also makes for a cleaner implementation. I was actually on the verge of adding it (undocumented), only for the purpose of cleaning up the current implementation.

For the record, I favor 3) by a big margin.

Please note that while prefetch_related('foo__bar__baz', 'foo__bar') and prefetch_related('foo__bar', 'foo__bar__baz') were equivalent, it's not necessarily true anymore now that we can do prefetch_related(Prefetch('foo__bar', queryset=...), 'foo__bar__baz'), so the actual ordering becomes a more prominent concern than before.

comment:35 Changed 6 months ago by charettes

In the light of @akaariai's comment concerning kwargs the only other alternative I could think of was to rely on a Prefetch instance creation counter to sort kwargs but as @loic84 pointed out in IRC it would be quite fragile.

I'd say I also favor the to_attr approach since it looks like the most sane way to avoid chaining prefetch_related to declare an ordering.

comment:36 Changed 6 months ago by mjtamlyn

I'm happy to have to_attr back as well, especially if it makes the code cleaner. This is a good smell.

comment:37 Changed 6 months ago by loic84

  • Needs tests unset

to_attr is back, cleaned up implementation and added a fair amount of tests. I'll tackle the docs ASAP.

Made a PR https://github.com/django/django/pull/1826 to ease with review.

comment:38 Changed 6 months ago by loic84

  • Needs documentation unset
  • Patch needs improvement unset

Updated the PR with docs, and even more tests.

This should be ready for deep review.

comment:39 Changed 5 months ago by loic84

@akaariai wrote a great summary of the status of this patch on the django-developers mailing list: https://groups.google.com/d/msg/django-developers/LhXpbpSOHlg/n8xKrpFcvyMJ.

Feedback would definitely help to move this ticket forward.

comment:40 Changed 5 months ago by loic84

Tackled the multidb issue described on the ML post and rebased to the latest master.

comment:41 Changed 5 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In f51c1f590085556abca44fd2a49618162203b2ec:

Fixed #17001 -- Custom querysets for prefetch_related.

This patch introduces the Prefetch object which allows customizing prefetch
operations.

This enables things like filtering prefetched relations, calling select_related
from a prefetched relation, or prefetching the same relation multiple times
with different querysets.

When a Prefetch instance specifies a to_attr argument, the result is stored
in a list rather than a QuerySet. This has the fortunate consequence of being
significantly faster. The preformance improvement is due to the fact that we
save the costly creation of a QuerySet instance.

Thanks @akaariai for the original patch and @bmispelon and @timgraham
for the reviews.

comment:42 Changed 5 months ago by abdulwahid24@…

Hi guys,

How would i integrate this patch in Django 1.4.5 as i need this in my existing project running on production.

Thanks,

comment:43 Changed 5 months ago by loic84

@abdulwahid24 it shouldn't be too hard to backport this patch manually, just note that the get_prefetch_queryset methods used to be called get_prefetch_query_set, other than that I don't think the code for prefetching has significantly changed since 1.4.

comment:44 Changed 5 months ago by abdulwahid

Hi loic84,

I was trying it since morning but i found it very hard to build backport manually, i tried to integrated all changes related to "get_prefetch_queryset" and i did changes as per your changes https://code.djangoproject.com/changeset/f51c1f590085556abca44fd2a49618162203b2ec in following files.
1.django/contrib/contenttypes/generic.py
2.django/db/models/init.py
3.django/db/models/query.py
4.django/db/models/fields/related.py

also add one more file as it require for LOOKUP_SEP
5.django/db/models/constants.py

It works fine with following query

Person.objects.filter(profileperson_typename='Speaker').prefetch_related(Prefetch('event_speaker', queryset=Event.objects.all())).

But when i run

Person.objects.filter(profileperson_typename='Speaker').prefetch_related(Prefetch('event_speaker',queryset=Event.objects.latest('start_time')))

gives error

AttributeError: 'Event' object has no attribute '_add_hints here 'Event' is my table name.

I stuck here and failed to resolve this.

Could you help me out to overcome this issue.

Note:

I just want to fire a query which return its related data in queryset only so that i can avoid for loop to access the related data as "[obj.relate_data for obj in objects]" to achieve optimization.

comment:45 Changed 5 months ago by loic84

@abdulwahid QuerySet.latest() doesn't return a QuerySet instance, so you can't pass it as the queryset kwarg of prefetch_related, maybe you meant order_by('-start_time') instead?

I'm afraid I can't help you much more than that here, trac tickets aren't meant as a support channel, you can find me on #django on IRC if you want.

comment:46 Changed 5 months ago by abdulwahid

Thanks for you help.

comment:47 Changed 7 weeks ago by Anssi Kääriäinen <akaariai@…>

In 7bbb6958dcb6a450f926e5bb49b07391f801aef1:

Allowed custom querysets when prefetching single valued relations

The original patch for custom prefetches didn't allow usage of custom
queryset for single valued relations (along ForeignKey or OneToOneKey).
Allowing these enables calling performance oriented queryset methods like
select_related or defer/only.

Thanks @akaariai and @timgraham for the reviews. Refs #17001.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.