Opened 7 years ago

Closed 10 months ago

Last modified 9 months ago

#26780 closed New feature (fixed)

Prefetch objects don't work with slices

Reported by: Ludwik Trammer Owned by: Simon Charette
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: prefetch, slicing
Cc: TZanke, Dan LaManna, chex, Pavel Lysak, Arthur, şuayip üzülmez, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Prefetch() objects does not work with sliced querysets. For example the following code results in AssertionError: Cannot filter a query once a slice has been taken.:

Category.objects.prefetch_related(Prefetch(
    'post_set',
    queryset=Post.objects.all()[:3],
    to_attr='example_posts',
))

This behavior is also mentioned in this StackOverflow answer. On the other hand it does not seem to be documented in Django Docs.

Why is it needed?
My use case seems to be a common one: I want to display a list of categories while displaying couple of example objects from each category next to it. If I'm not mistaken there isn't currently an efficient way of doing this. Prefetching without slicing would prefetch all objects (and there may be thousands of them) instead of the three examples that are needed.

Change History (22)

comment:1 Changed 7 years ago by Ludwik Trammer

Summary: Prefetch objects doesn't work with slicesPrefetch objects don't work with slices

comment:2 Changed 7 years ago by Ludwik Trammer

It seems to me that #26565 ("Allow Prefetch query to use .values()") is related.

comment:3 Changed 7 years ago by Simon Charette

Hi ludwik,

#26565 is about a different issue.

The only way we could support slices would be to extract it from the provided queryset to make it filterable again and apply it when in-memory joining. The prefect queryset would still have to fetch all results so I'm not sure it's worth the trouble as accessing category.example_posts[:3] would have the same effect.

Last edited 7 years ago by Simon Charette (previous) (diff)

comment:4 Changed 7 years ago by Ludwik Trammer

Charettes,

Thank you for the reply.

accessing category.example_posts[:3] would have the same effect

I'm pretty sure that is not correct. The end effect would obviously be the same, but the way it is achieved behind the scenes and the performance characteristics that follows would be widely different. What you propose (prefetching without limit, adding slicing in a loop after the fact) would make Django perform a database query selecting all objects from the database table and loading them into memory. This would happen when the main queryset is evaluated (that's what prefetching is all about). Then the slicing would be performed by Python in memory, on a queryset that was already evaluated.

That's what I understood from the documentation and also how Django actually behaved in an experiment I performed couple of minutes ago. What I want to avoid is exactly this behavior - loading thousands of objects from the database to display first three of them.

I would be happy with a sane workaround, but to my knowledge there isn't any.

Last edited 7 years ago by Ludwik Trammer (previous) (diff)

comment:5 Changed 7 years ago by Simon Charette

Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted
Type: BugNew feature
Version: 1.9master

Tentatively accepting but as you've noticed it would require a large refactor of the way prefetching is actually done.

I could see Category.objects.prefetch_related(Prefetch('articles', Article.object.order_by('-published_data')[0:3], to_attr='latest_articles')) being useful but the underlying prefetch query would need to rely on subqueries, a feature the ORM is not good at for now.

Maybe the addition of the Subquery expression could help here.

comment:6 Changed 7 years ago by Todor Velichkov

The only way to limit this Query (and to not load all the records into memory) would be to do a top-n-per-group query which I don't think the ORM is capable of.

What I can suggest is: Instead of limiting the QuerySet, find a way to filter it.

For example, get only the Posts in the last day, week or a month (based on the post-frequency).

Category.objects.prefetch_related(Prefetch(
    'post_set',
    queryset=Post.objects.filter(
        date_published__gte=datetime.date.today()-timedelta(days=7)),
    to_attr='example_posts',
))

This way you won't load all the Posts into Memory.

Last edited 7 years ago by Todor Velichkov (previous) (diff)

comment:7 Changed 7 years ago by skyjur

If window queries are implemented (https://code.djangoproject.com/ticket/26608) then this use case could be possible in something similar to:

Prefetch(
    'post_set',
    queryset=
        Post.objects
        .annotate(_rank=Window(Rank(), partition_by='cateogry')
        .filter(_rank__lte=3)
)

comment:8 Changed 5 years ago by Jeff Johnson

I tried the filter by rank suggestion using Django 2.0 but I get this error:

django.db.utils.NotSupportedError: Window is disallowed in the filter clause.

comment:9 Changed 5 years ago by Haseeb Ahmad

I think there is a workaround now to in django new version as we have OuterRef and Subquery.

from django.db.models import OuterRef, Subquery

User.objects.all().prefetch_related('comments',queryset=Comment.objects.filter(id__in=Subquery(Comment.objects.filter(user_id=OuterRef('user_id')).values_list('id', flat=True)[:5])))

Last edited 5 years ago by Tim Graham (previous) (diff)

comment:10 Changed 4 years ago by TZanke

Cc: TZanke added

comment:11 Changed 4 years ago by Dan LaManna

Cc: Dan LaManna added

comment:12 Changed 4 years ago by chex

Cc: chex added

comment:13 Changed 4 years ago by Pavel Lysak

Cc: Pavel Lysak added

comment:14 Changed 18 months ago by Arthur

Cc: Arthur added

comment:15 Changed 18 months ago by Arthur

Note that the solution above might not be portable to some MySql databases, with error

1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"

comment:16 Changed 11 months ago by şuayip üzülmez

Cc: şuayip üzülmez added

comment:17 Changed 10 months ago by Simon Charette

As pointed out by others support for filter against window functions would allow prefetch_related_objects to use Rank(partition_by) to support this feature.

If someone wants to give a shot at solving this particular issue before #28333 is resolved it should be doable by using a combination of Queryset.raw and Query.compile combinations.

comment:18 Changed 10 months ago by Simon Charette

Cc: Simon Charette added

comment:19 Changed 10 months ago by Simon Charette

Has patch: set
Owner: changed from nobody to Simon Charette
Status: newassigned

Submitted a patch that works on top of the currently under-review patch for #28333 and it was surprisingly very straightforward.

Last edited 10 months ago by Simon Charette (previous) (diff)

comment:20 Changed 10 months ago by Mariusz Felisiak

Triage Stage: AcceptedReady for checkin

comment:21 Changed 10 months ago by Mariusz Felisiak <felisiak.mariusz@…>

Resolution: fixed
Status: assignedclosed

In 242499f2:

Fixed #26780 -- Added prefetch_related() support for sliced queries.

This was made possible by window function filtering support added in
f387d024fc75569d2a4a338bfda76cc2f328f627.

comment:22 Changed 9 months ago by GitHub <noreply@…>

In 69fa2e8e:

Refs #26780 -- Made prefetch_related() don't use window expressions fo sliced queries if not supported.

Note: See TracTickets for help on using tickets.
Back to Top