Opened 3 years ago

Last modified 12 months ago

#26780 new New feature

Prefetch objects don't work with slices

Reported by: Ludwik Trammer Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: prefetch, slicing
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Prefetch() objects does not work with sliced querysets. For example the following code results in AssertionError: Cannot filter a query once a slice has been taken.:

Category.objects.prefetch_related(Prefetch(
    'post_set',
    queryset=Post.objects.all()[:3],
    to_attr='example_posts',
))

This behavior is also mentioned in this StackOverflow answer. On the other hand it does not seem to be documented in Django Docs.

Why is it needed?
My use case seems to be a common one: I want to display a list of categories while displaying couple of example objects from each category next to it. If I'm not mistaken there isn't currently an efficient way of doing this. Prefetching without slicing would prefetch all objects (and there may be thousands of them) instead of the three examples that are needed.

Change History (9)

comment:1 Changed 3 years ago by Ludwik Trammer

Summary: Prefetch objects doesn't work with slicesPrefetch objects don't work with slices

comment:2 Changed 3 years ago by Ludwik Trammer

It seems to me that #26565 ("Allow Prefetch query to use .values()") is related.

comment:3 Changed 3 years ago by Simon Charette

Hi ludwik,

#26565 is about a different issue.

The only way we could support slices would be to extract it from the provided queryset to make it filterable again and apply it when in-memory joining. The prefect queryset would still have to fetch all results so I'm not sure it's worth the trouble as accessing category.example_posts[:3] would have the same effect.

Last edited 3 years ago by Simon Charette (previous) (diff)

comment:4 Changed 3 years ago by Ludwik Trammer

Charettes,

Thank you for the reply.

accessing category.example_posts[:3] would have the same effect

I'm pretty sure that is not correct. The end effect would obviously be the same, but the way it is achieved behind the scenes and the performance characteristics that follows would be widely different. What you propose (prefetching without limit, adding slicing in a loop after the fact) would make Django perform a database query selecting all objects from the database table and loading them into memory. This would happen when the main queryset is evaluated (that's what prefetching is all about). Then the slicing would be performed by Python in memory, on a queryset that was already evaluated.

That's what I understood from the documentation and also how Django actually behaved in an experiment I performed couple of minutes ago. What I want to avoid is exactly this behavior - loading thousands of objects from the database to display first three of them.

I would be happy with a sane workaround, but to my knowledge there isn't any.

Last edited 3 years ago by Ludwik Trammer (previous) (diff)

comment:5 Changed 3 years ago by Simon Charette

Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted
Type: BugNew feature
Version: 1.9master

Tentatively accepting but as you've noticed it would require a large refactor of the way prefetching is actually done.

I could see Category.objects.prefetch_related(Prefetch('articles', Article.object.order_by('-published_data')[0:3], to_attr='latest_articles')) being useful but the underlying prefetch query would need to rely on subqueries, a feature the ORM is not good at for now.

Maybe the addition of the Subquery expression could help here.

comment:6 Changed 3 years ago by Todor Velichkov

The only way to limit this Query (and to not load all the records into memory) would be to do a top-n-per-group query which I don't think the ORM is capable of.

What I can suggest is: Instead of limiting the QuerySet, find a way to filter it.

For example, get only the Posts in the last day, week or a month (based on the post-frequency).

Category.objects.prefetch_related(Prefetch(
    'post_set',
    queryset=Post.objects.filter(
        date_published__gte=datetime.date.today()-timedelta(days=7)),
    to_attr='example_posts',
))

This way you won't load all the Posts into Memory.

Last edited 3 years ago by Todor Velichkov (previous) (diff)

comment:7 Changed 3 years ago by skyjur

If window queries are implemented (https://code.djangoproject.com/ticket/26608) then this use case could be possible in something similar to:

Prefetch(
    'post_set',
    queryset=
        Post.objects
        .annotate(_rank=Window(Rank(), partition_by='cateogry')
        .filter(_rank__lte=3)
)

comment:8 Changed 17 months ago by Jeff Johnson

I tried the filter by rank suggestion using Django 2.0 but I get this error:

django.db.utils.NotSupportedError: Window is disallowed in the filter clause.

comment:9 Changed 12 months ago by Haseeb Ahmad

I think there is a workaround now to in django new version as we have OuterRef and Subquery.

from django.db.models import OuterRef, Subquery

User.objects.all().prefetch_related('comments',queryset=Comment.objects.filter(id__in=Subquery(Comment.objects.filter(user_id=OuterRef('user_id')).values_list('id', flat=True)[:5])))

Last edited 12 months ago by Tim Graham (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top