Opened 8 years ago

Closed 19 months ago

Last modified 19 months ago

#26780 closed New feature (fixed)

Prefetch objects don't work with slices

Reported by: Ludwik Trammer Owned by: Simon Charette
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: prefetch, slicing
Cc: TZanke, Dan LaManna, chex, Pavel Lysak, Arthur, şuayip üzülmez, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Prefetch() objects does not work with sliced querysets. For example the following code results in AssertionError: Cannot filter a query once a slice has been taken.:

Category.objects.prefetch_related(Prefetch(
    'post_set',
    queryset=Post.objects.all()[:3],
    to_attr='example_posts',
))

This behavior is also mentioned in this StackOverflow answer. On the other hand it does not seem to be documented in Django Docs.

Why is it needed?
My use case seems to be a common one: I want to display a list of categories while displaying couple of example objects from each category next to it. If I'm not mistaken there isn't currently an efficient way of doing this. Prefetching without slicing would prefetch all objects (and there may be thousands of them) instead of the three examples that are needed.

Change History (22)

comment:1 by Ludwik Trammer, 8 years ago

Summary: Prefetch objects doesn't work with slicesPrefetch objects don't work with slices

comment:2 by Ludwik Trammer, 8 years ago

It seems to me that #26565 ("Allow Prefetch query to use .values()") is related.

comment:3 by Simon Charette, 8 years ago

Hi ludwik,

#26565 is about a different issue.

The only way we could support slices would be to extract it from the provided queryset to make it filterable again and apply it when in-memory joining. The prefect queryset would still have to fetch all results so I'm not sure it's worth the trouble as accessing category.example_posts[:3] would have the same effect.

Last edited 8 years ago by Simon Charette (previous) (diff)

comment:4 by Ludwik Trammer, 8 years ago

Charettes,

Thank you for the reply.

accessing category.example_posts[:3] would have the same effect

I'm pretty sure that is not correct. The end effect would obviously be the same, but the way it is achieved behind the scenes and the performance characteristics that follows would be widely different. What you propose (prefetching without limit, adding slicing in a loop after the fact) would make Django perform a database query selecting all objects from the database table and loading them into memory. This would happen when the main queryset is evaluated (that's what prefetching is all about). Then the slicing would be performed by Python in memory, on a queryset that was already evaluated.

That's what I understood from the documentation and also how Django actually behaved in an experiment I performed couple of minutes ago. What I want to avoid is exactly this behavior - loading thousands of objects from the database to display first three of them.

I would be happy with a sane workaround, but to my knowledge there isn't any.

Last edited 8 years ago by Ludwik Trammer (previous) (diff)

comment:5 by Simon Charette, 8 years ago

Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted
Type: BugNew feature
Version: 1.9master

Tentatively accepting but as you've noticed it would require a large refactor of the way prefetching is actually done.

I could see Category.objects.prefetch_related(Prefetch('articles', Article.object.order_by('-published_data')[0:3], to_attr='latest_articles')) being useful but the underlying prefetch query would need to rely on subqueries, a feature the ORM is not good at for now.

Maybe the addition of the Subquery expression could help here.

comment:6 by Todor Velichkov, 8 years ago

The only way to limit this Query (and to not load all the records into memory) would be to do a top-n-per-group query which I don't think the ORM is capable of.

What I can suggest is: Instead of limiting the QuerySet, find a way to filter it.

For example, get only the Posts in the last day, week or a month (based on the post-frequency).

Category.objects.prefetch_related(Prefetch(
    'post_set',
    queryset=Post.objects.filter(
        date_published__gte=datetime.date.today()-timedelta(days=7)),
    to_attr='example_posts',
))

This way you won't load all the Posts into Memory.

Last edited 8 years ago by Todor Velichkov (previous) (diff)

comment:7 by skyjur, 8 years ago

If window queries are implemented (https://code.djangoproject.com/ticket/26608) then this use case could be possible in something similar to:

Prefetch(
    'post_set',
    queryset=
        Post.objects
        .annotate(_rank=Window(Rank(), partition_by='cateogry')
        .filter(_rank__lte=3)
)

comment:8 by Jeff Johnson, 6 years ago

I tried the filter by rank suggestion using Django 2.0 but I get this error:

django.db.utils.NotSupportedError: Window is disallowed in the filter clause.

comment:9 by Haseeb Ahmad, 6 years ago

I think there is a workaround now to in django new version as we have OuterRef and Subquery.

from django.db.models import OuterRef, Subquery

User.objects.all().prefetch_related('comments',queryset=Comment.objects.filter(id__in=Subquery(Comment.objects.filter(user_id=OuterRef('user_id')).values_list('id', flat=True)[:5])))

Last edited 6 years ago by Tim Graham (previous) (diff)

comment:10 by TZanke, 5 years ago

Cc: TZanke added

comment:11 by Dan LaManna, 5 years ago

Cc: Dan LaManna added

comment:12 by chex, 5 years ago

Cc: chex added

comment:13 by Pavel Lysak, 5 years ago

Cc: Pavel Lysak added

comment:14 by Arthur, 2 years ago

Cc: Arthur added

comment:15 by Arthur, 2 years ago

Note that the solution above might not be portable to some MySql databases, with error

1235, "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"

comment:16 by şuayip üzülmez, 20 months ago

Cc: şuayip üzülmez added

comment:17 by Simon Charette, 20 months ago

As pointed out by others support for filter against window functions would allow prefetch_related_objects to use Rank(partition_by) to support this feature.

If someone wants to give a shot at solving this particular issue before #28333 is resolved it should be doable by using a combination of Queryset.raw and Query.compile combinations.

comment:18 by Simon Charette, 20 months ago

Cc: Simon Charette added

comment:19 by Simon Charette, 19 months ago

Has patch: set
Owner: changed from nobody to Simon Charette
Status: newassigned

Submitted a patch that works on top of the currently under-review patch for #28333 and it was surprisingly very straightforward.

Last edited 19 months ago by Simon Charette (previous) (diff)

comment:20 by Mariusz Felisiak, 19 months ago

Triage Stage: AcceptedReady for checkin

comment:21 by Mariusz Felisiak <felisiak.mariusz@…>, 19 months ago

Resolution: fixed
Status: assignedclosed

In 242499f2:

Fixed #26780 -- Added prefetch_related() support for sliced queries.

This was made possible by window function filtering support added in
f387d024fc75569d2a4a338bfda76cc2f328f627.

comment:22 by GitHub <noreply@…>, 19 months ago

In 69fa2e8e:

Refs #26780 -- Made prefetch_related() don't use window expressions fo sliced queries if not supported.

Note: See TracTickets for help on using tickets.
Back to Top