Context Navigation

← Previous Ticket
Next Ticket →

#25464 new New feature

Allow skipping IN clause on prefetch queries

Reported by:	Erik Cederstrand	Owned by:	nobody
Component:	Database layer (models, ORM)	Version:	dev
Severity:	Normal	Keywords:
Cc:	Simon Charette, Shai Berger	Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	yes
Needs tests:	yes	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Shai Berger) ¶

When using prefetch_related() on a large queryset, the prefetch query SQL can be inefficient. Consider this:

    Category.objects.filter(type=5).prefetch_related('items')

If 100.000 categories have type=5, then an IN clause with 100.000 Category IDs is generated to get the Item objects. Even with a custom queryset using a Prefetch() object, the IN clause is generated, even though it is A) redundant, B) sends a potentially multi-megabyte SQL statement over the wire for the database to process, C) may confuse the query planner to generate an inefficient execution plan, and D) doesn't scale:

    Category.objects.filter(type=5).prefetch_related(Prefetch('items', queryset=Item.objects.filter(category__type=5)))

Pull request https://github.com/django/django/pull/5356 adds the possibility to skip the IN clause in cases where we are sure that a better queryset will get (at least) the same items as the IN clause would:

    Category.objects.filter(type=5).prefetch_related(Prefetch('items', queryset=Item.objects.filter(category__type=5), filter_on_instances=False))

In my tests, this speeds up prefetch_related() by 20x-50x on large querysets.

According to the ticket's flags, the next step(s) to move this issue forward are:

To add tests to the patch, then uncheck the "Needs tests" flag on the ticket.
To write documentation for the patch, then uncheck "Needs documentation" on the ticket.
If creating a new pull request, include a link to the pull request in the ticket comment when making that update. The usual format is: [https://github.com/django/django/pull/#### PR].

Change History (6)

comment:1 by Marc Tamlyn, 9 years ago

Needs documentation:	set
Needs tests:	set
Triage Stage:	Unreviewed → Accepted

Accepting the basic idea here. I'd be interested to see some performance profiles across a few databases here though, it seems your inner query there is potentially doing an enormous join to do it's filter, I'm slightly surprised this is that much more efficient.

Patch needs tests and documentation, and also needs to reassure me what happens if the prefetched queryset contains extraneous rows. Are they ignored? Does it throw an error? What should the result be?

comment:2 by Tim Graham, 9 years ago

Description:	modified (diff)

comment:3 by Erik Cederstrand, 9 years ago

Thanks. Which profiling would you like? I'm working on PostgreSQL and could provide EXPLAIN on both types of queries.

I'll try to add documentation some tests to /django/tests/prefetch_related/

When prefetching is done, the prefetched objects are matched with the instances by pk. My basic gut feeling is that extraneous rows should simply be ignored, for cases where it's simply more efficient to fetch a little more than we need.

comment:4 by Simon Charette, 9 years ago

Cc:	Simon Charette added

comment:5 by Shai Berger, 9 years ago

Cc:	Shai Berger added
Description:	modified (diff)

Two notes come to mind:

1) While this may be less "natural" to think of, it seems the query would be more natural and efficient as

    Item.objects.filter(category__type=5).select_related('category')

Of course, this would require restructuring the code that handles the items and categories.

2) Not sure if this is as easy, but I think a better and more general alternative would be to expose the "joining-in-python" mechanism for general use. I'm thinking along the lines of

    cats = Category.objects.filter(type=5)
    items = Item.Item.objects.filter(category__type=5)
    cats.use_prefetched('items', items)

where 'items' is the name of the reverse relation, of course, and items could be replaced with any iterable returning Item instances.

comment:6 by Adam Johnson, 9 years ago

1) Presumably the reason prefetch_related is being used is to avoid fetching the category data repeated many times, is select_related does by including it in the join. You can keep the prefetching with:

Item.objects.filter(category__type=5).prefetch_related('category')

2) See #25279 for my simple suggestion of making prefetch_related_objects a public API, which simply exposes the pre-existing prefetch code for "joining in python" mechanism. I don't know if it would actually solve this problem though.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#25464 new New feature

Allow skipping IN clause on prefetch queries

Description (last modified by Shai Berger) ¶

Change History (6)

comment:1 by Marc Tamlyn, 9 years ago

comment:2 by Tim Graham, 9 years ago

comment:3 by Erik Cederstrand, 9 years ago

comment:4 by Simon Charette, 9 years ago

comment:5 by Shai Berger, 9 years ago

comment:6 by Adam Johnson, 9 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us