#28586 closed New feature (fixed)

Automatically prefetch related for "to one" fields as needed.

Reported by:	Gordon Wrigley	Owned by:	Adam Johnson
Component:	Database layer (models, ORM)	Version:	dev
Severity:	Normal	Keywords:	prefetch_related, fetch
Cc:	Adam Johnson, Ryan Hiebert, Ed Morley, Jonas Haag, şuayip üzülmez	Triage Stage:	Ready for checkin
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Gordon Wrigley)

When accessing a 2one field (foreign key in the forward direction and one2one in either direction) on a model instance, if the field's value has not yet been loaded then Django should prefetch the field for all model instances loaded by the same queryset as the current model instance.

There has been some discussion of this on the mailing list https://groups.google.com/forum/#!topic/django-developers/EplZGj-ejvg

Currently, when accessing an uncached 2one field, Django automatically fetches the missing value from the database. When this occurs in a loop it creates 1+N query problems. Consider the following snippet:

for choice in Choice.objects.all():
    print(choice.question.question_text, ':', choice.choice_text)

This will do one query for the choices and then one query per choice to get that choice's question.
This behavior can be avoided with correct application of prefetch_related like this:

for choice in Choice.objects.prefetch_related('question'):
    print(choice.question.question_text, ':', choice.choice_text)

This has several usability issues, notably:

Less experienced users are generally not aware that it's necessary.
Cosmetic-seeming changes to things like templates can change the fields that should be prefetched.
Relatedly, the code that requires the prefetch_related (a template, for example) may be far removed from where the prefetch_related needs to be applied (a view, for example).
Consequently, finding where prefetch_related calls are missing is nontrivial and needs to be done on an ongoing basis.
Excess fields in prefetch_related calls are even harder to find and result in unnecessary database queries.
It is very difficult for libraries like the admin and Django Rest Framework to automatically generate correct prefetch_related clauses.

The proposal is that on the first iteration of the loop in the example above, when we first access a choice's question field, instead of fetching the question for just that choice, Django speculatively fetches the questions for all the choices returned by the queryset.
This change results in the first snippet having the same database behavior as the second while reducing or eliminating all of the noted usability issues.
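
To make the proposed behavior concrete, here is a minimal pure-Python sketch. It is not the code from any patch on this ticket: the names QUESTIONS, fetch_questions, load_choices and the peers attribute are made up for the illustration, and a dictionary stands in for the database.

```python
# Stand-in for the question table: primary key -> question text.
QUESTIONS = {1: "Favourite colour?", 2: "Favourite food?"}
query_log = []  # one entry per simulated database round trip


def fetch_questions(ids):
    """Simulate SELECT ... WHERE id IN (...), logging one query per call."""
    wanted = sorted(set(ids))
    query_log.append(wanted)
    return {pk: QUESTIONS[pk] for pk in wanted}


class Choice:
    def __init__(self, text, question_id):
        self.text = text
        self.question_id = question_id
        self.peers = [self]    # instances loaded by the same queryset
        self._question = None  # cache for the related object

    @property
    def question(self):
        if self._question is None:
            # Fetch for every peer that still lacks a cached question,
            # not only for the instance being accessed.
            missing = [p for p in self.peers if p._question is None]
            rows = fetch_questions(p.question_id for p in missing)
            for p in missing:
                p._question = rows[p.question_id]
        return self._question


def load_choices(rows):
    """Simulate evaluating a queryset: all results share one peer list."""
    choices = [Choice(text, qid) for text, qid in rows]
    for choice in choices:
        choice.peers = choices
    return choices


choices = load_choices([("Red", 1), ("Blue", 1), ("Pizza", 2)])
lines = [f"{c.question} : {c.text}" for c in choices]
```

In this sketch, looping over the three choices records a single simulated query covering questions 1 and 2, where a per-instance fetch would have recorded three.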

Some important points:

2many fields are not changed at all by this proposal as I can't think of a reasonable way of deciding which of the many to fetch.
Because these are 2one fields, the generated queries can't have more result rows than the original query and may have fewer. This eliminates any concern about a multiplicative query size explosion.
This feature will never result in more database queries as a prefetch will only be issued where the ORM was already going to fetch a related object.
Because it is triggered by fetching missing related objects, it will not change the DB behavior of code which is fully covered by prefetch_related (and select_related) calls.
This will inherently chain across relations like choice.question.author; the conditions above still hold under such chaining.
It may result in larger data transfer between the database and Django in some situations.

An example of that last point is:

qs = Choice.objects.all() 
list(qs)[0].question

Such examples generally seem to be rarer and more likely to be visible during code inspection (vs {{choice.question}} in a template). And larger queries are usually a better failure mode than producing hundreds of queries.
For this to actually produce inferior behavior in practice you need to:

fetch a large number of choices
filter out basically all of them
in a way that prevents garbage collection of the unfiltered ones

If any of those aren't true then automatic prefetching will still produce equivalent or better database behavior than without.
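
The garbage collection condition can be illustrated with a small sketch. It assumes, purely for illustration, that each instance tracks its peers through weak references; the Choice class, load_choices and rows_fetched_on_access below are hypothetical stand-ins and do not touch a real database.

```python
import gc
import weakref


class Choice:
    def __init__(self, question_id):
        self.question_id = question_id
        self.peers = ()  # weak references to instances from the same load


def load_choices(question_ids):
    """Simulate evaluating a queryset; peers are held only weakly."""
    choices = [Choice(qid) for qid in question_ids]
    refs = tuple(weakref.ref(c) for c in choices)
    for choice in choices:
        choice.peers = refs
    return choices


def rows_fetched_on_access(choice):
    """Return how many question rows a peer fetch would request."""
    live = [p for p in (ref() for ref in choice.peers) if p is not None]
    return len({p.question_id for p in live})


# Case 1: the whole result list is kept alive, so every peer is fetched.
kept_all = load_choices(range(1000))
rows_when_all_alive = rows_fetched_on_access(kept_all[0])

# Case 2: only the first instance is kept; the others can be collected.
first_only = load_choices(range(1000))[0]
gc.collect()
rows_when_rest_collected = rows_fetched_on_access(first_only)
```

Under that assumption, the over-fetch only happens in the first case, where something still holds references to the instances that were filtered out.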

Several opt-in/opt-out options were discussed on the mailing list; I will attempt to summarize them below. Most of them are compatible with each other, but in the interests of a clean interface we probably want to limit how many we implement.

A global option in settings. So as to not accidentally fix existing code this could default to disabled if not specified.
Per queryset either as auto_prefetch_related(value) or prefetch_related(auto=value) where value would determine enabled, disabled, default.
Per object, similar to the per queryset version.
Per model in Meta; it's not clear whether this was intended to be on:
1. the model used in the original queryset
2. the model the field is on
3. the model the field refers to
As a context manager (this could then easily be applied in middleware or a view decorator)
On the field, similar to on_delete
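
As one way to picture the context manager option, here is a sketch built on the standard library's contextvars. Everything in it is hypothetical: auto_prefetch, auto_prefetch_enabled and with_auto_prefetch are illustrative names, not an existing or proposed Django API.

```python
import contextvars
import functools
from contextlib import contextmanager

# Hypothetical switch; off by default so existing code is unaffected.
_auto_prefetch = contextvars.ContextVar("auto_prefetch", default=False)


@contextmanager
def auto_prefetch(enabled=True):
    """Enable or disable the behavior for the duration of a block."""
    token = _auto_prefetch.set(enabled)
    try:
        yield
    finally:
        # Restore whatever value was active before entering the block.
        _auto_prefetch.reset(token)


def auto_prefetch_enabled():
    """What a related-field descriptor would consult before fetching."""
    return _auto_prefetch.get()


def with_auto_prefetch(view):
    """View decorator that applies the context manager around a call."""
    @functools.wraps(view)
    def wrapper(*args, **kwargs):
        with auto_prefetch():
            return view(*args, **kwargs)
    return wrapper


@with_auto_prefetch
def example_view():
    return auto_prefetch_enabled()
```

A context variable keeps the setting local to the current thread or async task and lets nested blocks restore the previous value on exit, which is what makes the middleware and decorator uses straightforward.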

P.S. I've been using this in my own code with no opt-in / opt-out for some time and have had literally no problems with it.

Change History (28)

comment:1 by Gordon Wrigley, 9 years ago

I hope to have a first version of a pull for this up tomorrow

comment:2 by Adam Johnson, 9 years ago

Cc:	Adam Johnson added

comment:3 by Gordon Wrigley, 9 years ago

Since there was some discussion over opt-in / opt-out strategies, I have for the moment gone with one that seems safe and easy to implement. Currently the feature is off by default and is enabled by calling auto_prefetch_related() on a queryset.

Relatedly, I have not addressed documentation at all.

comment:4 by Gordon Wrigley, 9 years ago

For curiosity's sake I tried running the test suite with auto_prefetch_related enabled by default. There were 3 test failures; two were looking for queries that are removed by auto_prefetch_related.
The third (SwappableModelTests.test_generated_data) attempts to fetch more rows than the sqlite backend can handle in a single 'in' clause, which I'd think is an issue with the 'in' implementation.
Looking at the test it is currently unintentionally doing some four and a quarter thousand DB queries. And attempting to fix it with an explicit prefetch fails in the exact same manner as the automatic prefetch.
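
A common way around a limit on the size of an 'in' clause is to split the ids into batches. The following is a generic sketch using the standard sqlite3 module, not what Django's prefetch code does: the table, the fetch_by_ids helper and the batch size of 500 are assumptions chosen for the example.

```python
import sqlite3
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while chunk := list(islice(iterator, size)):
        yield chunk


def fetch_by_ids(conn, ids, batch_size=500):
    """Fetch rows by primary key using several bounded IN clauses."""
    rows = {}
    for batch in chunked(ids, batch_size):
        placeholders = ", ".join("?" for _ in batch)
        sql = f"SELECT id, text FROM question WHERE id IN ({placeholders})"
        # Each result row is an (id, text) pair, so it can feed dict.update.
        rows.update(conn.execute(sql, batch).fetchall())
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO question VALUES (?, ?)",
    ((i, f"q{i}") for i in range(1, 2001)),
)
result = fetch_by_ids(conn, range(1, 2001))
```

The trade-off is a handful of queries instead of one, which is still far fewer than one query per row.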


comment:5 by Ryan Hiebert, 9 years ago

Cc:	Ryan Hiebert added

comment:6 by Gordon Wrigley, 9 years ago

Description:	modified (diff)

comment:7 by Ed Morley, 9 years ago

Cc:	Ed Morley added

comment:8 by Jonas Haag, 9 years ago

Cc:	Jonas Haag added

comment:9 by Tim Graham, 9 years ago

Has patch:	set
Needs documentation:	set
Triage Stage:	Unreviewed → Accepted

comment:10 by Gordon Wrigley, 6 years ago

My existing code for this is now available as a PyPI package:
https://github.com/tolomea/django-auto-prefetch

comment:11 by Adam Johnson, 3 years ago

Has patch:	unset
Needs documentation:	unset
Owner:	changed from nobody to Adam Johnson
Status:	new → assigned

I’m working on a PR for Django core now, based on Andreas Pelme’s recently-closed PR and discussions with Andreas and Simon Charette.

comment:12 by şuayip üzülmez, 3 years ago

Cc:	şuayip üzülmez added

comment:13 by Jacob Walls, 17 months ago

Has patch:	set

Adam invites additional reviews on PR

comment:14 by Adam Johnson, 15 months ago

I split off a PR for GenericForeignKey which should be simpler to review.

comment:15 by Jacob Walls, 11 months ago

Needs documentation:	set
Patch needs improvement:	set

Main PR has some outstanding questions and merge conflicts. GFK PR needs a deprecation path, I think.

comment:16 by Jacob Walls, 11 months ago

Needs documentation:	unset

comment:17 by Jacob Walls <jacobtylerwalls@…>, 10 months ago

In 74a9c27:

Refs #28586 -- Split descriptor from GenericForeignKey.

This makes GenericForeignKey more similar to other fields which act as
descriptors, preparing it to add “fetcher protocol” support in a clear and
consistent way.

comment:18 by Jacob Walls, 9 months ago

Keywords:	fetch added
Patch needs improvement:	unset
Triage Stage:	Accepted → Ready for checkin

comment:19 by Jacob Walls, 9 months ago

Patch needs improvement:	set
Triage Stage:	Ready for checkin → Accepted

comment:20 by Gordon Wrigley, 9 months ago

Thank you Adam for picking this up and pushing it forward. It's nice to see it finally make it into the core where it can help more people.

FWIW in the 10 years since we did the very first version of this I've used it unconditionally in every Django project I've worked on and never once had an issue.
So I personally would definitely support making FETCH_PEERS the default behaviour in a future version.


comment:21 by Jacob Walls, 9 months ago

Patch needs improvement:	unset
Triage Stage:	Accepted → Ready for checkin

comment:22 by Jacob Walls <jacobtylerwalls@…>, 9 months ago

In f6bd90c8:

Refs #28586 -- Edited related objects documentation.

This change aims to make this section clearer and ready to add a description of
fetch modes.

comment:23 by Jacob Walls <jacobtylerwalls@…>, 9 months ago

Resolution:	→ fixed
Status:	assigned → closed

In e097e8a:

Fixed #28586 -- Added model field fetch modes.

May your database queries be much reduced with minimal effort.

co-authored-by: Andreas Pelme <andreas@…>
co-authored-by: Simon Charette <charette.s@…>
co-authored-by: Jacob Walls <jacobtylerwalls@…>

comment:24 by Jacob Walls <jacobtylerwalls@…>, 9 months ago

In a321d96:

Refs #28586 -- Made fetch modes pickle as singletons.

This change ensures that we don’t create new instances of fetch modes
when pickling and unpickling, saving memory and preserving their singleton
nature.
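
For readers unfamiliar with the technique, one standard way to make an object pickle as a singleton is to have __reduce__ return the name of a module-level global, so that unpickling looks the existing object up instead of building a new one. The sketch below uses hypothetical names (Mode, MODE_ONE, MODE_PEERS) and is not taken from the commit.

```python
import pickle


class Mode:
    """Base class for stateless mode objects that exist once per process."""

    __slots__ = ()
    global_name = None  # name of the module-level singleton

    def __reduce__(self):
        # Returning a string tells pickle to store a reference to the
        # module-level global of that name rather than the object's state.
        return self.global_name


class _One(Mode):
    global_name = "MODE_ONE"


class _Peers(Mode):
    global_name = "MODE_PEERS"


MODE_ONE = _One()
MODE_PEERS = _Peers()

restored = pickle.loads(pickle.dumps(MODE_PEERS))
```

Because only a reference is stored, the restored object is the very same instance, so identity checks such as `mode is MODE_PEERS` keep working after a round trip.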

comment:25 by Jacob Walls <jacobtylerwalls@…>, 9 months ago

In 821619aa:

Refs #28586 -- Simplified related descriptor get_queryset() methods.

Modify these methods to accept an instance parameter which is clearer and
allows us to set the instance hint earlier.

comment:26 by Jacob Walls <jacobtylerwalls@…>, 9 months ago

In 6dc9b04:

Refs #28586 -- Copied fetch modes to related objects.

This change ensures that behavior and performance remain consistent when
traversing relationships.

comment:27 by Jacob Walls <jacobtylerwalls@…>, 9 months ago

In e244d8bb:

Refs #28586 - Copied fetch mode in QuerySet.create().

This change allows the pattern MyModel.objects.fetch_mode(...).create(...) to
set the fetch mode for a new object.

comment:28 by Jacob Walls <jacobtylerwalls@…>, 3 months ago

In 5cf9c7fd:

Refs #28586 -- Added DEFAULT_FETCH_MODE module constant.

This is a more attractive target for alteration than all of QuerySet.__init__().
