#35376 closed New feature (duplicate)
Prefetched data not used when combining prefetch_related() and only()
Reported by: | Michael Schwarz | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | |
Cc: | Michael Schwarz, Simon Charette | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I think I found a simple case combining prefetch_related()
and only()
, where prefetched data isn't used when it should.
See the following model with a Restaurant
being a Building
, and a Building
being part of a City
. (replace Building
with Business
if it bugs you that a building can only contain a single restaurant 🙃, I noticed too late):
from django.db.models import DO_NOTHING from django.db.models import ForeignKey from django.db.models import Model from django.db.models import OneToOneField from django.db.models import TextField class City(Model): name = TextField() class Building(Model): city = ForeignKey(City, on_delete=DO_NOTHING) street_name = TextField() street_no = TextField() class Restaurant(Model): building = OneToOneField(Building, on_delete=DO_NOTHING) name = TextField()
I'm trying to build a query set of buildings with some of its attributes deferred using only()
, and the associated restaurants and cities being prefetched. In the following parametrized pytest
test case, only the last instance is exhibiting the issue, the other 3 seem to work as I would expect.
The test case creates an instance of each model, runs the query (list(...)
), and then accesses the restaurant
attribute, which should be prefetched in every case. The test case check that the access does not generate an additional query using django_assert_num_queries()
from pytest-django
.
import pytest from myproject.models import Building from myproject.models import Restaurant from myproject.models import City @pytest.mark.parametrize('qs', [ Building.objects.prefetch_related("restaurant", "city"), Building.objects.only("street_name").prefetch_related("restaurant"), Building.objects.only("street_name").prefetch_related("city", "restaurant"), Building.objects.only("street_name").prefetch_related("restaurant", "city") ]) def test_repro(db, django_assert_num_queries, qs): Restaurant.objects.create( name="", building=Building.objects.create( city=City.objects.create(name=""), street_name="", street_no="" ), ) result = list(qs) with django_assert_num_queries(0): result[0].restaurant
The first 3 instances of the test above succeed, the last one fails:
$ venv/bin/pytest test_repro.py [...] @pytest.mark.parametrize('qs', [ Building.objects.prefetch_related("restaurant", "city"), Building.objects.only("street_name").prefetch_related("restaurant"), Building.objects.only("street_name").prefetch_related("city", "restaurant"), Building.objects.only("street_name").prefetch_related("restaurant", "city") ]) def test_repro(db, django_assert_num_queries, qs): Restaurant.objects.create( name="", building=Building.objects.create( city=City.objects.create(name=""), street_name="", street_no="" ), ) result = list(qs) > with django_assert_num_queries(0): [...] E Failed: Expected to perform 0 queries but 1 was done (add -v option to show queries)
I've created a minimal project on GitHub documenting the exact setup, including all package versions.
AFAICT, in each case above, restaurant
should be prefetched, and the attribute access should not generate an additional access. Only when only()
is used, another related model is also prefetched, and that other model is mentioned _after_ restaurant
in the call to prefetch_related()
, the prefetched data for restaurant
isn't used.
Running macOS 12.7.4, Python 3.12.2 and Django 4.2.11.
Change History (4)
comment:1 by , 7 months ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
Type: | Uncategorized → New feature |
comment:2 by , 7 months ago
Cc: | added |
---|
I think there might more to it here given Building.restaurant
is a reverse one-to-one and thus there is no field stored on Building
to reference it and thus no field to include in the select mask (include in only
and excluded from exclude
) for prefetch_related
to work properly.
If .city
was used in the test (instead of .restaurant
) then it would have been a duplicate of #33835 (and something we could warn about like we do with select_related
+ only
misuse) as using prefetch_related("city")
requires that "city_id"
is included in the select mask but here it looks like something slightly different.
comment:3 by , 7 months ago
Ah I think this is actually a duplicate of #35044 which is only resolved in main
.
What is happening here is that in the third case restaurant
is prefetched and stored on the instance but then accessing city_id
to prefetch the right city
requires calling refresh_from_db
(as it's excluded from the select mask) then #35044 (which clears all prefetched data on each refresh_from_db
call) is triggered and the prefetched restaurant
instance is discarded causing the access to the property in the test to refetch from the database.
It works in the second case because the whole deferred city_id
query retrieval happens before the restaurant
caching happens to the refresh_from_db(fieds=["city_id"])
cache clear has to effect.
`
The only solution in 4.2 is to include city
in your select mask (like you'd want to do anyway to avoid a N+1) to avoid the bugged refresh_from_db
call. This makes be think that prefetch_related
+ only
/defer
misuse error like we do with select_related
might be valuable as opposed to silently including the fields as suggested by #35044.
comment:4 by , 7 months ago
Thanks for triaging this and finding a workaround/proper usage for my use case! 😊
Thank you for the report!
I think this is a duplicate of #33835 which is closed as 'wontfix' as the behaviour is intentional.