Opened 5 weeks ago

Closed 5 weeks ago

Last modified 4 weeks ago

#35376 closed New feature (duplicate)

Prefetched data not used when combining prefetch_related() and only()

Reported by: Michael Schwarz Owned by: nobody
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Michael Schwarz, Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I think I found a simple case combining prefetch_related() and only(), where prefetched data isn't used when it should.

See the following model with a Restaurant being a Building, and a Building being part of a City. (replace Building with Business if it bugs you that a building can only contain a single restaurant 🙃, I noticed too late):

from django.db.models import DO_NOTHING
from django.db.models import ForeignKey
from django.db.models import Model
from django.db.models import OneToOneField
from django.db.models import TextField

class City(Model):
    name = TextField()

class Building(Model):
    city = ForeignKey(City, on_delete=DO_NOTHING)

    street_name = TextField()
    street_no = TextField()

class Restaurant(Model):
    building = OneToOneField(Building, on_delete=DO_NOTHING)

    name = TextField()

I'm trying to build a query set of buildings with some of its attributes deferred using only(), and the associated restaurants and cities being prefetched. In the following parametrized pytest test case, only the last instance is exhibiting the issue, the other 3 seem to work as I would expect.

The test case creates an instance of each model, runs the query (list(...)), and then accesses the restaurant attribute, which should be prefetched in every case. The test case check that the access does not generate an additional query using django_assert_num_queries() from pytest-django.

import pytest

from myproject.models import Building
from myproject.models import Restaurant
from myproject.models import City


@pytest.mark.parametrize('qs', [
    Building.objects.prefetch_related("restaurant", "city"),
    Building.objects.only("street_name").prefetch_related("restaurant"),
    Building.objects.only("street_name").prefetch_related("city", "restaurant"),
    Building.objects.only("street_name").prefetch_related("restaurant", "city")
])
def test_repro(db, django_assert_num_queries, qs):
    Restaurant.objects.create(
        name="",
        building=Building.objects.create(
            city=City.objects.create(name=""), street_name="", street_no=""
        ),
    )

    result = list(qs)

    with django_assert_num_queries(0):
        result[0].restaurant

The first 3 instances of the test above succeed, the last one fails:

$ venv/bin/pytest test_repro.py
[...]

    @pytest.mark.parametrize('qs', [
        Building.objects.prefetch_related("restaurant", "city"),
        Building.objects.only("street_name").prefetch_related("restaurant"),
        Building.objects.only("street_name").prefetch_related("city", "restaurant"),
        Building.objects.only("street_name").prefetch_related("restaurant", "city")
    ])
    def test_repro(db, django_assert_num_queries, qs):
        Restaurant.objects.create(
            name="",
            building=Building.objects.create(
                city=City.objects.create(name=""), street_name="", street_no=""
            ),
        )

        result = list(qs)

>       with django_assert_num_queries(0):

[...]

E               Failed: Expected to perform 0 queries but 1 was done (add -v option to show queries)

I've created a minimal project on GitHub documenting the exact setup, including all package versions.

AFAICT, in each case above, restaurant should be prefetched, and the attribute access should not generate an additional access. Only when only() is used, another related model is also prefetched, and that other model is mentioned _after_ restaurant in the call to prefetch_related(), the prefetched data for restaurant isn't used.

Running macOS 12.7.4, Python 3.12.2 and Django 4.2.11.

Change History (4)

comment:1 by Sarah Boyce, 5 weeks ago

Resolution: duplicate
Status: newclosed
Type: UncategorizedNew feature

Thank you for the report!
I think this is a duplicate of #33835 which is closed as 'wontfix' as the behaviour is intentional.

comment:2 by Simon Charette, 5 weeks ago

Cc: Simon Charette added

I think there might more to it here given Building.restaurant is a reverse one-to-one and thus there is no field stored on Building to reference it and thus no field to include in the select mask (include in only and excluded from exclude) for prefetch_related to work properly.

If .city was used in the test (instead of .restaurant) then it would have been a duplicate of #33835 (and something we could warn about like we do with select_related + only misuse) as using prefetch_related("city") requires that "city_id" is included in the select mask but here it looks like something slightly different.

Last edited 5 weeks ago by Simon Charette (previous) (diff)

comment:3 by Simon Charette, 5 weeks ago

Ah I think this is actually a duplicate of #35044 which is only resolved in main.

What is happening here is that in the third case restaurant is prefetched and stored on the instance but then accessing city_id to prefetch the right city requires calling refresh_from_db (as it's excluded from the select mask) then #35044 (which clears all prefetched data on each refresh_from_db call) is triggered and the prefetched restaurant instance is discarded causing the access to the property in the test to refetch from the database.

It works in the second case because the whole deferred city_id query retrieval happens before the restaurant caching happens to the refresh_from_db(fieds=["city_id"]) cache clear has to effect.
`
The only solution in 4.2 is to include city in your select mask (like you'd want to do anyway to avoid a N+1) to avoid the bugged refresh_from_db call. This makes be think that prefetch_related + only/defer misuse error like we do with select_related might be valuable as opposed to silently including the fields as suggested by #35044.

Last edited 5 weeks ago by Simon Charette (previous) (diff)

comment:4 by Michael Schwarz, 4 weeks ago

Thanks for triaging this and finding a workaround/proper usage for my use case! 😊

Note: See TracTickets for help on using tickets.
Back to Top