Opened 8 months ago

Closed 8 months ago

#34791 closed Bug (fixed)

Prefetch's queryset attribute is ignored when the same nested relation is used multiple times.

Reported by: Maxime Toussaint Owned by: Maxime Toussaint
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords: prefetch, prefetch_related
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Maxime Toussaint)

Note: Edited the description following the discussion

There seems to be an issue when using a Prefetch object to fetch something that has already been fetched. The issue only seems to happen when there is depth in the prefetch. Here is an example I made this morning that fails:

        pizzas = Pizza.objects.all().prefetch_related(
            "toppings__origin",
            Prefetch(
                "toppings__origin",
                queryset=Country.objects.filter(label="China"),
                to_attr="china",
            ),
        )

        china = pizzas[0].toppings.all()[0].china

Here, when trying to get china, I would assume it to either be a Country object or None. However, I get the message: AttributeError: 'Topping' object has no attribute 'china'

Here are the models I set up for my test:

class Country(models.Model):
    label = models.CharField(max_length=50)


class Pizza(models.Model):
    label = models.CharField(max_length=50)


class Topping(models.Model):
    pizza = models.ForeignKey(Pizza, on_delete=models.CASCADE, related_name="toppings")
    label = models.CharField(max_length=50)
    origin = models.ForeignKey(
        Country, on_delete=models.CASCADE, related_name="toppings"
    )

And here are the queries made when calling the queryset:

1. SELECT "tests_pizza"."id", "tests_pizza"."label", "tests_pizza"."provenance_id" FROM "tests_pizza"
2. SELECT "tests_topping"."id", "tests_topping"."pizza_id", "tests_topping"."label", "tests_topping"."origin_id" FROM "tests_topping" WHERE "tests_topping"."pizza_id" IN (1, 2)
3. SELECT "tests_country"."id", "tests_country"."label", "tests_country"."continent_id" FROM "tests_country" WHERE "tests_country"."id" IN (2, 3, 4)

Note that the filter by label='china' has completely disappeared.

Now, if I switch the prefetches around like so:

        pizzas = Pizza.objects.all().prefetch_related(
            Prefetch(
                "toppings__origin",
                queryset=Country.objects.filter(label="China"),
                to_attr="china",
            ),
            "toppings__origin",
        )

Fetching china now works, and here are the queries being made:

1. SELECT "tests_pizza"."id", "tests_pizza"."label", "tests_pizza"."provenance_id" FROM "tests_pizza"
2. SELECT "tests_topping"."id", "tests_topping"."pizza_id", "tests_topping"."label", "tests_topping"."origin_id" FROM "tests_topping" WHERE "tests_topping"."pizza_id" IN (1, 2)
3. SELECT "tests_country"."id", "tests_country"."label", "tests_country"."continent_id" FROM "tests_country" WHERE ("tests_country"."label" = 'China' AND "tests_country"."id" IN (2, 3, 4))
4. SELECT "tests_country"."id", "tests_country"."label", "tests_country"."continent_id" FROM "tests_country" WHERE "tests_country"."id" IN (2, 3, 4)

This time, both calls to Country were made.

I did follow the code a bit yesterday, and I believe it comes from the fact that the ForeignKey field defines the cache name as being simply the name of the field. I am not certain though, and it would likely require someone with more knowledge than me to look into it.

Change History (14)

comment:1 by Natalia Bidart, 8 months ago

Hello! Thank you for your report. Could you please post your models.py definition (reduced to this example)? Also it would be useful if you can paste the query each example generate.

This way, we can try to reproduce and triage accordingly. Thanks!

comment:2 by Natalia Bidart, 8 months ago

Resolution: worksforme
Status: newclosed

I have created a few models of my own and tested your queries. The queries are correct in both cases and I get expected results. I have created many toppings and onion and tomato are out of stock:

>>> def print_toppings(q):
...     print([t.name for p in q for t in p.toppings.all()], [t.name for p in q for t in p.out_of_stock_toppings])
...

>>> q1 = Pizza.objects.all().prefetch_related('toppings',Prefetch('toppings', queryset=Topping.objects.filter(is_in_stock=False), to_attr='out_of_stock_toppings'))

>>> print_toppings(q1)
['sweet pepper', 'egg', 'ham', 'bacon', 'onion', 'cheese', 'tomato', 'onion', 'sweet pepper', 'egg', 'cheese', 'onion', 'tomato'] ['onion', 'tomato', 'onion', 'onion', 'tomato']

>>> print("\n\n".join(i["sql"] for i in connection.queries))
SELECT "ticket_34791_pizza"."id", "ticket_34791_pizza"."name" FROM "ticket_34791_pizza"

SELECT ("ticket_34791_pizza_toppings"."pizza_id") AS "_prefetch_related_val_pizza_id", "ticket_34791_topping"."id", "ticket_34791_topping"."name", "ticket_34791_topping"."is_in_stock" FROM "ticket_34791_topping" INNER JOIN "ticket_34791_pizza_toppings" ON ("ticket_34791_topping"."id" = "ticket_34791_pizza_toppings"."topping_id") WHERE "ticket_34791_pizza_toppings"."pizza_id" IN (2, 3, 4, 1)

SELECT ("ticket_34791_pizza_toppings"."pizza_id") AS "_prefetch_related_val_pizza_id", "ticket_34791_topping"."id", "ticket_34791_topping"."name", "ticket_34791_topping"."is_in_stock" FROM "ticket_34791_topping" INNER JOIN "ticket_34791_pizza_toppings" ON ("ticket_34791_topping"."id" = "ticket_34791_pizza_toppings"."topping_id") WHERE (NOT "ticket_34791_topping"."is_in_stock" AND "ticket_34791_pizza_toppings"."pizza_id" IN (2, 3, 4, 1))

The second query also works fine:

>>> q2 = Pizza.objects.all().prefetch_related(Prefetch('toppings', queryset=Topping.objects.filter(is_in_stock=False), to_attr='out_of_stock_toppings'),'toppings')

>>> print_toppings(q2)
['sweet pepper', 'egg', 'ham', 'bacon', 'onion', 'cheese', 'tomato', 'onion', 'sweet pepper', 'egg', 'cheese', 'onion', 'tomato'] ['onion', 'tomato', 'onion', 'onion', 'tomato']

>>> print("\n\n".join(i["sql"] for i in connection.queries))
SELECT "ticket_34791_pizza"."id", "ticket_34791_pizza"."name" FROM "ticket_34791_pizza"

SELECT ("ticket_34791_pizza_toppings"."pizza_id") AS "_prefetch_related_val_pizza_id", "ticket_34791_topping"."id", "ticket_34791_topping"."name", "ticket_34791_topping"."is_in_stock" FROM "ticket_34791_topping" INNER JOIN "ticket_34791_pizza_toppings" ON ("ticket_34791_topping"."id" = "ticket_34791_pizza_toppings"."topping_id") WHERE (NOT "ticket_34791_topping"."is_in_stock" AND "ticket_34791_pizza_toppings"."pizza_id" IN (2, 3, 4, 1))

SELECT ("ticket_34791_pizza_toppings"."pizza_id") AS "_prefetch_related_val_pizza_id", "ticket_34791_topping"."id", "ticket_34791_topping"."name", "ticket_34791_topping"."is_in_stock" FROM "ticket_34791_topping" INNER JOIN "ticket_34791_pizza_toppings" ON ("ticket_34791_topping"."id" = "ticket_34791_pizza_toppings"."topping_id") WHERE "ticket_34791_pizza_toppings"."pizza_id" IN (2, 3, 4, 1)

You should note that when using to_attr, the result is stored in a list as documented here (see the Note): https://docs.djangoproject.com/en/4.2/ref/models/querysets/#prefetch-objects

comment:3 by Maxime Toussaint, 8 months ago

Hi, thanks a lot for the quick answer. I seem to have misunderstood the parameters of my issue, and the example I gave does work for me. In trying to simplify it, I took out the part that was making it fail.

So the issue only seems to happen when there is depth in the prefetch. Here is an example I made this morning that fails:

        pizzas = Pizza.objects.all().prefetch_related(
            "toppings__origin",
            Prefetch(
                "toppings__origin",
                queryset=Country.objects.filter(label="China"),
                to_attr="china",
            ),
        )

        china = pizzas[0].toppings.all()[0].china

Here, when trying to get china, I would assume it to either be a Country object or None. However, I get the message: AttributeError: 'Topping' object has no attribute 'china'

Here are the models I set up for my test:

class Country(models.Model):
    label = models.CharField(max_length=50)


class Pizza(models.Model):
    label = models.CharField(max_length=50)


class Topping(models.Model):
    pizza = models.ForeignKey(Pizza, on_delete=models.CASCADE, related_name="toppings")
    label = models.CharField(max_length=50)
    origin = models.ForeignKey(
        Country, on_delete=models.CASCADE, related_name="toppings"
    )

And here are the queries made when calling the queryset:

1. SELECT "tests_pizza"."id", "tests_pizza"."label", "tests_pizza"."provenance_id" FROM "tests_pizza"
2. SELECT "tests_topping"."id", "tests_topping"."pizza_id", "tests_topping"."label", "tests_topping"."origin_id" FROM "tests_topping" WHERE "tests_topping"."pizza_id" IN (1, 2)
3. SELECT "tests_country"."id", "tests_country"."label", "tests_country"."continent_id" FROM "tests_country" WHERE "tests_country"."id" IN (2, 3, 4)

Note that the filter by label='china' has completely disappeared.

Now, if I switch the prefetches around like so:

        pizzas = Pizza.objects.all().prefetch_related(
            Prefetch(
                "toppings__origin",
                queryset=Country.objects.filter(label="China"),
                to_attr="china",
            ),
            "toppings__origin",
        )

Fetching china now works, and here are the queries being made:

1. SELECT "tests_pizza"."id", "tests_pizza"."label", "tests_pizza"."provenance_id" FROM "tests_pizza"
2. SELECT "tests_topping"."id", "tests_topping"."pizza_id", "tests_topping"."label", "tests_topping"."origin_id" FROM "tests_topping" WHERE "tests_topping"."pizza_id" IN (1, 2)
3. SELECT "tests_country"."id", "tests_country"."label", "tests_country"."continent_id" FROM "tests_country" WHERE ("tests_country"."label" = 'China' AND "tests_country"."id" IN (2, 3, 4))
4. SELECT "tests_country"."id", "tests_country"."label", "tests_country"."continent_id" FROM "tests_country" WHERE "tests_country"."id" IN (2, 3, 4)

This time, both calls to Country were made.

I did follow the code a bit yesterday, and I believe it comes from the fact that the ForeignKey field defines the cache name as being simply the name of the field. I am not certain though, and it would likely require someone with more knowledge than me to look into it.

Thanks again, hopefully we can figure out what is happening!

comment:4 by Maxime Toussaint, 8 months ago

Resolution: worksforme
Status: closednew

comment:5 by Maxime Toussaint, 8 months ago

Description: modified (diff)

comment:6 by Maxime Toussaint, 8 months ago

Reopened because I managed to create a test that shows the issue.

comment:7 by Mariusz Felisiak, 8 months ago

I did follow the code a bit yesterday, and I believe it comes from the fact that the ForeignKey field defines the cache name as being simply the name of the field. I am not certain though, and it would likely require someone with more knowledge than me to look into it.

As far as I'm aware this is an expected behavior, subsequent lookups in prefetch_related() affect each other, see docs:

"The ordering of lookups matters."

You prefetched toppings__origin with a default queryset, so the second Prefetch() ignores a custom queryset on the same relation. You should use Prefetch(..., to_attr="") in both cases to have two independent queries.

In such cases, we may consider raising ValueError.

comment:8 by Mariusz Felisiak, 8 months ago

Summary: Issue when using Prefetch objects in prefetch_relatedPrefetch's queryset attribute is ignored when the same nested relation is used multiple times.
Triage Stage: UnreviewedAccepted

Tentatively accepted.

comment:9 by Maxime Toussaint, 8 months ago

So, it makes sense that the order matters, especially for Prefetch with to_attr, since until it has been fetched, it does not exist yet. Here though, these are 2 very separate calls to the database that have absolutely no interaction with each other. Shouldn't they be independent? It doesn't make sense to me that it works in one order but it does not in the other. I am vey surprised to hear that such a thing could be the intended behaviour, as I can't make sense of why it could be. If you could give more depth to your reasoning, it would be much appreciated.

comment:10 by Mariusz Felisiak, 8 months ago

I am vey surprised to hear that such a thing could be the intended behaviour, as I can't make sense of why it could be. If you could give more depth to your reasoning, it would be much appreciated.

I changed my mind and agreed that we should fix it, that's why I accepted this ticket.

comment:11 by Maxime Toussaint, 8 months ago

Owner: changed from nobody to Maxime Toussaint
Status: newassigned

comment:13 by Mariusz Felisiak, 8 months ago

Triage Stage: AcceptedReady for checkin

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 8 months ago

Resolution: fixed
Status: assignedclosed

In 254df3a:

Fixed #34791 -- Fixed incorrect Prefetch()'s cache for singly related objects.

Changed the cache name used for singly related objects to be the
to_attr parameter passed to a Prefetch object. This fixes issues with
checking if values have already been fetched in cases where the Field
already has some prefetched value, but not for the same model attr.

Note: See TracTickets for help on using tickets.
Back to Top