Opened 3 weeks ago

Closed 3 weeks ago

Last modified 3 weeks ago

#36764 closed Bug (duplicate)

QuerySet.only() causes n+1 queries with reverse foreign key relation

Reported by: Bernhard Vallant Owned by:
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: queryset, only, defered
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Bernhard Vallant)

To reproduce I tried adding the following test to tests.defer.tests.DeferTests:

def test_only_reverse_fk(self):
        qs = self.s1.primary_set.only("name")
        with self.assertNumQueries(1):
            for primary in qs:
                primary.name

This fails because it produces an additional query for every Primary instance to just fetch related_id:

test_only_reverse_fk (defer.tests.DeferTests.test_only_reverse_fk) failed:

    AssertionError('3 != 1 : 3 queries executed, 1 expected
    Captured  queries were:
        1. SELECT "defer_primary"."id", "defer_primary"."name" FROM "defer_primary" WHERE "defer_primary"."related_id" = 1
        2. SELECT "defer_primary"."id", "defer_primary"."related_id" FROM "defer_primary"  WHERE "defer_primary"."id" = 1 LIMIT 21
        3. SELECT "defer_primary"."id", "defer_primary"."related_id" FROM "defer_primary"  WHERE "defer_primary"."id" = 2 LIMIT 21
    ')

When replacing the queryset with qs = Primary.objects.filter(related_id=self.s1.pk).only("name") basically the same SQL is produced, but no additional queries.
I think the additional queries are quite an unexpected behaviour and it is unnecessary to retrieve related_id for every row. Somehow this is also a dangerous behaviour as it can cause n + 1 queries when actually trying to optimize your query.

If this behaviour should be the expected one it should at least be documented and tested somewhere (I don't think there are tests for only() when using relations like this somewhere).

Change History (7)

comment:1 by Clifford Gama, 3 weeks ago

Thanks for the report! FWIW, this works:

    def test_only_reverse_fk(self):
        qs = self.s1.primary_set.only("name", "related_id")
        with self.assertNumQueries(1):
            for primary in qs:
                primary.name

and so does .values("name").

Last edited 3 weeks ago by Clifford Gama (previous) (diff)

comment:2 by Bernhard Vallant, 3 weeks ago

Description: modified (diff)

comment:3 by Youngkwang Yang, 3 weeks ago


  Product.objects.all().delete()
  Category.objects.all().delete()
  category = Category.objects.create(name="Test")
  Product.objects.create(name="P1", price=10, category=category)
  Product.objects.create(name="P2", price=20, category=category)

  # Test
  qs = category.product_set.only("name")
  print(f"_known_related_objects: {qs._known_related_objects}")

  reset_queries()
  for p in qs:
      p.name  # only accessing 'name', but triggers extra queries

  print(f"\nQueries executed: {len(connection.queries)}")
  for q in connection.queries:
      print(f"  {q['sql']}")

output:

  _known_related_objects: {<ForeignKey: category>: {10011: <Category: Test>}}

  Queries executed: 3
    SELECT "product"."id", "product"."name" FROM "product" WHERE "product"."related_id" = 10011
    SELECT "product"."id", "product"."related_id" FROM "product" WHERE "product"."id" = 100020 LIMIT 21
    SELECT "product"."id", "product"."related_id" FROM "product" WHERE "product"."id" = 100021 LIMIT 21

this problem seems related to _known_related_objects in RelatedManager._apply_rel_filters(), which causes Django to access the deferred related_id on each instance.

comment:4 by Bernhard Vallant, 3 weeks ago

I could also tracked it down to the id field being in _known_related_objects. For me the question is if the proper solution in this case would be to automatically append the field to the "only" fields (if it is necessary for the relations to properly work) or if it can be omitted in a way from _known_related_objects.

comment:5 by Simon Charette, 3 weeks ago

For me the question is if the proper solution in this case would be to automatically append the field to the "only" fields (if it is necessary for the relations to properly work) or if it can be omitted in a way from _known_related_objects.

I can't find it right now but there's another ticket that disuss doing that and why that might be problematic. I'll keep searching.

Last edited 3 weeks ago by Simon Charette (previous) (diff)

comment:6 by Simon Charette, 3 weeks ago

I think we can close this one as duplicate of #35442 and #30124 which I'll rename to specify it applies to all cases where the related id is not included.

Last edited 3 weeks ago by Simon Charette (previous) (diff)

comment:7 by Simon Charette, 3 weeks ago

Resolution: duplicate
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top