Opened 18 months ago

Last modified 3 months ago

#35442 assigned Bug

N+1 queries from RelatedManager + only("pk")

Reported by: REGNIER Guillaume Owned by: Rish
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When iterating over a queryset constructed from a RelatedManager and a .only(...) call that does not include the related field, a query occurs when instances are produced from the queryset.

Steps to Reproduce:

class Company(models.Model):
    pass

class Employee(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE, related_name="employees")
company = Company.objects.create()
Employee.objects.bulk_create(Employee(company=company) for _ in range(10))

for employee in company.employees.only("pk"):
    # Some code that only access pk
    _ = employee.pk

Expected Behavior:

One query like

SELECT "employee"."id" FROM "employee" WHERE "employee"."company_id" = {COMPANY_ID}

Actual Behavior:

10 additional queries like:

SELECT "employee"."id", "employee"."company_id" FROM "employee" WHERE "employee"."id" = {EMPLOYEE_ID}

Analysis:

My understanding is that there is an optimization that fills the parent model on related instances without needing additional SQL join/query. However, when only a subset of fields is selected (in this case, only the primary key), the parent ID might not be loaded from the database, resulting in additional queries to perform said optimization.

Workaround:

company = Company.objects.create()
Employee.objects.bulk_create(Employee(company=company) for _ in range(10))

for employee in Employee.objects.filter(company=company).only("pk"):
    # Some code that only access pk
    _ = employee.pk

Change History (14)

comment:1 by Simon Charette, 18 months ago

This relates to #20927, #18177, and at least another ticket that asked for "company_id" to be implicitly included in the only call that I just can't find anymore.

I think we should try to resolve that in a different way and a potential solution could be something like

  • django/db/models/query.py

    diff --git a/django/db/models/query.py b/django/db/models/query.py
    index cb5c63c0d1..9390d2242d 100644
    a b def __iter__(self):  
    119119                ),
    120120            )
    121121            for field, related_objs in queryset._known_related_objects.items()
     122            # if not is_deferred(field)
    122123        ]
    123124        for row in compiler.results_iter(results):
    124125            obj = model_cls.from_db(

But in order to work properly it would require a refactor of known_related_objects as the current approach is naive wrt/ to how it namespace related objects by field as the same model field can be included multiple times in the the same queryset (e.g. see #35356)

Last edited 18 months ago by Natalia Bidart (previous) (diff)

comment:2 by Natalia Bidart, 18 months ago

Hello Guillaume!

I'm having a hard time understanding the goal of your code. I assume you have provided a simplified version to be used as a small example, which I appreciate, but with the information given, the proper way to fetch records from the database without generating N+1 queries would be:

>>> from django.db import connection, reset_queries
>>> reset_queries(); connection.queries
[]
>>> company = Company.objects.prefetch_related("employees").last()
>>> for e in company.employees.all():
>>>     print(e.pk)
7
8
9
10
>>> connection.queries
[{'sql': 'SELECT "ticket_35442_company"."id", "ticket_35442_company"."name" FROM "ticket_35442_company" ORDER BY "ticket_35442_company"."id" DESC LIMIT 1',
  'time': '0.002'},
 {'sql': 'SELECT "ticket_35442_employee"."id", "ticket_35442_employee"."company_id" FROM "ticket_35442_employee" WHERE "ticket_35442_employee"."company_id" IN (5)',
  'time': '0.001'}]

Or, my preferred (a single query and the same workaround as was posted originally):

>>> for e in Employee.objects.filter(company=company).only("pk"): print(e.pk)
7
8
9
10
>>> connection.queries
[{'sql': 'SELECT "ticket_35442_employee"."id" FROM "ticket_35442_employee" WHERE "ticket_35442_employee"."company_id" = 5',
  'time': '0.000'}]
Last edited 18 months ago by Natalia Bidart (previous) (diff)

comment:3 by Natalia Bidart, 18 months ago

Hey Simon, thank you for your message. I fail to see how Django is at fault here, considering that there are alternative usages of the ORM and its queries to accomplish the same result (as far as we understand the use case) that would not execute N+1 queries. This seems like a (likely?) dupe of #20923 which was closed as wontfix.

comment:4 by Natalia Bidart, 18 months ago

Resolution: duplicate
Status: newclosed

I'll close as duplicate of #20923 for now following the ticket triaging process, but any further commenting is welcomed and I'm happy to reconsider the ticket status with more information.

comment:5 by REGNIER Guillaume, 18 months ago

Thank you both for answering!

Yes, i did provide a minimal exemple. Here i just wanted to use company.employees.all() instead of Employee.objects.filter(company=company).
I often use RelatedManagers this way to avoid importing the related model and i do find the final code easier to read.
I thought the two pieces of code above were meant to be equivalent.

I might have understood the RelatedManager wrong but when i'm passing a queryset along, i have no way to know if it originated from a RelatedManager or a regular Manager, meaning i have no way to know if .only() is safe to use or if it'll cause N+1 queries.

If this behavior is intended/not an issue, it means that i either need to ban .only() or RelatedManagers from my coding habbits since i don't have any way to ensure this won't happen again.

comment:6 by Simon Charette, 18 months ago

Accessing objects though a related manager should still allow for only to work as expected. The fact that the ORM doesn't even warn you when it silently issues queries on field deferral leaks (#22492) makes this behavior really insidious and prevents safe usage of related managers as Guillaume brought up.

I think the the ORM should either have RelatedManager.only include the reverse field implicitly (which we've kind of rules out against in #33835) or we should find a way to more safely assign known related objects in the face of deferred fields (e.g. make sure required attrs are in obj.__dict__).

Potentially naive patch

  • django/db/models/query.py

    diff --git a/django/db/models/query.py b/django/db/models/query.py
    index cb5c63c0d1..883a7a71e4 100644
    a b def __iter__(self): (this hunk was shorter than expected)  
    107107            (
    108108                field,
    109109                related_objs,
    110                 operator.attrgetter(
    111                     *[
    112                         (
    113                             field.attname
    114                             if from_field == "self"
    115                             else queryset.model._meta.get_field(from_field).attname
    116                         )
    117                         for from_field in field.from_fields
    118                     ]
    119                 ),
     110                attnames := [
     111                    (
     112                        field.attname
     113                        if from_field == "self"
     114                        else queryset.model._meta.get_field(from_field).attname
     115                    )
     116                    for from_field in field.from_fields
     117                ],
     118                operator.attrgetter(*attnames),
    120119            )
    121120            for field, related_objs in queryset._known_related_objects.items()
    122121        ]
    123122        for row in compiler.results_iter(results):
    124123            obj = model_cls.from_db(
    def __iter__(self):  
    131131                    setattr(obj, attr_name, row[col_pos])
    132132
    133133            # Add the known related objects to the model.
    134             for field, rel_objs, rel_getter in known_related_objects:
     134            for field, rel_objs, rel_attnames, rel_getter in known_related_objects:
    135135                # Avoid overwriting objects loaded by, e.g., select_related().
    136136                if field.is_cached(obj):
    137137                    continue
     138                # Avoid fetching potentially deferred attributes that would
     139                # result in unexpected queries.
     140                if any(attname not in obj.__dict__ for attname in rel_attnames):
     141                    continue
    138142                rel_obj_id = rel_getter(obj)
    139143                try:
    140144                    rel_obj = rel_objs[rel_obj_id]
Last edited 18 months ago by Simon Charette (previous) (diff)

comment:7 by Simon Charette, 18 months ago

So I tried running the suite against the above patch and it delivered!

I was pointed at a single failure being defer.tests.DeferTests.test_only

======================================================================
FAIL: test_only (defer.tests.DeferTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/charettes/Workspace/django/tests/defer/tests.py", line 52, in test_only
    self.assert_delayed(self.s1.primary_set.only("pk")[0], 2)
  File "/Users/charettes/Workspace/django/tests/defer/tests.py", line 24, in assert_delayed
    self.assertEqual(count, num)
AssertionError: 3 != 2

What's interesting though is that it might have a been a sneaky regression caused by the refactor to how field deferral is implemented that flew under the radar.

It was not discussed at all in the two PRs where #26207 was resolved

Version 0, edited 18 months ago by Simon Charette (next)

comment:8 by Simon Charette, 18 months ago

Cc: Simon Charette added

comment:9 by Simon Charette, 18 months ago

For the record, I checked out I checked out Django 1.9 and it seems the issue is present there so while the commit mentioned above that landed in 1.10 did alter a related test it didn't actually break anything in this regard.

comment:10 by Sarah Boyce, 18 months ago

Resolution: duplicate
Status: closednew
Triage Stage: UnreviewedAccepted

Thank you for the investigation Simon, reopening 👍

comment:11 by Sarah Boyce, 18 months ago

Type: UncategorizedBug

comment:12 by Rish, 16 months ago

Owner: changed from nobody to Rish
Status: newassigned

in reply to:  1 ; comment:13 by Jacob Walls, 3 months ago

Replying to Simon Charette:

another ticket that asked for "company_id" to be implicitly included in the only call that I just can't find anymore.

Could it be #30124?

in reply to:  13 comment:14 by Simon Charette, 3 months ago

Replying to Jacob Walls:

Replying to Simon Charette:

another ticket that asked for "company_id" to be implicitly included in the only call that I just can't find anymore.

Could it be #30124?

Yep it was, thanks for the sleuthing Jacob!

Note: See TracTickets for help on using tickets.
Back to Top