Opened 5 weeks ago

Closed 5 weeks ago

Last modified 5 weeks ago

#36116 closed Cleanup/optimization (fixed)

prefetch_related() selects too many objects from a related table with a composite primary key and hidden related name

Reported by: Jacob Walls Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 5.2
Severity: Release blocker Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Noticed a "fixme" comment in the code that pointed me here. With a hidden related name, e.g. comments+, we hit a branch that expects primary keys to consist of one column only. When a composite primary key violates that assumption, more objects than necessary will be fetched.

With this test:

  • tests/composite_pk/models/__init__.py

    diff --git a/tests/composite_pk/models/__init__.py b/tests/composite_pk/models/__init__.py
    index 5996ae33b0..40750d91ae 100644
    a b  
    1 from .tenant import Comment, Post, Tenant, TimeStamped, Token, User
     1from .tenant import (
     2    Comment,
     3    CommentWithHiddenUserField,
     4    Post,
     5    Tenant,
     6    TimeStamped,
     7    Token,
     8    User,
     9)
    210
    311__all__ = [
    412    "Comment",
     13    "CommentWithHiddenUserField",
    514    "Post",
    615    "Tenant",
    716    "TimeStamped",
  • tests/composite_pk/models/tenant.py

    diff --git a/tests/composite_pk/models/tenant.py b/tests/composite_pk/models/tenant.py
    index 6286ed2354..66fb371d3a 100644
    a b class Comment(models.Model):  
    4646    text = models.TextField(default="", blank=True)
    4747
    4848
     49class CommentWithHiddenUserField(models.Model):
     50    pk = models.CompositePrimaryKey("tenant", "id")
     51    tenant = models.ForeignKey(
     52        Tenant,
     53        on_delete=models.CASCADE,
     54        related_name="comments+",
     55    )
     56    id = models.SmallIntegerField(unique=True, db_column="comment_id")
     57    user_id = models.SmallIntegerField()
     58    user = models.ForeignObject(
     59        User,
     60        on_delete=models.CASCADE,
     61        from_fields=("tenant_id", "user_id"),
     62        to_fields=("tenant_id", "id"),
     63        related_name="comments+",
     64    )
     65    text = models.TextField(default="", blank=True)
     66
     67
    4968class Post(models.Model):
    5069    pk = models.CompositePrimaryKey("tenant_id", "id")
    5170    tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE, default=1)
  • tests/composite_pk/tests.py

    diff --git a/tests/composite_pk/tests.py b/tests/composite_pk/tests.py
    index 6b09480fb0..29da1be47b 100644
    a b from django.db import IntegrityError, connection  
    1616from django.db.models import CompositePrimaryKey
    1717from django.forms import modelform_factory
    1818from django.test import TestCase
     19from django.test.utils import CaptureQueriesContext
    1920
    20 from .models import Comment, Post, Tenant, TimeStamped, User
     21from .models import Comment, CommentWithHiddenUserField, Post, Tenant, TimeStamped, User
    2122
    2223
    2324class CommentForm(forms.ModelForm):
    class CompositePKTests(TestCase):  
    3839            email="user0001@example.com",
    3940        )
    4041        cls.comment = Comment.objects.create(tenant=cls.tenant, id=1, user=cls.user)
     42        cls.comment_by_hidden_user = CommentWithHiddenUserField.objects.create(
     43            tenant=cls.tenant, id=1, user=cls.user
     44        )
    4145
    4246    @staticmethod
    4347    def get_primary_key_columns(table):
    class CompositePKTests(TestCase):  
    157161        result = list(Comment.objects.iterator())
    158162        self.assertEqual(result, [self.comment])
    159163
     164    def test_prefetch_related_hidden_related_name(self):
     165        with CaptureQueriesContext(connection) as queries:
     166            CommentWithHiddenUserField.objects.prefetch_related("user").get(
     167                pk=self.comment_by_hidden_user.pk
     168            )
     169        self.assertIn('"comment_id" = 1', queries[1]["sql"])
     170
    160171    def test_query(self):
    161172        users = User.objects.values_list("pk").order_by("pk")
    162173        self.assertNotIn('AS "pk"', str(users.query))

The WHERE is too permissive -- would expect both primary key columns to be checked, i.e. also "comment_id".

======================================================================
FAIL: test_prefetch_related_hidden_related_name (composite_pk.tests.CompositePKTests.test_prefetch_related_hidden_related_name)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/source/django/tests/composite_pk/tests.py", line 169, in test_prefetch_related_hidden_related_name
    self.assertIn('"comment_id" = 1', queries[1]["sql"])
AssertionError: '"comment_id" = 1' not found in 'SELECT "composite_pk_user"."tenant_id", "composite_pk_user"."id", "composite_pk_user"."email" FROM "composite_pk_user" WHERE "composite_pk_user"."tenant_id" IN (1)'

----------------------------------------------------------------------
Ran 1 test in 0.006s

FAILED (failures=1)

Change History (9)

comment:1 by Simon Charette, 5 weeks ago

Cc: Simon Charette added
Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Thanks for surfacing Jacob, glad we foresaw this one in cb83448891f2920c926f61826d8eae4051a3d8f2.

Tentatively elevating to release blocker as we've opted to start documenting ForeignObject's usage with CompositePrimaryKey and I would expect users to run into this problem as well.

comment:2 by Simon Charette, 5 weeks ago

Has patch: set
Owner: set to Simon Charette
Status: newassigned

Going to assign since I got a patch ready that doesn't require introducing new models and we'll surely see many of these required adjustments until 5.0 gets fully released. Let me know if you want me to hold off for longer when you report these.

comment:3 by Sarah Boyce, 5 weeks ago

The WHERE is too permissive -- would expect both primary key columns to be checked, i.e. also "comment_id".

Just to clarify, agreed the value of comment_id should be in the SQL but not a reference to the column comment_id (e.g. WHERE ("composite_pk_user"."tenant_id" = 1 AND "composite_pk_user"."id" = 1))

comment:4 by Sarah Boyce, 5 weeks ago

Patch needs improvement: set

comment:5 by Simon Charette, 5 weeks ago

Clearing patch needs improvement flag as the prefectch_related + SQLite issue for large number of items is already tracked in #27833.

It just happens that number of parents records is a function of the number of fields 1000 / len(fields).

comment:6 by Simon Charette, 5 weeks ago

Patch needs improvement: unset

comment:7 by Jacob Walls, 5 weeks ago

Triage Stage: AcceptedReady for checkin

comment:8 by Sarah Boyce <42296566+sarahboyce@…>, 5 weeks ago

Resolution: fixed
Status: assignedclosed

In 626d77e5:

Fixed #36116 -- Optimized multi-column ForwardManyToOne prefetching.

Rely on ColPairs and TupleIn which support a single column to be specified
to avoid special casing ForwardManyToOne.get_prefetch_querysets().

Thanks Jacob Walls for the report.

comment:9 by Sarah Boyce <42296566+sarahboyce@…>, 5 weeks ago

In 9861e86:

[5.2.x] Fixed #36116 -- Optimized multi-column ForwardManyToOne prefetching.

Rely on ColPairs and TupleIn which support a single column to be specified
to avoid special casing ForwardManyToOne.get_prefetch_querysets().

Thanks Jacob Walls for the report.

Backport of 626d77e52a3f247358514bcf51c761283968099c from main.

Note: See TracTickets for help on using tickets.
Back to Top