Opened 3 days ago

Last modified 16 hours ago

#36282 assigned Bug

Prefetched relations are not used from parent classes in all cases

Reported by: Take Weiland Owned by: Take Weiland
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords: mti prefetch_related
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no
Pull Requests:How to create a pull request

Description

When using prefetch_related to prefetch relations, Django makes some attempt to use prefetched data present in parent models, but it is very limited. It is only implemented for ForeignKey and only works for the immediate parent.

Assuming this model hierarchy:

class Related(models.Model):
    pass


class GrandParent(models.Model):
    name = models.CharField(max_length=50)
    gp_fk = models.ForeignKey(Related, null=True, on_delete=models.CASCADE, related_name='gp_fk_rel')
    gp_m2m = models.ManyToManyField(Related, related_name='gp_m2m_rel')


class Parent(GrandParent):
    pass


class Child(Parent):
    pass

I have written these test cases:

class PrefetchRelatedWorksWithInheritance(TestCase):

    @classmethod
    def setUpTestData(cls):
        cls.related1 = Related.objects.create()
        cls.related2 = Related.objects.create()
        cls.related3 = Related.objects.create()

        cls.child = Child.objects.create(
            gp_fk=cls.related1,
        )
        cls.m2m_child = Child.objects.create()
        cls.m2m_child.gp_m2m.set([cls.related1, cls.related2, cls.related3])

    def test_parent_fk_available_in_child(self):
        qs = GrandParent.objects.select_related('parent').prefetch_related(
            'gp_fk'
        ).filter(pk=self.child.pk)
        with self.assertNumQueries(2):
            results = list(qs)
            self.assertEqual(len(results), 1)
            # Works, Parent can look into its GrandParent and find the prefetched data
            self.assertEqual(results[0].parent.gp_fk, self.related1)

    def test_grandparent_fk_available_in_child(self):
        qs = GrandParent.objects.select_related('parent', 'parent__child').prefetch_related(
            'gp_fk'
        ).filter(pk=self.child.pk)
        with self.assertNumQueries(2):
            results = list(qs)
            self.assertEqual(len(results), 1)
            # Causes extra query, Child only looks in Parent, not in GrandParent
            self.assertEqual(results[0].parent.child.gp_fk, self.related1)

    def test_parent_m2m_available_in_child(self):
        qs = GrandParent.objects.select_related('parent').prefetch_related(
            'gp_m2m'
        ).filter(pk=self.m2m_child.pk)
        with self.assertNumQueries(2):
            results = list(qs)
            self.assertEqual(len(results), 1)
            # Causes extra query, M2M never looks in its parents
            self.assertEqual(set(results[0].parent.gp_m2m.all()), {self.related1, self.related2, self.related3})

    def test_grandparent_m2m_available_in_child(self):
        qs = GrandParent.objects.select_related('parent', 'parent__child').prefetch_related(
            'gp_m2m'
        ).filter(pk=self.m2m_child.pk)
        with self.assertNumQueries(2):
            results = list(qs)
            self.assertEqual(len(results), 1)
            # Causes extra query, M2M never looks in its parents
            self.assertEqual(set(results[0].parent.child.gp_m2m.all()), {self.related1, self.related2, self.related3})

Only the first of the tests passes.
For ForeignKeys the reason is here: https://github.com/django/django/blob/c3a23aa02faa1cf1d32e43d66858e793cd9ecac4/django/db/models/fields/related_descriptors.py#L229-L240
This only looks into the immediate parent, not any grandparents.

For ManyToManyField the culprit is here: https://github.com/django/django/blob/c3a23aa02faa1cf1d32e43d66858e793cd9ecac4/django/db/models/fields/related_descriptors.py#L1094-L1098
It only looks in the instance's _prefetched_objects_cache, when it could walk up to its parents, just like ForwardManyToOneDescriptor does.

I stumbled upon this when implementing support for prefetch_related in Django-Model-Utils' InheritanceManager. It automatically returns subclasses form a custom QuerySet iterable. But those subclass-instances then do not see the prefetched data of their parent, so in my fix I had to manually copy the caches, so that Django would find them.

I would be willing to work on a patch to make this work out of the box in Django.

According to the ticket's flags, the next step(s) to move this issue forward are:

  • To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is: [https://github.com/django/django/pull/#### PR].

Change History (7)

comment:1 by Take Weiland, 3 days ago

Summary: Prefetched relations are not available used from parent classes in all casesPrefetched relations are not used from parent classes in all cases

comment:2 by Simon Charette, 2 days ago

Component: UncategorizedDatabase layer (models, ORM)
Keywords: mti prefetch_related added
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Interestingly I ran into the same issue implementing similar logic 5 years ago.

If you're interested in solving this issue I'd have a few pointers

  1. Avoid copy'ing _prefetched_objects_cache related cache between model instances as it will make cache invalidation a nightmare (the workaround I linked above is flawed in this regard). The logic should be adapted to fully walk the ancestor chain. In other words, the cached value should remain solely on the model hosting the field and subclasses accessors should retrieve them from there.
  2. When adding tests to tests/prefetch_related reuse the existing models. It seems that AuthorWithAge is a good candidate for subclassing.
  3. The logic should always ensure that ancestor links are cached before accessing their parent (it's not always the case when field deferral is used)

Here's a little patch I used to validate your report if it can be handy

  • TabularUnified django/db/models/fields/related_descriptors.py

    diff --git a/django/db/models/fields/related_descriptors.py b/django/db/models/fields/related_descriptors.py
    index 77ebe798b4..36e8798340 100644
    a b def __get__(self, instance, cls=None):  
    226226            rel_obj = self.field.get_cached_value(instance)
    227227        except KeyError:
    228228            has_value = None not in self.field.get_local_related_value(instance)
     229            field_model = self.field.model
    229230            ancestor_link = (
    230                 instance._meta.get_ancestor_link(self.field.model)
    231                 if has_value
    232                 else None
     231                instance._meta.get_ancestor_link(field_model) if has_value else None
    233232            )
    234             if ancestor_link and ancestor_link.is_cached(instance):
    235                 # An ancestor link will exist if this field is defined on a
    236                 # multi-table inheritance parent of the instance's class.
    237                 ancestor = ancestor_link.get_cached_value(instance)
    238                 # The value might be cached on an ancestor if the instance
    239                 # originated from walking down the inheritance chain.
    240                 rel_obj = self.field.get_cached_value(ancestor, default=None)
    241             else:
    242                 rel_obj = None
     233            rel_obj = None
     234            # An ancestor link will exist if this field is defined on a
     235            # multi-table inheritance parent of the instance's class.
     236            if ancestor_link:
     237                ancestor = instance
     238                while ancestor_link.is_cached(ancestor):
     239                    ancestor = ancestor_link.get_cached_value(ancestor)
     240                    if type(ancestor) is field_model:
     241                        # The value might be cached on an ancestor if the instance
     242                        # originated from walking down the inheritance chain.
     243                        rel_obj = self.field.get_cached_value(ancestor, default=None)
     244                    else:
     245                        ancestor_link = ancestor._meta.get_ancestor_link(field_model)
    243246            if rel_obj is None and has_value:
    244247                rel_obj = self.get_object(instance)
    245248                remote_field = self.field.remote_field

in reply to:  2 comment:3 by Adya, 23 hours ago

Replying to Simon Charette:
Hi, can I work on it?

Kindly assign it to me!

Last edited 23 hours ago by Adya (previous) (diff)

comment:4 by Simon Charette, 23 hours ago

The reporter explicitly stated

I would be willing to work on a patch to make this work out of the box in Django.

So I would give it a few more days before assigning the ticket to you Adya.

in reply to:  4 comment:5 by Adya, 22 hours ago

Replying to Simon Charette:

Ok, it makes sense.

Could you suggest me any other available ticket, and I can work on it? It will be my pleasure

in reply to:  2 comment:6 by Take Weiland, 18 hours ago

Replying to Simon Charette:

Thank you for your pointers. I have already started working on this here: https://github.com/django/django/compare/main...diesieben07:django:prefetch_related_inheritance

I will work on polishing it to get it ready for a contribution. One thing to think about is indeed cache invalidation. If you do parent.child.m2m_relation.set(...) should that invalidate parent.m2m_relation? Currently it doesn't, but if parent.child.m2m_relation and parent.m2m_relation now point to the same cache, because the child one just looks in its parent...?
Probably the clearing also needs to happen recursively and walk up to the parents, but that is a potentially more invasive change.

comment:7 by Simon Charette, 16 hours ago

Owner: set to Take Weiland
Status: newassigned

Good point about cache invalidation.

All layers of inherited caching should effectively be busted when parent.child.rel is altered.

Note: See TracTickets for help on using tickets.
Back to Top