Opened 2 years ago

Closed 2 years ago

Last modified 23 months ago

#33984 closed Bug (fixed)

Related managers cache gets stale after saving a fetched model with new PK

Reported by: joeli Owned by: Mariusz Felisiak
Component: Database layer (models, ORM) Version: 4.1
Severity: Release blocker Keywords:
Cc: Keryn Knight Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by joeli)

I'm upgrading our codebase from Django 3.2 to 4.1, and came upon a regression when running our test suite. I bisected it down to commit 4f8c7fd9d9 Fixed #32980 -- Made models cache related managers: https://github.com/django/django/commit/4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6

The main problem is that when you have fetched a model containing a m2m field from the database, and access its m2m field the manager gets cached. If you then set the model's .pk to None and do a .save() to save it as a copy of the old one, the related manager cache is not cleared. Here's some code with inline comments to demonstrate:

# models.py
from django.db import models

class Tag(models.Model):
    tag = models.SlugField(max_length=64, unique=True)

    def __str__(self):
        return self.tag

class Thing(models.Model):
    tags = models.ManyToManyField(Tag, blank=True)

    def clone(self) -> 'Thing':
        # To clone a thing, we first save a list of the tags it has
        tags = list(self.tags.all())

        # Then set its pk to none and save, creating the copy
        self.pk = None
        self.save()

        # In django 3.2 the following sets the original tags for the new instance.
        # In 4.x it's a no-op because self.tags returns the old instance's manager.
        self.tags.set(tags)
        return self

    @staticmethod
    def has_bug():
        one, _ = Tag.objects.get_or_create(tag='one')
        two, _ = Tag.objects.get_or_create(tag='two')
        
        obj = Thing.objects.create()
        obj.tags.set([one, two])
        new_thing = obj.clone()

        # new_thing.tags.all() returns the expected tags, but it is actually returning obj.tags.all()
        # If we fetch new_thing again it returns the actual tags for new_thing, which is empty.
        #
        # Even `new_thing.refresh_from_db()` -- which is not required with 3.x -- does not help.
        # `new_thing._state.related_managers_cache.clear()` works, but feels like something I
        # shouldn't have to do.
        return list(new_thing.tags.all()) != list(Thing.objects.get(pk=new_thing.pk).tags.all())

Change History (22)

comment:1 by joeli, 2 years ago

Description: modified (diff)

comment:2 by Mariusz Felisiak, 2 years ago

Cc: Keryn Knight added
Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

This is a documented breaking change, however it also broke a documented way for copying model instances, so we need to update docs or fix it.

comment:3 by Bhuvnesh, 2 years ago

I am a bit confuse here, Using clone method all the tags are being copied from the old instance still the actual tags for the new instance ( new_thing ) are empty after fetching the new instance again? why the tags of old instance are not cloned ?

in reply to:  3 comment:4 by joeli, 2 years ago

Replying to Bhuvnesh:

I am a bit confuse here, Using clone method all the tags are being copied from the old instance still the actual tags for the new instance ( new_thing ) are empty after fetching the new instance again? why the tags of old instance are not cloned ?

That is the bug here: in .clone() after doing self.pk = None and self.save() to create a new copy, self.tags still points to the old thing's related manager. So .clone() ends up setting the tags for the old instance (that already has said tags). Then it returns a thing that pretends to be the new_thing but actually returns the tags of the old thing unless you completely refetch it from database.

in reply to:  2 comment:5 by joeli, 2 years ago

Replying to Mariusz Felisiak:

This is a documented breaking change, however it also broke a documented way for copying model instances, so we need to update docs or fix it.

Thanks, I was looking for that exact link to include in my original report but couldn't find where it was mentioned. Maybe worth noting here that adding self._state.adding = True before the save in .clone() does not alter the result of this bug report.

FWIW I'd really like to see this fixed rather than addressed in the documentation, as we are relying on this quite heavily in production code to clone objects with deeply nested relations, so changing it to another style would be a pain. Not only that but I don't think there's any other documented way to duplicate objects in the ORM?

comment:6 by Keryn Knight, 2 years ago

In theory the "simplest" solution (to keep the caching behaviour and allow the documented duplication method) is to special-case None as a pk value and always instantiate a fresh manager, circumventing the cache entirely.
I think though that the problem technically extends further, any change to the pk -- or more specifically, the manager's core filter dependencies -- would become stale subsequently. Unsupported those other cases may be from a documentation standpoint, but I'm sure someone somewhere is doing it...

If there's a path where we (I) can invalidate correctly, and it keeps the overall improvement of #32980, and it's not too invasive or complex, we can try and proceed on that basis. If the change restores the old performance behaviour in the process, there's not much argument (except maybe memory pressure/garbage collection) to keeping the cache functionality, off the top of my head.

Basically there's two things to check there above, and if either of them is a miss, I guess we'd revert #32980 because I can envisage plenty of cases where people use the pk=None trick to clone an instance, documented as it is.

I may assign this out to myself in the near future if it's otherwise not picked up, assuming I can find the availability.

comment:7 by Mariusz Felisiak, 2 years ago

In theory the "simplest" solution (to keep the caching behaviour and allow the documented duplication method) is to special-case None as a pk value and always instantiate a fresh manager, circumventing the cache entirely.

None is already handled separately for related objects, so maybe it's enough to refresh related_managers_cache when self.pk is None and they're cached 🤔 I'm also happy to revert 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6 and re-open #32980.

comment:8 by Bhuvnesh, 2 years ago

Can anyone write a regression test for this one?

in reply to:  8 comment:9 by Mariusz Felisiak, 2 years ago

Replying to Bhuvnesh:

Can anyone write a regression test for this one?

A regression test is already in docs:

entry = Entry.objects.all()[0] # some previous entry
old_authors = entry.authors.all()
entry.pk = None
entry._state.adding = True
entry.save()
entry.authors.set(old_authors)
self.assertCountEqual(entry.authors.all(), [...])

Have you check previous comments?

comment:10 by Bhuvnesh, 2 years ago

Yes , i read the previous comments! Sorry, i didn't realize that i could make out regression tests from documentation too :)

comment:11 by Bhuvnesh, 2 years ago

Has patch: set

PR
Added a condition to refresh related_managers_cache when self.pk is None and self._state.adding is True. If self.pk = None will be missing then the instance would not be cloned and if self._state.adding = True is missing then related_manangers_cache would not be refreshed generating inconsistent results, hence both the conditions are necessary. It is also mentioned in docs.

comment:12 by Bhuvnesh, 2 years ago

Owner: changed from nobody to Bhuvnesh
Status: newassigned

comment:13 by Mariusz Felisiak, 2 years ago

Patch needs improvement: set

comment:14 by Mariusz Felisiak, 2 years ago

I found another regression in caching related manager. The following test works on Django 4.0 and crashes with 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6:

  • tests/many_to_many/tests.py

    diff --git a/tests/many_to_many/tests.py b/tests/many_to_many/tests.py
    index 53e870ddad..770808a92c 100644
    a b class ManyToManyTests(TestCase):  
    9292        a5.authors.remove(user_2.username)
    9393        self.assertSequenceEqual(a5.authors.all(), [])
    9494
     95    def test_related_manager_refresh(self):
     96        user_1 = User.objects.create(username="Jean")
     97        user_2 = User.objects.create(username="Joe")
     98        a5 = Article.objects.create(headline="Django lets you create web apps easily")
     99        a5.authors.add(user_1.username)
     100        self.assertSequenceEqual(user_1.article_set.all(), [a5])
     101        # Change the username on a different instance of the same user.
     102        user_1_from_db = User.objects.get(pk=user_1.pk)
     103        self.assertSequenceEqual(user_1_from_db.article_set.all(), [a5])
     104        user_1_from_db.username = "Paul"
     105        user_1_from_db.save()
     106        user_2.username = "Jean"
     107        user_2.save()
     108        # Add a different article.
     109        self.a4.authors.add(user_1_from_db.username)
     110        self.assertSequenceEqual(user_1_from_db.article_set.all(), [self.a4])
     111        # Refresh the first instance from DB.
     112        user_1.refresh_from_db()
     113        self.assertEqual(user_1.username, "Paul")
     114        self.assertSequenceEqual(user_1.article_set.all(), [self.a4])
     115
    95116    def test_add_remove_invalid_type(self):
    96117        msg = "Field 'id' expected a number but got 'invalid'."
    97118        for method in ["add", "remove"]:

It shows two issues:

  • the second call of user_1_from_db.article_set.all() should use a new username (Paul) but it's still using the old one (Jean),
  • user_1.refresh_from_db() doesn't clear cached managers.

I'm going to prepare revert of 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6 and reopen #32980.

comment:15 by Mariusz Felisiak, 2 years ago

Owner: changed from Bhuvnesh to Mariusz Felisiak
Patch needs improvement: unset

comment:16 by GitHub <noreply@…>, 2 years ago

Resolution: fixed
Status: assignedclosed

In 5e0aa362:

Fixed #33984 -- Reverted "Fixed #32980 -- Made models cache related managers."

This reverts 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6 and adds
two regression tests:

  • test_related_manager_refresh(), and
  • test_create_copy_with_m2m().

Thanks joeli for the report.

comment:17 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 7a167580:

[4.1.x] Fixed #33984 -- Reverted "Fixed #32980 -- Made models cache related managers."

This reverts 4f8c7fd9d91b35e2c2922de4bb50c8c8066cbbc6 and adds
two regression tests:

  • test_related_manager_refresh(), and
  • test_create_copy_with_m2m().

Thanks joeli for the report.
Backport of 5e0aa362d91d000984995ce374c2d7547d8d107f from main

comment:18 by Steven Mapes, 2 years ago

Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the init even if you check for the the instance being set along with the relationship?

I.E If you have models and a Model form like this

# models.py
from django.db import models


class Tag(models.Model):
    tag = models.SlugField(max_length=64, unique=True)


class Thing(models.Model):
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE, related_name="things")
    active = models.BooleanField(default=True)

# forms.py
from django import forms
from example.models import Tag


class TagForm(forms.ModelForm):
    class Meta:
        model = Tag
        fields = ["tag"]

    def __init__(self, *args, **kwargs):
        super(TagForm, self).__init__(*args, **kwargs)

        if self.instance and self.instance.things:
            inactive_things = self.instance.things.filter(active=False)
            # Do something with it

Then have the following test

from unittest.case import TestCase
from example.forms import TagForm


class TagFormCase(TestCase):
    def test_required(self):
        """Test required fields"""
        form = TagForm({})
        self.assertFalse(form.is_valid())
        self.assertEqual(
            {
                "tag": ["This field is required."],
            },
            form.errors,
        )


The test would pass in all Django versions up to 4.1. From 4.1.0 onwards it raises a ValueError of ValueError: 'Tag' instance needs to have a primary key value before this relationship can be used.

In order for the test to pass you need to change the flow control to if self.instance.pk and self.instance.things. You can't use self.instance.things.count() as it raises the exception.

However if you overload the primary key on the model to use a UUID such as:

# models.py
import uuid
from django.db import models


class Tag(models.Model):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    tag = models.SlugField(max_length=64, unique=True)


class Thing(models.Model):
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE, related_name="things")
    active = models.BooleanField(default=True)

then the original test will pass as the uuid will be set prior to saving

in reply to:  18 ; comment:19 by Keryn Knight, 2 years ago

Replying to Steven Mapes:

Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the __init__ even if you check for the the instance being set along with the relationship?

I believe that's #33952 -- that was fixed in 4.1.1 and this ticket in 4.1.2.

in reply to:  19 comment:20 by Steven Mapes, 2 years ago

Replying to Keryn Knight:

Replying to Steven Mapes:

Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the __init__ even if you check for the the instance being set along with the relationship?

I believe that's #33952 -- that was fixed in 4.1.1 and this ticket in 4.1.2.

Ah okay, I just tested it with 3.2.16, 4.0.8, 4.1.0, 4.1.1, 4.1.2 and the tests pass in 3.2.16 and 4.0.8 but still fail in the others

in reply to:  19 comment:21 by Alexandr Bezenkov, 2 years ago

Replying to Keryn Knight:

Replying to Steven Mapes:

Is this why model forms now, since Django 4.1.0 raise ValueErrors if you try access Related Managers within the __init__ even if you check for the the instance being set along with the relationship?

I believe that's #33952 -- that was fixed in 4.1.1 and this ticket in 4.1.2.

Confirming, exists in 4.1.2.

comment:22 by GitHub <noreply@…>, 23 months ago

In 57c2e5d:

Refs #33984 -- Added test for creating copies of model instances with inherited m2m fields.

Note: See TracTickets for help on using tickets.
Back to Top