Context Navigation

← Previous Ticket
Next Ticket →

#36489 assigned Bug

OneToOneField + concurrent get_or_create results in wrong object in field cache

Reported by:	Brian Atkinson	Owned by:	Sina Chaichi
Component:	Database layer (models, ORM)	Version:	5.2
Severity:	Normal	Keywords:
Cc:	Simon Charette, Mariusz Felisiak	Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Brian Atkinson)

When using get_or_create and update_or_create in a context where concurrent creations are happening, the caller that loses the creation race will end up with an incomplete object in the field cache.

Example models:

from django.db import models

class Parent(models.Model):
    name = models.TextField()

class Child(models.Model):
    parent = models.OneToOneField(Parent, on_delete=models.CASCADE)

In effect, we have two separate processes both expecting to be able to call Child.objects.get_or_create(parent=common_parent, defaults={...}) and have the resulting common_parent.child be the correct result. We are seeing both processes fail the initial get call (https://github.com/django/django/blob/5.2.3/django/db/models/query.py#L946) and proceeding to attempt the create call (https://github.com/django/django/blob/5.2.3/django/db/models/query.py#L953). One of the processes succeeds on the insert, and the other will fail the insert. The process that fails the insert correctly issues the second get call (https://github.com/django/django/blob/5.2.3/django/db/models/query.py#L956) and returns the result.

The bug is that for the process that fails the insert, the object at common_parent.child is not the value returned from the get_or_create. It seems to be the object that was created for insert, but wasn't cleared when the insert failed. This can be observed through the common_parent.child.id field being None. This has caused data corruption and other downstream bugs due to the wrong data being cached.

Here is a test case that is a linear representation of the concurrent bug described above:

from unittest import mock

from django.db.models import QuerySet
from django.test import TestCase

from sample.models import Child, Parent

class TestBug(TestCase):
    def test_bug(self):
        parent = Parent.objects.create(name="Test name")

        triggered = False

        def effect(*args, **kwargs):
            nonlocal triggered
            if not triggered and kwargs.get("parent") == parent:
                triggered = True
                Child.objects.update_or_create(
                    # Force a brand new instance (emulates racing thread).
                    parent=Parent.objects.get(pk=parent.pk),
                    defaults={},
                )
            return mock.DEFAULT

        with mock.patch.object(
            QuerySet,
            "_extract_model_params",
            autospec=True,
            wraps=QuerySet._extract_model_params,
            side_effect=effect,
        ):
            child, created = Child.objects.get_or_create(parent=parent, defaults={})
            assert not created

        assert child.parent_id is not None
        assert parent.child.id is not None

At first glance this test case will seem quite janky. The patching is being done to emulate the effect of two racing threads by injecting a concurrent call to get_or_create in the middle of the outer get_or_create.

Change History (22)

comment:1 by Brian Atkinson, 7 months ago

Description:	modified (diff)

comment:2 by Natalia Bidart, 6 months ago

Cc:	Simon Charette Mariusz Felisiak added
Component:	Uncategorized → Database layer (models, ORM)
Triage Stage:	Unreviewed → Accepted
Type:	Uncategorized → Bug

Thank you Brian Atkinson for taking the time to create this ticket. I have reviewed the test case that you provided, which is quite helpful to understand your report.

I was able to reproduce the issue as described. When a concurrent get_or_create() call results in an IntegrityError, the model instance created for the first attempt is incorrectly cached on the reverse side of the one-to-one relation. This leads to inconsistent behavior, such as accessing parent.child and getting an unsaved Child instance with pk=None.

A few notes: I believe the issue is somewhere between the related descriptors and how the reverse 1-1 is set but I haven't been able to exactly pin point the precise place that needs fixing. Also I don't think the update_or_create call is necessary, we can stick to two get_or_create I think? or is there any value/difference to use one of each?

comment:3 by Brian Atkinson, 6 months ago

Also I don't think the update_or_create call is necessary, we can stick to two get_or_create I think?

You are correct, I missed cleaning that up from an earlier iteration of building the test case. We'd initially seen it with two racing update_or_create, but upon investigation it is possible with just the underlying get_or_create. I'm happy to update the example in the initial report with that cleanup if that would be helpful.

Thank you for taking the time to look into this.

comment:4 by Simon Charette, 6 months ago

Thank you for your very detailed report. I understand this is a tricky issue to diagnose, particularly because it should be rare under normal circumstances, but it's hard to determine what should be the right thing to do in this case as the ORM doesn't maintain an identity map (e.g. like SQLAlchemy session for example) so it has to balance a delicate sequence of cache clearing.

A naive approach could be the following but likely breaks usage of select_related("parent").get_or_create(parent=parent) by forcing an extra query as it systematically clears the cache

django/db/models/query.py

diff --git a/django/db/models/query.py b/django/db/models/query.py
index 63ab4a873a..b03f318076 100644

-              a
     IntegrityError,
     NotSupportedError,
     connections,
+    models,
     router,
     transaction,
+)
-…
+               def get_or_create(self, defaults=None, **kwargs):
                     return self.create(**params), True
             except IntegrityError:
                 try:
                     return self.get(**kwargs), False
+                    obj = self.get(**kwargs)
                 except self.model.DoesNotExist:
                     pass
+                else:
+                    get_field = self.model._meta.get_field
+                    for field_name, value in kwargs.items():
+                        if not isinstance(value, models.Model):
+                            continue
+                        try:
+                            field = get_field(field_name)
+                        except exceptions.FieldDoesNotExist:
+                            continue
+                        if getattr(field, "one_to_one", False):
+                            field.remote_field.delete_cached_value(value)
+                    return obj, False
                 raise
     get_or_create.alters_data = True

while the ORM doesn't have a session wide concept of an identity map though it has one per queryset stored in the QuerySet.known_related_objects: dict[Field, dict] attribute so it might be usable for this purpose? It seems like it would be coherent with how many related managers work by maintaining affinity (e.g. author.books.get_or_create)

django/db/models/query.py

diff --git a/django/db/models/query.py b/django/db/models/query.py
index 63ab4a873a..b03f318076 100644

-              a
     IntegrityError,
     NotSupportedError,
     connections,
+    models,
     router,
     transaction,
+)
-…
+               def get_or_create(self, defaults=None, **kwargs):
         # The get() needs to be targeted at the write database in order
         # to avoid potential transaction consistency problems.
         self._for_write = True
+        get_field = self.model._meta.get_field
+        known_related_objects = {}
+        for field_name, value in kwargs.items():
+            if not isinstance(value, models.Model):
+                continue
+            try:
+                field = get_field(field_name)
+            except exceptions.FieldDoesNotExist:
+                continue
+            if field.remote_field:
+                known_related_objects.setdefault(field, {})[value.pk] = value
+        qs = self
+        if known_related_objects:
+            qs = qs._clone()
+            for field, objects in known_related_objects.items():
+                qs._known_related_objects.setdefault(field, {}).update(objects)
         try:
             return self.get(**kwargs), False
+            return qs.get(**kwargs), False
         except self.model.DoesNotExist:
             params = self._extract_model_params(defaults, **kwargs)
             # Try to create an object using passed params.
             try:
                 with transaction.atomic(using=self.db):
                     params = dict(resolve_callables(params))
                     return self.create(**params), True
+                    return qs.create(**params), True
             except IntegrityError:
                 try:
                     return self.get(**kwargs), False
+                    return qs.get(**kwargs), False
                 except self.model.DoesNotExist:
                     pass
                 raise

it does bring a few questions though mainly should update_or_create be updated as well and to what extent? Should defaults and create_defaults also be considered? Only when creation actually takes place or systematically?

comment:5 by Brian Atkinson, 6 months ago

The part that bit us is that the object left behind from the failed creation could have different field values from the actual row read from the database. The example above had much of that stripped, but imagine one of the fields provided during creation is context sensitive (example: UUID of the request), then the parent.child could have data that differs from the returned row.

More concretely, consider adding the following to the Child table:

context_id = models.UUIDField(default=uuid.uuid4)

I would expect the following to hold:

assert child.context_id == parent.child.context_id

From my perspective as a user, I might prefer an additional read to populate that relation over that inequality not holding. At least I think that would be my preference. The comments about an "identity map" make me confident that there are a lot more details around how the APIs should be behaving that I'm not unaware of. Is there an assumption that the following should be true?

assert child is parent.child

Thanks for taking the time to consider this issue.

comment:6 by Simon Charette, 6 months ago

From my perspective as a user, I might prefer an additional read to populate that relation over that inequality not holding. At least I think that would be my preference.

Proper cache invalidation would effectively be the less risky option but it's far from trivial do to correctly. We can't systematically clear the cache as it might discard explicit relationship retrieval through select_related as alluded above.

Is there an assumption that the following should be true?
assert child is parent.child

No, in particular when created is False like it's the case here.

Think of it this way, get_or_create is meant to be a concurrency safe hybrid between get and create so users don't have to re-implement it in an unsafe way every time the upsert pattern is needed.

In the case of get the child is parent.child assumption will never hold true, that is

>>> child = Child.objects.get(parent=parent)
>>> child is parent.child
False

as there is no session identity mapper. On the hand, because of implementation details, the assumption will hold true for create

>>> child = Child.objects.create(parent=parent)
>>> child is parent.child
True

which brings the question of what exactly should be done in cases where get_or_create falls back to get (created is False) given it's meant to be an hybrid. Is it better to keep things as they are today and avoid introducing an exception when a to-one relationship is provided and the object already exists or is it better to align it to-many related managers which offer a way to tap-in this per-queryset identity mapper logic that to-one relationship cannot take advantage of because attribute access doesn't yield a manager

>>> author.books.create(title="2001: A Space Odyssey")
>>> book, created = author.books.get_or_create(title="2001: A Space Odyssey")
>>> created  # get() was used.
False
>>> book.author is author
True

I'll have to ponder on that one a bit because while I can see the appeal it introduces significant complexity with high risk of subtle breakages. If we are fixing this I'm of opinion that we should make it work for get_or_create under all circumstances where get is used and not only on the IntegrityError fallback as that would be even more confusing than it is today.

comment:7 by Jason Hall, 6 months ago

Owner:	set to Jason Hall
Status:	new → assigned

comment:8 by Jason Hall, 6 months ago

Has patch:	set

comment:9 by Sarah Boyce, 6 months ago

Patch needs improvement:	set

comment:10 by Jason Hall, 5 months ago

Triage Stage:	Accepted → Ready for checkin

comment:11 by Simon Charette, 5 months ago

Triage Stage:	Ready for checkin → Accepted

Please refer to the contributing documentation, you can't mark your own patch RFC.

comment:12 by Jason Hall, 5 months ago

Patch needs improvement:	unset

comment:13 by Jason Hall, 5 months ago

I understand. Thank you.

comment:14 by Rushabh Doshi, 5 months ago

Natalia, Simon and Sarah - thank you for looking at this issue and the patch.
Jason - thanks for making the patch in the first place.

Given that the patch doesn't seem to fully solve the problem, and given that this issue affects all OneToOneFields and causes a subtle data corruption that could be hard to detect and diagnose, I have two questions:

Is there a way to raise awareness of this issue and seek help in fixing? We would offer to give it a shot, but this is very deep ORM code and we don't feel like we have sufficient experience to attempt the patch.
In the meantime, could you advise on the work around? My current thought is that we should use ForeignKey(unique=True) and eat the Django check warning https://github.com/django/django/blob/main/django/db/models/fields/related.py#L1079 This would force us to use child_set.first() but that's a small ergonomic price to pay. Do you have any thoughts on this approach or a better suggestion instead?

comment:15 by Jason Hall, 4 months ago

Owner:	Jason Hall removed
Status:	assigned → new

comment:16 by Sarah Boyce, 4 months ago

Has patch:	unset

comment:17 by Sarah Boyce, 4 months ago

Has patch:	set
Patch needs improvement:	set

comment:18 by Sina Chaichi, 4 months ago

Hello everyone, I looked at the history of the ticket and saw the PR is closed. I was wondering if it's ok for me to work on that?

comment:19 by Sina Chaichi, 4 months ago

I opened new PR for this issue:

https://github.com/django/django/pull/19890

This patch adds conditional reverse OneToOne cache clearing to fix stale
instances left during concurrent get_or_create()/update_or_create() races.
It also includes regression tests (get_or_create, update_or_create, with
defaults) and release notes.

comment:20 by Sina Chaichi, 4 months ago

Patch needs improvement:	unset

comment:21 by Sarah Boyce, 4 months ago

Owner:	set to Sina Chaichi
Status:	new → assigned

comment:22 by Sina Chaichi, 3 months ago

Hello everyone, can someone please take a look at this PR? I've rebased once

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#36489 assigned Bug

OneToOneField + concurrent get_or_create results in wrong object in field cache

Description (last modified by Brian Atkinson)

Change History (22)

comment:1 by Brian Atkinson, 7 months ago

comment:2 by Natalia Bidart, 6 months ago

comment:3 by Brian Atkinson, 6 months ago

comment:4 by Simon Charette, 6 months ago

django/db/models/query.py

django/db/models/query.py

comment:5 by Brian Atkinson, 6 months ago

comment:6 by Simon Charette, 6 months ago

comment:7 by Jason Hall, 6 months ago

comment:8 by Jason Hall, 6 months ago

comment:9 by Sarah Boyce, 6 months ago

comment:10 by Jason Hall, 5 months ago

comment:11 by Simon Charette, 5 months ago

comment:12 by Jason Hall, 5 months ago

comment:13 by Jason Hall, 5 months ago

comment:14 by Rushabh Doshi, 5 months ago

comment:15 by Jason Hall, 4 months ago

comment:16 by Sarah Boyce, 4 months ago

comment:17 by Sarah Boyce, 4 months ago

comment:18 by Sina Chaichi, 4 months ago

comment:19 by Sina Chaichi, 4 months ago

comment:20 by Sina Chaichi, 4 months ago

comment:21 by Sarah Boyce, 4 months ago

comment:22 by Sina Chaichi, 3 months ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us