Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#26804 closed Bug (fixed)

update_or_create() updates all columns, which could cause a race condition

Reported by: Jensen Cochran Owned by: Jensen Cochran
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: update_or_create
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The update_or_create() helper function in Django is passed a 'defaults' kwarg, which contains the fields that we expect the function to update. When .save() is called in the function to do the update, it does not pass the update_fields parameter to .save(), and so the SQL generated to complete the update sets every column (except pk), even those not passed in 'defaults'.

This creates a potential race condition. In the period of time between when the function calls .get() to retrieve the object and when it calls .save(), some other process may have updated a column in that row that has nothing to do with this update. When save() is called, these changes will be overwritten. This is not an issue for the fields passed into 'defaults' since we expected those to be updated anyway. The main issue is that update_or_create updates fields/columns other than those passed in 'defaults', and does so in a way that could cause data loss.

Example:

class Authors(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=128, unique=True)
    newest_release = models.DateTimeField()
    active = models.BooleanField(default=True)


Authors.objects.update_or_create(
    name='Bob',
    defaults={
        'newest_release': timezone.now()
    })


# Django update_or_create code from https://github.com/django/django/blob/master/django/db/models/query.py

    def update_or_create(self, defaults=None, **kwargs):
        """
        Looks up an object with the given kwargs, updating one with defaults
        if it exists, otherwise creates a new one.
        Returns a tuple (object, created), where created is a boolean
        specifying whether an object was created.
        """
        defaults = defaults or {}
        lookup, params = self._extract_model_params(defaults, **kwargs)
        self._for_write = True
        try:
            obj = self.get(**lookup)
        except self.model.DoesNotExist:
            obj, created = self._create_object_from_params(lookup, params)
            if created:
                return obj, created
        for k, v in six.iteritems(defaults):
            setattr(obj, k, v)
        obj.save(using=self.db)
        return obj, False

Assume that the author Bob already exists in the database.
update_or_create retrieves the Author object using .get().

Meanwhile, another process runs the following SQL:
UPDATE authors SET active = FALSE WHERE id = 1234;

Then, update_or_create sets the values from 'defaults' on the object, then calls .save() on the object. Generated SQL would look something like
UPDATE authors SET name='Bob', active=TRUE, newest_release='#######'::timestamp, WHERE id = 1234"

Notice that both 'name' and 'active' are set in the SQL, even though they were not passed in defaults. Bob has been wrongly made active again, even though the update portion of the update_or_create() call had nothing to do with the 'active' field. Besides the data issues, it is also not efficient to have the database update every field/column when there is no need.

I have tested this behavior by putting a time.sleep() into update_or_create right after the .get() and updating the database during this sleep.
Once the sleep finished, my changes were overwritten by .save()

My initial thought to patch this was to change
obj.save(using=self.db)
in update_or_create into
obj.save(using=self.db, update_fields=defaults.keys())

However, this causes a single test to fail in generic_relations, since update_fields in .save() does not accept GenericForeignKey fields, which may be bug in and of itself. There are other ways to fix this like locking the row, but that would be less efficient and not fix the underlying issue.

ERROR: test_update_or_create_defaults (generic_relations.tests.GenericRelationsTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python2.7/unittest/case.py", line 365, in run
    testMethod()
  File "/home/jensen/sandbox/django/tests/generic_relations/tests.py", line 540, in test_update_or_create_defaults
    tag, created = TaggedItem.objects.update_or_create(tag="shiny", defaults={'content_object': diamond})
  File "/home/jensen/sandbox/django/django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/jensen/sandbox/django/django/db/models/query.py", line 493, in update_or_create
    obj.save(using=self.db, update_fields=defaults.keys())
  File "/home/jensen/sandbox/django/django/db/models/base.py", line 782, in save
    % ', '.join(non_model_fields))
ValueError: The following fields do not exist in this model or are m2m fields: content_object

Change History (12)

comment:1 by Jensen Cochran, 8 years ago

Summary: update_or_create() updates all columnsupdate_or_create() updates all columns, which could cause a race condition

comment:2 by Shai Berger, 8 years ago

Triage Stage: UnreviewedAccepted

No, actually locking the row is the only valid solution. Otherwise you still have a race condition -- when someone else updates the record, it is possible that your update_or_create should be creating a new row rather than updating the existing one. IMO the right fix is to change the row

             obj = self.get(**lookup)

into

             obj = self.select_for_update().get(**lookup)

comment:3 by Jensen Cochran, 8 years ago

Has patch: set

comment:4 by Tim Graham, 8 years ago

Needs tests: set

comment:5 by Jensen Cochran, 8 years ago

Needs tests: unset

comment:6 by Jensen Cochran, 8 years ago

Owner: changed from nobody to Jensen Cochran
Status: newassigned

comment:7 by Tim Graham, 8 years ago

Triage Stage: AcceptedReady for checkin

comment:8 by Tim Graham, 8 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Awaiting resolution of #26884.

comment:9 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assignedclosed

In d44afd88:

Fixed #26804 -- Fixed a race condition in QuerySet.update_or_create().

comment:10 by Cristiano Coelho, 8 years ago

Would it make sense to avoid the same race condition on get_or_create ? Although an added select_for_update on a single get query which might be the use case on most of the cases (99% selects, 1% actually create), would reduce the performance of those gets by quite a lot since a select_for_update adds a lock/write to the row. So should probably be documented or come with an optional flag.

in reply to:  10 comment:11 by Jensen Cochran, 8 years ago

The race condition in get_or_create would be: get_or_create does not find a matching row, then another process inserts a row that get_or_create would have matched, then get_or_create inserts it's row. Now, two rows in the table exist with duplicate values (for the columns that were being searched against).

If you have a unique index in your database for the columns you passed to get_or_create and this race condition occurred, you would not have duplicate values inserted. Instead, you would get a db exception. So it is possible to avoid the overhead of the transaction in some cases, but only if we know that the columns are unique/unique_together.

comment:12 by Tim Graham <timograham@…>, 8 years ago

In 83be4076:

Fixed #26933 -- Fixed flaky update_or_create() test from refs #26804.

Note: See TracTickets for help on using tickets.
Back to Top