Opened 5 years ago

Closed 22 months ago

Last modified 22 months ago

#12579 closed Bug (fixed)

QuerySet.get_or_create()'s behavior depends on database's uniqueness restrictions

Reported by: timmolendijk Owned by: Tim Graham <timograham@…>
Component: Documentation Version: 1.0
Severity: Normal Keywords: get_or_create concurrency race unique documentation
Cc: tomasz.zielinski@…, vijay_shanker Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

QuerySet.get_or_create() works as expected, IFF the kwargs that are provided correspond to database columns with appropriate uniqueness restrictions. In other words: get_or_create() depends on an IntegrityError being raised under certain circumstances. If you have a database that does not do this, f.e. because a table's uniqueness logic cannot be expressed with the database's UNIQUE construct.

For instance:

Given the following table: CREATE TABLE t1 (i INT, flag TINYINT, parent_id INT, PRIMARY KEY (i)) ENGINE = InnoDB;
An additional restriction, one which cannot be expressed in the table definition, is that only one record can exist with flag=1 for every given parent_id.
The following code gives the impression that we enforce said restriction: T1.objects.get_or_create(parent_id=3, flag=1).
But this is a false impression, as two concurrent transactions executing this same code can (and will on a regular basis) result in two records with parent_id=3 and flag=1. This is the result of both transactions trying to get() a record with parent_id=3 and flag=1 before any of them has inserted this record. So both transactions will conclude they need to create{} the record, which will be allowed by the database and thus an inconsistency will enter the database.

Now the ideal scenario would be that race conditions are prevented regardless of how the underlying database table is defined. But I am not sure if this is a feasible approach. At the very least I think people who use get_or_create() need to be aware that it can give rise to concurrency issues, unless they follow some basic guidelines... something like that.

Attachments (1)

12579.diff (970 bytes) - added by vijay_shanker 2 years ago.
added a paragraph referring get_or_create concurrency issue

Download all attachments as: .zip

Change History (25)

comment:1 Changed 5 years ago by kmtracey

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I don't agree that the ideal situation is that Django code strive to prevent race conditions regardless of how the underlying database tables are defined. Rather I believe the ideal situation is that tables be defined such that the database is where race conditions get dealt with. Databases already have to deal with these issues, I don't believe attempting to duplicate their effort correctly in Django is a good idea.

In this specific case it seems possible to get quite close to the kind of database constraint you are looking for by declaring flag to be a NullBooleanField that is unique_together with parent_id. The database will then disallow multiple 1 (True) values for any given parent_id. It will also disallow multiple 0 (False) values. If that is an issue -- that is, if you need to be able to allow multiple "non 1" values for a given parent_id, then you can use NULL instead of 0 to represent the "not 1" case, since NULL != NULL you will be able to have multiple NULL entries for a single parent_id.

I think I agree, however, that the doc for get_or_create could be improved. The original implementation of this method did not attempt to support concurrent access. The doc (http://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create-kwargs) states its intended use case is things like data-import scripts and rather discourages its use in views. When #6641 was fixed the method became safe and useful to use for atomic creation of objects (assuming proper model constraints are specified) in a view. Unless I'm missing something, I don't see why the doc shouldn't make it clear that the routine is safe to use for this function.

comment:2 Changed 5 years ago by russellm

  • Component changed from Database layer (models, ORM) to Documentation
  • Triage Stage changed from Unreviewed to Accepted

I agree with Karen - this is an area for documentation improvement describing the circumstances where the method can be used safely, not an opportunity to reproduce database logic in Django's ORM

comment:3 Changed 4 years ago by mattmcc

  • Severity set to Normal
  • Type set to Bug

comment:4 Changed 4 years ago by TomaszZielinski

  • Cc tomasz.zielinski@… added
  • Easy pickings unset

JFYI get_or_create is broken also in one other way: I think the problem is still there: http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django/2235624#2235624

comment:5 Changed 4 years ago by TomaszZielinski

  • Easy pickings set

(Sorry for accidential unsetting the easy pickings)

comment:6 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:7 Changed 3 years ago by jdunck

We (Votizen) were seeing the REPEATABLE READ visibility problem (described in the StackOverflow link above) on mysql+innodb+REPEATABLE READ. To me, the suggestion on StackOverflow- of committing whatever transaction is pending - is bogus. Transactions need to be composable, and committing so you can read means you can't combine this function with any other function that requires an ongoing transaction.

We switched (about 6 months ago) to READ COMMITTED. This reduces throughput in mysql, but is a good tradeoff to me.

I think that since MySQL's innodb default transaction isolation is incompatible with Django's get_or_create approach, it would be good to either note in the docs for get_or_create, or advise setting READ COMMITTED for all users of mysql.

comment:8 Changed 2 years ago by vijay_shanker

  • Owner changed from nobody to vijay_shanker
  • Status changed from new to assigned

Changed 2 years ago by vijay_shanker

added a paragraph referring get_or_create concurrency issue

comment:9 Changed 2 years ago by vijay_shanker

  • Resolution set to fixed
  • Status changed from assigned to closed
  • Triage Stage changed from Accepted to Ready for checkin

comment:10 Changed 2 years ago by vijay_shanker

  • Cc vijay_shanker added

comment:11 Changed 2 years ago by vijay_shanker

  • Triage Stage changed from Ready for checkin to Unreviewed
Version 0, edited 2 years ago by vijay_shanker (next)

comment:12 Changed 2 years ago by vijay_shanker

  • Triage Stage changed from Unreviewed to Ready for checkin

comment:13 Changed 2 years ago by timo

  • Has patch set
  • Resolution fixed deleted
  • Status changed from closed to new
  • Triage Stage changed from Ready for checkin to Accepted

Please don't mark your own patch as Ready for checking as it needs to be reviewed by someone else. see https://docs.djangoproject.com/en/dev/internals/contributing/triaging-tickets/

comment:14 Changed 2 years ago by vijay_shanker

  • Resolution set to fixed
  • Status changed from new to closed

comment:15 follow-up: Changed 2 years ago by timo

  • Resolution fixed deleted
  • Status changed from closed to new

A ticket is not marked fixed until it has been committed.

comment:16 in reply to: ↑ 15 Changed 2 years ago by vijay_shanker

Replying to timo:

A ticket is not marked fixed until it has been committed.

ok. what should i do now ? i stll own the ticket and now it is marked as new ?

comment:17 Changed 2 years ago by anonymous

  • Patch needs improvement set

You wait for someone to review the patch. I believe it needs some improvement, so I've set that flag. This method in Django is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. In the notes above the doc changes that have been suggested involve removing the notes that it is intended only for use in data import scripts, that change is not included in the patch. The suggested changes were intended to stop scaring people off of using this method, but rather than that I believe the patch would scare more people off of using it. The patch starts out by saying incorrect duplicates could be created by get_or_create and ends with saying integrity error could be raised by it. The former will only be the case if it is used incorrectly (if get parameters don't have correct unique constraints specified for them) and the latter will happen only for a mis-configured/broken database (MySQL with repeatable read transaction isolation level). The changes should focus on telling readers what they need to do to get the method to behave correctly: they need to correctly specify uniqueness constraints for the parameters they are using on get, and they must not use "repeatable read" transaction isolation level.

comment:18 Changed 2 years ago by anonymous

  • Owner changed from vijay_shanker to anonymous
  • Status changed from new to assigned

comment:19 Changed 2 years ago by vijay_shanker

  • Owner changed from anonymous to vijay_shanker

comment:20 Changed 2 years ago by vijay_shanker

  • Owner vijay_shanker deleted
  • Status changed from assigned to new

comment:21 Changed 22 months ago by timo

  • Patch needs improvement unset

Updated PR

comment:22 Changed 22 months ago by Tim Graham <timograham@…>

  • Owner set to Tim Graham <timograham@…>
  • Resolution set to fixed
  • Status changed from new to closed

In 428de2e3394873810645010020d65841ab72bb14:

Fixed #12579 -- Noted QuerySet.get_or_create() depends on database unique constraints.

Thanks timmolendijk, jdunck, vijay_shanker, and loic84.

comment:23 Changed 22 months ago by Tim Graham <timograham@…>

In 9687e387d80bba33f82593915bb1351ae73e54f6:

[1.6.x] Fixed #12579 -- Noted QuerySet.get_or_create() depends on database unique constraints.

Thanks timmolendijk, jdunck, vijay_shanker, and loic84.

Backport of 428de2e339 from master.

comment:24 Changed 22 months ago by Tim Graham <timograham@…>

In fa37f45c32c8fa7c6909b72e94b9d90e6efe7ed9:

[1.5.x] Fixed #12579 -- Noted QuerySet.get_or_create() depends on database unique constraints.

Thanks timmolendijk, jdunck, vijay_shanker, and loic84.

Backport of 428de2e339 from master.

Note: See TracTickets for help on using tickets.
Back to Top