#12579 closed Bug (fixed)
QuerySet.get_or_create()'s behavior depends on database's uniqueness restrictions
Reported by: | Tim Molendijk | Owned by: | |
---|---|---|---|
Component: | Documentation | Version: | 1.0 |
Severity: | Normal | Keywords: | get_or_create concurrency race unique documentation |
Cc: | tomasz.zielinski@…, vijay_shanker | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
QuerySet.get_or_create()
works as expected, IFF the kwargs that are provided correspond to database columns with appropriate uniqueness restrictions. In other words: get_or_create()
depends on an IntegrityError being raised under certain circumstances. If you have a database that does not do this, f.e. because a table's uniqueness logic cannot be expressed with the database's UNIQUE
construct.
For instance:
Given the following table: CREATE TABLE t1 (i INT, flag TINYINT, parent_id INT, PRIMARY KEY (i)) ENGINE = InnoDB;
An additional restriction, one which cannot be expressed in the table definition, is that only one record can exist with flag=1
for every given parent_id
.
The following code gives the impression that we enforce said restriction: T1.objects.get_or_create(parent_id=3, flag=1)
.
But this is a false impression, as two concurrent transactions executing this same code can (and will on a regular basis) result in two records with parent_id=3
and flag=1
. This is the result of both transactions trying to get()
a record with parent_id=3
and flag=1
before any of them has inserted this record. So both transactions will conclude they need to create{
} the record, which will be allowed by the database and thus an inconsistency will enter the database.
Now the ideal scenario would be that race conditions are prevented regardless of how the underlying database table is defined. But I am not sure if this is a feasible approach. At the very least I think people who use get_or_create()
need to be aware that it can give rise to concurrency issues, unless they follow some basic guidelines... something like that.
Attachments (1)
Change History (25)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Triage Stage: | Unreviewed → Accepted |
I agree with Karen - this is an area for documentation improvement describing the circumstances where the method can be used safely, not an opportunity to reproduce database logic in Django's ORM
comment:3 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:4 by , 14 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
JFYI get_or_create is broken also in one other way: I think the problem is still there: http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django/2235624#2235624
comment:7 by , 12 years ago
We (Votizen) were seeing the REPEATABLE READ visibility problem (described in the StackOverflow link above) on mysql+innodb+REPEATABLE READ. To me, the suggestion on StackOverflow- of committing whatever transaction is pending - is bogus. Transactions need to be composable, and committing so you can read means you can't combine this function with any other function that requires an ongoing transaction.
We switched (about 6 months ago) to READ COMMITTED. This reduces throughput in mysql, but is a good tradeoff to me.
I think that since MySQL's innodb default transaction isolation is incompatible with Django's get_or_create approach, it would be good to either note in the docs for get_or_create, or advise setting READ COMMITTED for all users of mysql.
comment:8 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 12 years ago
Attachment: | 12579.diff added |
---|
added a paragraph referring get_or_create concurrency issue
comment:9 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Triage Stage: | Accepted → Ready for checkin |
comment:10 by , 12 years ago
Cc: | added |
---|
comment:11 by , 12 years ago
Triage Stage: | Ready for checkin → Unreviewed |
---|
comment:12 by , 12 years ago
Triage Stage: | Unreviewed → Ready for checkin |
---|
comment:13 by , 12 years ago
Has patch: | set |
---|---|
Resolution: | fixed |
Status: | closed → new |
Triage Stage: | Ready for checkin → Accepted |
Please don't mark your own patch as Ready for checking as it needs to be reviewed by someone else. see https://docs.djangoproject.com/en/dev/internals/contributing/triaging-tickets/
comment:14 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
follow-up: 16 comment:15 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
A ticket is not marked fixed until it has been committed.
comment:16 by , 12 years ago
Replying to timo:
A ticket is not marked fixed until it has been committed.
ok. what should i do now ? i stll own the ticket and now it is marked as new ?
comment:17 by , 12 years ago
Patch needs improvement: | set |
---|
You wait for someone to review the patch. I believe it needs some improvement, so I've set that flag. This method in Django is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. In the notes above the doc changes that have been suggested involve removing the notes that it is intended only for use in data import scripts, that change is not included in the patch. The suggested changes were intended to stop scaring people off of using this method, but rather than that I believe the patch would scare more people off of using it. The patch starts out by saying incorrect duplicates could be created by get_or_create
and ends with saying integrity error could be raised by it. The former will only be the case if it is used incorrectly (if get parameters don't have correct unique constraints specified for them) and the latter will happen only for a mis-configured/broken database (MySQL with repeatable read transaction isolation level). The changes should focus on telling readers what they need to do to get the method to behave correctly: they need to correctly specify uniqueness constraints for the parameters they are using on get, and they must not use "repeatable read" transaction isolation level.
comment:18 by , 12 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:19 by , 12 years ago
Owner: | changed from | to
---|
comment:20 by , 12 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:22 by , 11 years ago
Owner: | set to |
---|---|
Resolution: | → fixed |
Status: | new → closed |
I don't agree that the ideal situation is that Django code strive to prevent race conditions regardless of how the underlying database tables are defined. Rather I believe the ideal situation is that tables be defined such that the database is where race conditions get dealt with. Databases already have to deal with these issues, I don't believe attempting to duplicate their effort correctly in Django is a good idea.
In this specific case it seems possible to get quite close to the kind of database constraint you are looking for by declaring
flag
to be a NullBooleanField that isunique_together
withparent_id
. The database will then disallow multiple 1 (True) values for any givenparent_id
. It will also disallow multiple 0 (False) values. If that is an issue -- that is, if you need to be able to allow multiple "non 1" values for a givenparent_id
, then you can use NULL instead of 0 to represent the "not 1" case, since NULL != NULL you will be able to have multiple NULL entries for a singleparent_id
.I think I agree, however, that the doc for
get_or_create
could be improved. The original implementation of this method did not attempt to support concurrent access. The doc (http://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create-kwargs) states its intended use case is things like data-import scripts and rather discourages its use in views. When #6641 was fixed the method became safe and useful to use for atomic creation of objects (assuming proper model constraints are specified) in a view. Unless I'm missing something, I don't see why the doc shouldn't make it clear that the routine is safe to use for this function.