#12579 closed Bug (fixed)
QuerySet.get_or_create()'s behavior depends on database's uniqueness restrictions
Reported by: | Tim Molendijk | Owned by: | |
---|---|---|---|
Component: | Documentation | Version: | 1.0 |
Severity: | Normal | Keywords: | get_or_create concurrency race unique documentation |
Cc: | tomasz.zielinski@…, vijay_shanker | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | yes | UI/UX: | no |
Description
QuerySet.get_or_create()
works as expected, IFF the kwargs that are provided correspond to database columns with appropriate uniqueness restrictions. In other words: get_or_create()
depends on an IntegrityError being raised under certain circumstances. If you have a database that does not do this, f.e. because a table's uniqueness logic cannot be expressed with the database's UNIQUE
construct.
For instance:
Given the following table: CREATE TABLE t1 (i INT, flag TINYINT, parent_id INT, PRIMARY KEY (i)) ENGINE = InnoDB;
An additional restriction, one which cannot be expressed in the table definition, is that only one record can exist with flag=1
for every given parent_id
.
The following code gives the impression that we enforce said restriction: T1.objects.get_or_create(parent_id=3, flag=1)
.
But this is a false impression, as two concurrent transactions executing this same code can (and will on a regular basis) result in two records with parent_id=3
and flag=1
. This is the result of both transactions trying to get()
a record with parent_id=3
and flag=1
before any of them has inserted this record. So both transactions will conclude they need to create{
} the record, which will be allowed by the database and thus an inconsistency will enter the database.
Now the ideal scenario would be that race conditions are prevented regardless of how the underlying database table is defined. But I am not sure if this is a feasible approach. At the very least I think people who use get_or_create()
need to be aware that it can give rise to concurrency issues, unless they follow some basic guidelines... something like that.
Attachments (1)
Change History (25)
comment:1 Changed 13 years ago by
comment:2 Changed 13 years ago by
Component: | Database layer (models, ORM) → Documentation |
---|---|
Triage Stage: | Unreviewed → Accepted |
I agree with Karen - this is an area for documentation improvement describing the circumstances where the method can be used safely, not an opportunity to reproduce database logic in Django's ORM
comment:3 Changed 12 years ago by
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:4 Changed 12 years ago by
Cc: | tomasz.zielinski@… added |
---|---|
Easy pickings: | unset |
JFYI get_or_create is broken also in one other way: I think the problem is still there: http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django/2235624#2235624
comment:5 Changed 12 years ago by
Easy pickings: | set |
---|
(Sorry for accidential unsetting the easy pickings)
comment:7 Changed 11 years ago by
We (Votizen) were seeing the REPEATABLE READ visibility problem (described in the StackOverflow link above) on mysql+innodb+REPEATABLE READ. To me, the suggestion on StackOverflow- of committing whatever transaction is pending - is bogus. Transactions need to be composable, and committing so you can read means you can't combine this function with any other function that requires an ongoing transaction.
We switched (about 6 months ago) to READ COMMITTED. This reduces throughput in mysql, but is a good tradeoff to me.
I think that since MySQL's innodb default transaction isolation is incompatible with Django's get_or_create approach, it would be good to either note in the docs for get_or_create, or advise setting READ COMMITTED for all users of mysql.
comment:8 Changed 10 years ago by
Owner: | changed from nobody to vijay_shanker |
---|---|
Status: | new → assigned |
Changed 10 years ago by
Attachment: | 12579.diff added |
---|
added a paragraph referring get_or_create concurrency issue
comment:9 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Triage Stage: | Accepted → Ready for checkin |
comment:10 Changed 10 years ago by
Cc: | vijay_shanker added |
---|
comment:11 Changed 10 years ago by
Triage Stage: | Ready for checkin → Unreviewed |
---|
comment:12 Changed 10 years ago by
Triage Stage: | Unreviewed → Ready for checkin |
---|
comment:13 Changed 10 years ago by
Has patch: | set |
---|---|
Resolution: | fixed |
Status: | closed → new |
Triage Stage: | Ready for checkin → Accepted |
Please don't mark your own patch as Ready for checking as it needs to be reviewed by someone else. see https://docs.djangoproject.com/en/dev/internals/contributing/triaging-tickets/
comment:14 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:15 follow-up: 16 Changed 10 years ago by
Resolution: | fixed |
---|---|
Status: | closed → new |
A ticket is not marked fixed until it has been committed.
comment:16 Changed 10 years ago by
Replying to timo:
A ticket is not marked fixed until it has been committed.
ok. what should i do now ? i stll own the ticket and now it is marked as new ?
comment:17 Changed 10 years ago by
Patch needs improvement: | set |
---|
You wait for someone to review the patch. I believe it needs some improvement, so I've set that flag. This method in Django is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. In the notes above the doc changes that have been suggested involve removing the notes that it is intended only for use in data import scripts, that change is not included in the patch. The suggested changes were intended to stop scaring people off of using this method, but rather than that I believe the patch would scare more people off of using it. The patch starts out by saying incorrect duplicates could be created by get_or_create
and ends with saying integrity error could be raised by it. The former will only be the case if it is used incorrectly (if get parameters don't have correct unique constraints specified for them) and the latter will happen only for a mis-configured/broken database (MySQL with repeatable read transaction isolation level). The changes should focus on telling readers what they need to do to get the method to behave correctly: they need to correctly specify uniqueness constraints for the parameters they are using on get, and they must not use "repeatable read" transaction isolation level.
comment:18 Changed 10 years ago by
Owner: | changed from vijay_shanker to anonymous |
---|---|
Status: | new → assigned |
comment:19 Changed 10 years ago by
Owner: | changed from anonymous to vijay_shanker |
---|
comment:20 Changed 10 years ago by
Owner: | vijay_shanker deleted |
---|---|
Status: | assigned → new |
comment:22 Changed 10 years ago by
Owner: | set to Tim Graham <timograham@…> |
---|---|
Resolution: | → fixed |
Status: | new → closed |
I don't agree that the ideal situation is that Django code strive to prevent race conditions regardless of how the underlying database tables are defined. Rather I believe the ideal situation is that tables be defined such that the database is where race conditions get dealt with. Databases already have to deal with these issues, I don't believe attempting to duplicate their effort correctly in Django is a good idea.
In this specific case it seems possible to get quite close to the kind of database constraint you are looking for by declaring
flag
to be a NullBooleanField that isunique_together
withparent_id
. The database will then disallow multiple 1 (True) values for any givenparent_id
. It will also disallow multiple 0 (False) values. If that is an issue -- that is, if you need to be able to allow multiple "non 1" values for a givenparent_id
, then you can use NULL instead of 0 to represent the "not 1" case, since NULL != NULL you will be able to have multiple NULL entries for a singleparent_id
.I think I agree, however, that the doc for
get_or_create
could be improved. The original implementation of this method did not attempt to support concurrent access. The doc (http://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create-kwargs) states its intended use case is things like data-import scripts and rather discourages its use in views. When #6641 was fixed the method became safe and useful to use for atomic creation of objects (assuming proper model constraints are specified) in a view. Unless I'm missing something, I don't see why the doc shouldn't make it clear that the routine is safe to use for this function.