Opened 15 years ago

Closed 11 years ago

Last modified 11 years ago

#12579 closed Bug (fixed)

QuerySet.get_or_create()'s behavior depends on database's uniqueness restrictions

Reported by: Tim Molendijk Owned by: Tim Graham <timograham@…>
Component: Documentation Version: 1.0
Severity: Normal Keywords: get_or_create concurrency race unique documentation
Cc: tomasz.zielinski@…, vijay_shanker Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

QuerySet.get_or_create() works as expected, IFF the kwargs that are provided correspond to database columns with appropriate uniqueness restrictions. In other words: get_or_create() depends on an IntegrityError being raised under certain circumstances. If you have a database that does not do this, f.e. because a table's uniqueness logic cannot be expressed with the database's UNIQUE construct.

For instance:

Given the following table: CREATE TABLE t1 (i INT, flag TINYINT, parent_id INT, PRIMARY KEY (i)) ENGINE = InnoDB;
An additional restriction, one which cannot be expressed in the table definition, is that only one record can exist with flag=1 for every given parent_id.
The following code gives the impression that we enforce said restriction: T1.objects.get_or_create(parent_id=3, flag=1).
But this is a false impression, as two concurrent transactions executing this same code can (and will on a regular basis) result in two records with parent_id=3 and flag=1. This is the result of both transactions trying to get() a record with parent_id=3 and flag=1 before any of them has inserted this record. So both transactions will conclude they need to create{} the record, which will be allowed by the database and thus an inconsistency will enter the database.

Now the ideal scenario would be that race conditions are prevented regardless of how the underlying database table is defined. But I am not sure if this is a feasible approach. At the very least I think people who use get_or_create() need to be aware that it can give rise to concurrency issues, unless they follow some basic guidelines... something like that.

Attachments (1)

12579.diff (970 bytes ) - added by vijay_shanker 12 years ago.
added a paragraph referring get_or_create concurrency issue

Download all attachments as: .zip

Change History (25)

comment:1 by Karen Tracey, 15 years ago

I don't agree that the ideal situation is that Django code strive to prevent race conditions regardless of how the underlying database tables are defined. Rather I believe the ideal situation is that tables be defined such that the database is where race conditions get dealt with. Databases already have to deal with these issues, I don't believe attempting to duplicate their effort correctly in Django is a good idea.

In this specific case it seems possible to get quite close to the kind of database constraint you are looking for by declaring flag to be a NullBooleanField that is unique_together with parent_id. The database will then disallow multiple 1 (True) values for any given parent_id. It will also disallow multiple 0 (False) values. If that is an issue -- that is, if you need to be able to allow multiple "non 1" values for a given parent_id, then you can use NULL instead of 0 to represent the "not 1" case, since NULL != NULL you will be able to have multiple NULL entries for a single parent_id.

I think I agree, however, that the doc for get_or_create could be improved. The original implementation of this method did not attempt to support concurrent access. The doc (http://docs.djangoproject.com/en/dev/ref/models/querysets/#get-or-create-kwargs) states its intended use case is things like data-import scripts and rather discourages its use in views. When #6641 was fixed the method became safe and useful to use for atomic creation of objects (assuming proper model constraints are specified) in a view. Unless I'm missing something, I don't see why the doc shouldn't make it clear that the routine is safe to use for this function.

comment:2 by Russell Keith-Magee, 15 years ago

Component: Database layer (models, ORM)Documentation
Triage Stage: UnreviewedAccepted

I agree with Karen - this is an area for documentation improvement describing the circumstances where the method can be used safely, not an opportunity to reproduce database logic in Django's ORM

comment:3 by Matt McClanahan, 14 years ago

Severity: Normal
Type: Bug

comment:4 by Tomasz Zieliński, 14 years ago

Cc: tomasz.zielinski@… added
Easy pickings: unset

JFYI get_or_create is broken also in one other way: I think the problem is still there: http://stackoverflow.com/questions/2235318/how-do-i-deal-with-this-race-condition-in-django/2235624#2235624

comment:5 by Tomasz Zieliński, 14 years ago

Easy pickings: set

(Sorry for accidential unsetting the easy pickings)

comment:6 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:7 by Jeremy Dunck, 12 years ago

We (Votizen) were seeing the REPEATABLE READ visibility problem (described in the StackOverflow link above) on mysql+innodb+REPEATABLE READ. To me, the suggestion on StackOverflow- of committing whatever transaction is pending - is bogus. Transactions need to be composable, and committing so you can read means you can't combine this function with any other function that requires an ongoing transaction.

We switched (about 6 months ago) to READ COMMITTED. This reduces throughput in mysql, but is a good tradeoff to me.

I think that since MySQL's innodb default transaction isolation is incompatible with Django's get_or_create approach, it would be good to either note in the docs for get_or_create, or advise setting READ COMMITTED for all users of mysql.

comment:8 by vijay_shanker, 12 years ago

Owner: changed from nobody to vijay_shanker
Status: newassigned

by vijay_shanker, 12 years ago

Attachment: 12579.diff added

added a paragraph referring get_or_create concurrency issue

comment:9 by vijay_shanker, 12 years ago

Resolution: fixed
Status: assignedclosed
Triage Stage: AcceptedReady for checkin

comment:10 by vijay_shanker, 12 years ago

Cc: vijay_shanker added

comment:11 by vijay_shanker, 12 years ago

Triage Stage: Ready for checkinUnreviewed
Version 0, edited 12 years ago by vijay_shanker (next)

comment:12 by vijay_shanker, 12 years ago

Triage Stage: UnreviewedReady for checkin

comment:13 by Tim Graham, 12 years ago

Has patch: set
Resolution: fixed
Status: closednew
Triage Stage: Ready for checkinAccepted

Please don't mark your own patch as Ready for checking as it needs to be reviewed by someone else. see https://docs.djangoproject.com/en/dev/internals/contributing/triaging-tickets/

comment:14 by vijay_shanker, 12 years ago

Resolution: fixed
Status: newclosed

comment:15 by Tim Graham, 12 years ago

Resolution: fixed
Status: closednew

A ticket is not marked fixed until it has been committed.

in reply to:  15 comment:16 by vijay_shanker, 12 years ago

Replying to timo:

A ticket is not marked fixed until it has been committed.

ok. what should i do now ? i stll own the ticket and now it is marked as new ?

comment:17 by anonymous, 12 years ago

Patch needs improvement: set

You wait for someone to review the patch. I believe it needs some improvement, so I've set that flag. This method in Django is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. In the notes above the doc changes that have been suggested involve removing the notes that it is intended only for use in data import scripts, that change is not included in the patch. The suggested changes were intended to stop scaring people off of using this method, but rather than that I believe the patch would scare more people off of using it. The patch starts out by saying incorrect duplicates could be created by get_or_create and ends with saying integrity error could be raised by it. The former will only be the case if it is used incorrectly (if get parameters don't have correct unique constraints specified for them) and the latter will happen only for a mis-configured/broken database (MySQL with repeatable read transaction isolation level). The changes should focus on telling readers what they need to do to get the method to behave correctly: they need to correctly specify uniqueness constraints for the parameters they are using on get, and they must not use "repeatable read" transaction isolation level.

comment:18 by anonymous, 12 years ago

Owner: changed from vijay_shanker to anonymous
Status: newassigned

comment:19 by vijay_shanker, 12 years ago

Owner: changed from anonymous to vijay_shanker

comment:20 by vijay_shanker, 12 years ago

Owner: vijay_shanker removed
Status: assignednew

comment:21 by Tim Graham, 11 years ago

Patch needs improvement: unset

Updated PR

comment:22 by Tim Graham <timograham@…>, 11 years ago

Owner: set to Tim Graham <timograham@…>
Resolution: fixed
Status: newclosed

In 428de2e3394873810645010020d65841ab72bb14:

Fixed #12579 -- Noted QuerySet.get_or_create() depends on database unique constraints.

Thanks timmolendijk, jdunck, vijay_shanker, and loic84.

comment:23 by Tim Graham <timograham@…>, 11 years ago

In 9687e387d80bba33f82593915bb1351ae73e54f6:

[1.6.x] Fixed #12579 -- Noted QuerySet.get_or_create() depends on database unique constraints.

Thanks timmolendijk, jdunck, vijay_shanker, and loic84.

Backport of 428de2e339 from master.

comment:24 by Tim Graham <timograham@…>, 11 years ago

In fa37f45c32c8fa7c6909b72e94b9d90e6efe7ed9:

[1.5.x] Fixed #12579 -- Noted QuerySet.get_or_create() depends on database unique constraints.

Thanks timmolendijk, jdunck, vijay_shanker, and loic84.

Backport of 428de2e339 from master.

Note: See TracTickets for help on using tickets.
Back to Top