#30199 closed Cleanup/optimization (fixed)
get_or_create documentation encourages unsafe usage
Reported by: | Alex | Owned by: | Alex |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The documentation for QuerySet.get_or_create() (https://docs.djangoproject.com/en/2.1/ref/models/querysets/#get-or-create) buries the following caveat in an unhighlighted paragraph of text more than a page in:
This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.
In practice I have seen get_or_create() used with non-unique kwargs many times, so this caveat should come sooner in the documentation and be inside a "Warning" box.
The next paragraph of the documentation says:
If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won’t appear in a subsequent get() call.
This is dangerous and generally bad advice. By lowering the isolation level, you are trading off the occasional DoesNotExist during concurrent writes for subtle data integrity bugs, which is rarely an improvement. The trade-off should be presented but READ COMMITTED should not be recommended.
Change History (8)
comment:1 by , 6 years ago
Easy pickings: | unset |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Version: | 2.1 → master |
comment:2 by , 6 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Thanks for the quick triage! I will try to add a patch today.
comment:4 by , 6 years ago
Since #27683 (Django 2.0), Django defaults to read committed and the documentation says, "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read." The note about "REPEATABLE READ (the default)" in QuerySet.get_or_create()
is therefore incorrect and perhaps obsolete.
comment:5 by , 6 years ago
Patch needs improvement: | set |
---|
As per comment on PR there's Tim's point to be addressed, and I think probably a move of this discussion to docs/ref/databases.txt
where the MySQL isolation level is discussed, since Django's default behaviour is by-and-large correct since 924af638e4d4fb8eb46a19ac0cafcb2e83480cf3, which was part of #27683.
comment:6 by , 5 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Adam approved the PR, so marking RFC for final review.
Hi Alex,
Thanks for the report. I'll accept this thinking that you're prepared to prepare a patch, and that, with that in place, we can ask the DB experts to give their opinion on review.
I've unchecked Easy Pickings, since even if the change itself is easy enough, the topic is not. 🙂