Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#30199 closed Cleanup/optimization (fixed)

get_or_create documentation encourages unsafe usage

Reported by: Alex Owned by: Alex
Component: Documentation Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The documentation for QuerySet.get_or_create() (https://docs.djangoproject.com/en/2.1/ref/models/querysets/#get-or-create) buries the following caveat in an unhighlighted paragraph of text more than a page in:

This method is atomic assuming correct usage, correct database configuration, and correct behavior of the underlying database. However, if uniqueness is not enforced at the database level for the kwargs used in a get_or_create call (see unique or unique_together), this method is prone to a race-condition which can result in multiple rows with the same parameters being inserted simultaneously.

In practice I have seen get_or_create() used with non-unique kwargs many times, so this caveat should come sooner in the documentation and be inside a "Warning" box.

The next paragraph of the documentation says:

If you are using MySQL, be sure to use the READ COMMITTED isolation level rather than REPEATABLE READ (the default), otherwise you may see cases where get_or_create will raise an IntegrityError but the object won’t appear in a subsequent get() call.

This is dangerous and generally bad advice. By lowering the isolation level, you are trading off the occasional DoesNotExist during concurrent writes for subtle data integrity bugs, which is rarely an improvement. The trade-off should be presented but READ COMMITTED should not be recommended.

Change History (8)

comment:1 by Carlton Gibson, 5 years ago

Easy pickings: unset
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: 2.1master

Hi Alex,

Thanks for the report. I'll accept this thinking that you're prepared to prepare a patch, and that, with that in place, we can ask the DB experts to give their opinion on review.

I've unchecked Easy Pickings, since even if the change itself is easy enough, the topic is not. 🙂

comment:2 by Alex, 5 years ago

Owner: changed from nobody to Alex
Status: newassigned

Thanks for the quick triage! I will try to add a patch today.

comment:3 by Alex, 5 years ago

Has patch: set
Last edited 5 years ago by Tim Graham (previous) (diff)

comment:4 by Tim Graham, 5 years ago

Since #27683 (Django 2.0), Django defaults to read committed and the documentation says, "Django works best with and defaults to read committed rather than MySQL’s default, repeatable read. Data loss is possible with repeatable read." The note about "REPEATABLE READ (the default)" in QuerySet.get_or_create() is therefore incorrect and perhaps obsolete.

comment:5 by Carlton Gibson, 5 years ago

Patch needs improvement: set

As per comment on PR there's Tim's point to be addressed, and I think probably a move of this discussion to docs/ref/databases.txt where the MySQL isolation level is discussed, since Django's default behaviour is by-and-large correct since 924af638e4d4fb8eb46a19ac0cafcb2e83480cf3, which was part of #27683.

comment:6 by Carlton Gibson, 5 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

Adam approved the PR, so marking RFC for final review.

comment:7 by Carlton Gibson <carlton.gibson@…>, 5 years ago

Resolution: fixed
Status: assignedclosed

In 1686dce0:

Fixed #30199 -- Adjusted QuerySet.get_or_create() docs to highlight atomicity warning.

comment:8 by Carlton Gibson <carlton.gibson@…>, 5 years ago

In 4ef96cce:

[2.2.x] Fixed #30199 -- Adjusted QuerySet.get_or_create() docs to highlight atomicity warning.

Backport of 1686dce06c1f3587e90ea98816eddaa965fd9f45 from master

Note: See TracTickets for help on using tickets.
Back to Top