#27403 closed Cleanup/optimization (fixed)
Document that prefetch_related doesn't guarantee transactional consistency
| Reported by: | Aymeric Augustin | Owned by: | Tim Bell |
|---|---|---|---|
| Component: | Documentation | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Shai Berger | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Let's assume a Parent model and a Child model with a FK to Parent.
Database initially looks like this:
parent1 -- child1
User runs Parent.objects.all().prefetch_related('child_set').
Django fetches parent1.
Another transaction commits, resulting in this structure:
parent1 +- child1
`- child2
parent2
Django fetches all children pointing to parent1, which returns child1 and child2.
The result of the query is:
parent1 +- child1
`- child2
But at no point in the history did the database look like this.
This bug report is based on code inspection.
I verified that prefetch_related doesn't start a transaction.
However I didn't write code exhibiting the race condition because that's a bit complicated.
Change History (9)
comment:1 by , 9 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:2 by , 9 years ago
comment:3 by , 9 years ago
| Cc: | added |
|---|
Rivo, I think you're wrong, but not in the direction you suspected: In fact, REPEATABLE READ would still not fix this, and as far as I understand, on any database that isn't PG, even SERIALIZABLE won't -- unless we first select (with no need to fetch) the join of the tables. But that would undo much of the performance benefits of prefetch_related...
Some databse engines -- notably SQL Server -- can return several result sets (=querysets) from a single command (typically a procedure call). Something like that could, at least in principle, help us here.
comment:4 by , 9 years ago
| Component: | Database layer (models, ORM) → Documentation |
|---|---|
| Summary: | prefetch_related doesn't guarantee transactional consistency → Document that prefetch_related doesn't guarantee transactional consistency |
| Type: | Bug → Cleanup/optimization |
I agree with both conclusions. Repurposing the ticket for a mention in the documentation.
comment:5 by , 2 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
I'm looking at this ticket during the DjangoCon sprints.
I've reproduced a version of this race condition using the models defined in the prefetch_related() docs (https://docs.djangoproject.com/en/4.2/ref/models/querysets/#prefetch-related). By adding a sleep() at the start of prefetch_related_objects(), I was able to run another database query in between the two queries required to evaluate Pizza.objects.prefetch_related('toppings').
When deleting Pizza.objects.get(name="Hawaiian") during the sleep, we get this:
>>> Pizza.objects.prefetch_related('toppings')
Sleeping...
<QuerySet [<Pizza: Hawaiian ()>, <Pizza: Seafood (prawns, smoked salmon)>]>
Just like in the example from this ticket's description, at no point in the history did the database look like this.
I'll now try to write a succinct description of this issue (only one or two sentences) to add to the docs.
comment:7 by , 2 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
Correct me if I'm wrong, but I think just wrapping the
prefetch_related()in a transaction wouldn't fix this.By default, at least in Postgres, queries made within a transaction can still see data from other _committed_ transactions, so the same problem would occur.
In Postgres, you could select a higher isolation level, I think Repeatable Read would prevent the case you've described. But this has other consequences and should be done on per-app basis. See https://www.postgresql.org/docs/current/static/transaction-iso.html for more info.
If I'm right, maybe docs should note that there might be edge cases regards transactional consistency?