#27403 closed Cleanup/optimization (fixed)
Document that prefetch_related doesn't guarantee transactional consistency
Reported by: | Aymeric Augustin | Owned by: | Tim Bell |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Shai Berger | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Let's assume a Parent
model and a Child
model with a FK to Parent
.
Database initially looks like this:
parent1 -- child1
User runs Parent.objects.all().prefetch_related('child_set')
.
Django fetches parent1
.
Another transaction commits, resulting in this structure:
parent1 +- child1 `- child2 parent2
Django fetches all children pointing to parent1
, which returns child1
and child2
.
The result of the query is:
parent1 +- child1 `- child2
But at no point in the history did the database look like this.
This bug report is based on code inspection.
I verified that prefetch_related
doesn't start a transaction.
However I didn't write code exhibiting the race condition because that's a bit complicated.
Change History (9)
comment:1 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 8 years ago
comment:3 by , 8 years ago
Cc: | added |
---|
Rivo, I think you're wrong, but not in the direction you suspected: In fact, REPEATABLE READ would still not fix this, and as far as I understand, on any database that isn't PG, even SERIALIZABLE won't -- unless we first select (with no need to fetch) the join of the tables. But that would undo much of the performance benefits of prefetch_related
...
Some databse engines -- notably SQL Server -- can return several result sets (=querysets) from a single command (typically a procedure call). Something like that could, at least in principle, help us here.
comment:4 by , 8 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Summary: | prefetch_related doesn't guarantee transactional consistency → Document that prefetch_related doesn't guarantee transactional consistency |
Type: | Bug → Cleanup/optimization |
I agree with both conclusions. Repurposing the ticket for a mention in the documentation.
comment:5 by , 13 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I'm looking at this ticket during the DjangoCon sprints.
I've reproduced a version of this race condition using the models defined in the prefetch_related()
docs (https://docs.djangoproject.com/en/4.2/ref/models/querysets/#prefetch-related). By adding a sleep()
at the start of prefetch_related_objects()
, I was able to run another database query in between the two queries required to evaluate Pizza.objects.prefetch_related('toppings')
.
When deleting Pizza.objects.get(name="Hawaiian")
during the sleep, we get this:
>>> Pizza.objects.prefetch_related('toppings') Sleeping... <QuerySet [<Pizza: Hawaiian ()>, <Pizza: Seafood (prawns, smoked salmon)>]>
Just like in the example from this ticket's description, at no point in the history did the database look like this.
I'll now try to write a succinct description of this issue (only one or two sentences) to add to the docs.
comment:6 by , 13 months ago
Has patch: | set |
---|
comment:7 by , 13 months ago
Triage Stage: | Accepted → Ready for checkin |
---|
Correct me if I'm wrong, but I think just wrapping the
prefetch_related()
in a transaction wouldn't fix this.By default, at least in Postgres, queries made within a transaction can still see data from other _committed_ transactions, so the same problem would occur.
In Postgres, you could select a higher isolation level, I think Repeatable Read would prevent the case you've described. But this has other consequences and should be done on per-app basis. See https://www.postgresql.org/docs/current/static/transaction-iso.html for more info.
If I'm right, maybe docs should note that there might be edge cases regards transactional consistency?