Opened 3 years ago
Last modified 3 years ago
#28806 new Bug
Mechanism of fetching related objects violates READ COMMITTED assumption of Django ORM
Reported by: | Aaron Tan | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | master |
Severity: | Normal | Keywords: | database, read-committed, concurrency control |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Found here: https://github.com/django/django/blob/master/django/db/models/fields/related_descriptors.py#L141.
def get_object(self, instance): qs = self.get_queryset(instance=instance) # Assuming the database enforces foreign keys, this won't fail. return qs.get(self.field.get_reverse_related_filter(instance))
The comment in that function states: # Assuming the database enforces foreign keys, this won't fail.
, but it's not the case. I will illustrate with a use case we met:
Suppose we have 2 concurrent transactions A and B talking to the same Postgresql backend, and we have two models:
class Foo: pass class Bar: fk = ForeignKey(Foo, on_delete=models.SET_NULL)
populated by objects
foo = Foo() foo.save() bar = Bar(fk=foo) bar.save()
Transaction A runs
bar = Bar.objects.get(pk=<bar's pk>)
Then transaction B runs
Foo.objects.get(pk=<foo's pk>).delete()
If READ COMMITTED is assumed by Django, transaction A will see the db snapshot with foo
deleted. But if transaction A subsequently did
bar.fk #Access foreign key field `fk` for the first time
Because the get_object
method linked does not catch ObjectDoesNotExist
exception tossed by get
, the last bar.fk
will raise ObjectDoesNotExist
, thus contradicts the assumption
the database enforces foreign keys, this won't fail.
We currently manually catch ObjectDoesNotExist
but that makes code ugly, I suggest instantiate the related field to None
in that case, instead of raising exception.
Change History (4)
comment:1 follow-up: 2 Changed 3 years ago by
comment:2 Changed 3 years ago by
Replying to Tomer Chachamu:
The comment is definitely inaccurate, but I don't think the attribute returning
None
is the right answer. Sure, the FK is set to NULL on conflict, but maybe transaction B also updatedBar.objects.get(pk=<bar's pk>).pk
to a new value? In this case, A will see the new value after abar.refresh_from_db()
, that is notNone
.
The docs could probably be improved, to mention that you can use
select_related
to prevent thisDoesNotExist
from being thrown. (Although, it doesn't remove the fact that you will insert on save, when you probably meant to update).
I see, although manually modifying primary key seems like a kinda infrequent operation, but that makes sense. In this case, can we at least capture ObjectDoesNotExist
there and make a partial refresh_from_database
by only refreshing and instantiating the related field in question? Since selector works on a pretty low level of Django's call stack, it might be possible to do that? After all, this means "any fetching related object for the first time is theoretically vulnerable to such concurrency issue", wrapping all of them around try...except...
is cumbersome and error-prone in large and convoluted codebases.
Also, using select_related
to supress DoesNotExist
was our first solution, but somehow it does not work. Maybe it is worth investigating/experimenting.
comment:3 Changed 3 years ago by
Triage Stage: | Unreviewed → Accepted |
---|
Accepting for further investigating. I haven't taken time to understand whether or not a change should be made.
comment:4 Changed 3 years ago by
There's nothing else to do for this case except throw an informative error. This is what you get with concurrent database requests.
BTW there's another reason why the comment is wrong. Django uses deferred foreign keys, so nothing prevents you from deleting the parent model inside a transaction, the fk is only validated during commit.
The comment is definitely inaccurate, but I don't think the attribute returning
None
is the right answer. Sure, the FK is set to NULL on conflict, but maybe transaction B also updatedBar.objects.get(pk=<bar's pk>).pk
to a new value? In this case, A will see the new value after abar.refresh_from_db()
, that is notNone
.The docs could probably be improved, to mention that you can use
select_related
to prevent thisDoesNotExist
from being thrown. (Although, it doesn't remove the fact that you will insert on save, when you probably meant to update).