Opened 8 years ago
Closed 3 years ago
#28806 closed Bug (wontfix)
Mechanism of fetching related objects violates READ COMMITTED assumption of Django ORM
| Reported by: | Aaron Tan | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | database, read-committed, concurrency control |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Found here: https://github.com/django/django/blob/master/django/db/models/fields/related_descriptors.py#L141.
def get_object(self, instance):
qs = self.get_queryset(instance=instance)
# Assuming the database enforces foreign keys, this won't fail.
return qs.get(self.field.get_reverse_related_filter(instance))
The comment in that function states: # Assuming the database enforces foreign keys, this won't fail., but it's not the case. I will illustrate with a use case we met:
Suppose we have 2 concurrent transactions A and B talking to the same Postgresql backend, and we have two models:
class Foo:
pass
class Bar:
fk = ForeignKey(Foo, on_delete=models.SET_NULL)
populated by objects
foo = Foo() foo.save() bar = Bar(fk=foo) bar.save()
Transaction A runs
bar = Bar.objects.get(pk=<bar's pk>)
Then transaction B runs
Foo.objects.get(pk=<foo's pk>).delete()
If READ COMMITTED is assumed by Django, transaction A will see the db snapshot with foo deleted. But if transaction A subsequently did
bar.fk #Access foreign key field `fk` for the first time
Because the get_object method linked does not catch ObjectDoesNotExist exception tossed by get, the last bar.fk will raise ObjectDoesNotExist, thus contradicts the assumption
the database enforces foreign keys, this won't fail.
We currently manually catch ObjectDoesNotExist but that makes code ugly, I suggest instantiate the related field to None in that case, instead of raising exception.
Change History (6)
follow-up: 2 comment:1 by , 8 years ago
comment:2 by , 8 years ago
Replying to Tomer Chachamu:
The comment is definitely inaccurate, but I don't think the attribute returning
Noneis the right answer. Sure, the FK is set to NULL on conflict, but maybe transaction B also updatedBar.objects.get(pk=<bar's pk>).pkto a new value? In this case, A will see the new value after abar.refresh_from_db(), that is notNone.
The docs could probably be improved, to mention that you can use
select_relatedto prevent thisDoesNotExistfrom being thrown. (Although, it doesn't remove the fact that you will insert on save, when you probably meant to update).
I see, although manually modifying primary key seems like a kinda infrequent operation, but that makes sense. In this case, can we at least capture ObjectDoesNotExist there and make a partial refresh_from_database by only refreshing and instantiating the related field in question? Since selector works on a pretty low level of Django's call stack, it might be possible to do that? After all, this means "any fetching related object for the first time is theoretically vulnerable to such concurrency issue", wrapping all of them around try...except... is cumbersome and error-prone in large and convoluted codebases.
Also, using select_related to supress DoesNotExist was our first solution, but somehow it does not work. Maybe it is worth investigating/experimenting.
comment:3 by , 8 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
Accepting for further investigating. I haven't taken time to understand whether or not a change should be made.
comment:4 by , 7 years ago
There's nothing else to do for this case except throw an informative error. This is what you get with concurrent database requests.
BTW there's another reason why the comment is wrong. Django uses deferred foreign keys, so nothing prevents you from deleting the parent model inside a transaction, the fk is only validated during commit.
comment:5 by , 3 years ago
I would agree that there is little Django can do here and that we should likely close this issue as wontfix and move on. Silencing errors would do more harm than good as previous at accessing fk_id could have returned a value but then fk would be None?
We could adjust the comment but ultimately the only way to ensure the retrieval of a graph of object that is coherent at query time is to use select_related to ensure all the data is retrieved a single query. From that time objects are not locked anyhow though but the graph is coherent. It could still result in exception at attempt to reconciliate in memory representation (e.g. save(force_update) but the only sane way to prevent that is to use select_for_update and friends, no level or ORM magic will be able to always do the right thing here.
I would argue that lazy fetching of related objects honours READ COMMITED assumption of the ORM; if you defer relationship then fetching at a later point in time will attempt to reconciliate your in-memory representation of model instances with the latest committed database state possibly resulting in exceptions as we should avoid the temptation to guess and be very explicit about reconciliation failures.
comment:6 by , 3 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | new → closed |
| Triage Stage: | Accepted → Unreviewed |
The comment is definitely inaccurate, but I don't think the attribute returning
Noneis the right answer. Sure, the FK is set to NULL on conflict, but maybe transaction B also updatedBar.objects.get(pk=<bar's pk>).pkto a new value? In this case, A will see the new value after abar.refresh_from_db(), that is notNone.The docs could probably be improved, to mention that you can use
select_relatedto prevent thisDoesNotExistfrom being thrown. (Although, it doesn't remove the fact that you will insert on save, when you probably meant to update).