Opened 4 weeks ago

Last modified 3 weeks ago

#28806 new Bug

Mechanism of fetching related objects violates READ COMMITTED assumption of Django ORM

Reported by: Aaron Tan Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: database, read-committed, concurrency control
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Found here: https://github.com/django/django/blob/master/django/db/models/fields/related_descriptors.py#L141.

def get_object(self, instance):
        qs = self.get_queryset(instance=instance)
        # Assuming the database enforces foreign keys, this won't fail.
        return qs.get(self.field.get_reverse_related_filter(instance))

The comment in that function states: # Assuming the database enforces foreign keys, this won't fail., but it's not the case. I will illustrate with a use case we met:

Suppose we have 2 concurrent transactions A and B talking to the same Postgresql backend, and we have two models:

class Foo:
    pass

class Bar:
    fk = ForeignKey(Foo, on_delete=models.SET_NULL)

populated by objects

foo = Foo()
foo.save()
bar = Bar(fk=foo)
bar.save()

Transaction A runs

bar = Bar.objects.get(pk=<bar's pk>)

Then transaction B runs

Foo.objects.get(pk=<foo's pk>).delete()

If READ COMMITTED is assumed by Django, transaction A will see the db snapshot with foo deleted. But if transaction A subsequently did

bar.fk #Access foreign key field `fk` for the first time

Because the get_object method linked does not catch ObjectDoesNotExist exception tossed by get, the last bar.fk will raise ObjectDoesNotExist, thus contradicts the assumption

the database enforces foreign keys, this won't fail.

We currently manually catch ObjectDoesNotExist but that makes code ugly, I suggest instantiate the related field to None in that case, instead of raising exception.

Change History (2)

comment:1 Changed 3 weeks ago by Tomer Chachamu

The comment is definitely inaccurate, but I don't think the attribute returning None is the right answer. Sure, the FK is set to NULL on conflict, but maybe transaction B also updated Bar.objects.get(pk=<bar's pk>).pk to a new value? In this case, A will see the new value after a bar.refresh_from_db(), that is not None.

The docs could probably be improved, to mention that you can use select_related to prevent this DoesNotExist from being thrown. (Although, it doesn't remove the fact that you will insert on save, when you probably meant to update).

comment:2 in reply to:  1 Changed 3 weeks ago by Aaron Tan

Replying to Tomer Chachamu:

The comment is definitely inaccurate, but I don't think the attribute returning None is the right answer. Sure, the FK is set to NULL on conflict, but maybe transaction B also updated Bar.objects.get(pk=<bar's pk>).pk to a new value? In this case, A will see the new value after a bar.refresh_from_db(), that is not None.

The docs could probably be improved, to mention that you can use select_related to prevent this DoesNotExist from being thrown. (Although, it doesn't remove the fact that you will insert on save, when you probably meant to update).

I see, although manually modifying primary key seems like a kinda infrequent operation, but that makes sense. In this case, can we at least capture ObjectDoesNotExist there and make a partial refresh_from_database by only refreshing and instantiating the related field in question? Since selector works on a pretty low level of Django's call stack, it might be possible to do that? After all, this means "any fetching related object for the first time is theoretically vulnerable to such concurrency issue", wrapping all of them around try...except... is cumbersome and error-prone in large and convoluted codebases.

Also, using select_related to supress DoesNotExist was our first solution, but somehow it does not work. Maybe it is worth investigating/experimenting.

Note: See TracTickets for help on using tickets.
Back to Top