Opened 4 months ago

Last modified 5 days ago

#35801 new Bug

Signals are dispatched to receivers associated with dead senders

Reported by: bobince Owned by:
Component: Core (Other) Version: 5.1
Severity: Normal Keywords:
Cc: bobince, Simon Charette, Lincoln Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

django.dispatch.Signal.receivers is keyed on the id() of the receiver and sender (modulo a thing to make bound methods consistent).

If a sender connected to a signal is destroyed, and a new object is allocated with the same id() and then also connected as a sender to the signal, when the signal fires it will match the original sender and call the receiver that was connected for that sender (as well as the new one).

Signal works around the problem of re-used ids for the receiver by having a weakref to the receiver (since it needs a reference anyway to be able to call it), but it doesn't for sender.

In my case this resulted in post-migrate hooks for the wrong apps being occasionally called in migration tests that mutated INSTALLED_APPS causing AppConfig senders to be re-created, but it can be more simply provoked with:

from django.dispatch import Signal

receivers_called = []
def create_receiver(i):
    def receiver(**kwargs):
        receivers_called.append(i)
    return receiver

n = 100
receivers = [create_receiver(i) for i in range(n)]

signal = Signal()
for i in range(n):
    sender = object()
    signal.connect(receivers[i], sender=sender)
    receivers_called = []
    _ = signal.send(sender=sender)
    # only the receiver for the new sender object should be called
    assert receivers_called == [i], f'Expected [{i}], called {receivers_called}'

(how readily this explodes may depend on local memory allocation differences, but it dies pretty consistently for me on iteratio
n 3.)

Perhaps Signal should be keeping a weakref to each sender as well as receiver, and detecting when it has gone None? Does this need to participate in the _dead_receivers mechanism?

Attachments (1)

minimal.tar.gz (6.3 KB ) - added by Sjoerd Job Postmus 2 months ago.

Download all attachments as: .zip

Change History (9)

comment:1 by Natalia Bidart, 4 months ago

Resolution: needsinfo
Status: newclosed

Hello bobince, thank you for taking the time to create this ticket report.

I have investigated and I have used the code that you shared to reproduce. I applied a small change because I don't think it's correct to clear receivers_called = [] like that inside the loop. I also added some kwargs when sending the Signal to ensure we are debugging the right thing. What I find more "correct" is the following, which unless I'm missing something should be equivalent to your proposal. Sadly, this does not fail for me at any iteration (I even increased n to be 1000):

from collections import defaultdict

from django.dispatch import Signal


signal_calls = defaultdict(list)


def create_receiver(i):
    def receiver(sender, **kwargs):
        signal_calls[i].append((sender, kwargs))

    return receiver


n = 1000
receivers = [create_receiver(i) for i in range(n)]
signal = Signal()
for i in range(n):
    sender = object()
    signal.connect(receivers[i], sender=sender)
    _ = signal.send(sender=sender, i=i)
    # only the receiver for the new sender object should be called
    expected = [(sender, {"signal": signal, "i": i})]
    assert signal_calls[i] == expected, f"{expected=}, got {signal_calls=}"

Do you think you could, instead of a script, provide a failing test case for the Django test suite to illustrate the issue you are describing?

comment:2 by bobince, 4 months ago

I don't think it's correct to clear receivers_called = [] like that inside the loop

It should be fine as long as the loop is in the same scope as the definition of receivers_called. But yes, it might be clearer to reset the list without rebinding it by doing del receivers_called[:].

Sadly, this does not fail for me at any iteration

That's because the new example stores a reference to sender in signal_calls, which prevents the sender being destroyed and its id re-used.

If you store id(sender) or str(sender) instead you'll get the error again.

in reply to:  2 comment:3 by Natalia Bidart, 4 months ago

Cc: Simon Charette added
Resolution: needsinfo
Status: closednew
Triage Stage: UnreviewedAccepted

Replying to bobince:

I don't think it's correct to clear receivers_called = [] like that inside the loop

It should be fine as long as the loop is in the same scope as the definition of receivers_called. But yes, it might be clearer to reset the list without rebinding it by doing del receivers_called[:].

Right, I would suggest receivers_called.clear()

Sadly, this does not fail for me at any iteration

That's because the new example stores a reference to sender in signal_calls, which prevents the sender being destroyed and its id re-used.

If you store id(sender) or str(sender) instead you'll get the error again.

Great catch, that makes perfect sense. I tweaked my script and was able to confirm your report. I'll accept this ticket on that basis, I couldn't find a previous report about this.

Would you like to prepare a patch?

(Adding Simon as cc since he might have some good ideas for a robust fix.)

comment:4 by bobince, 4 months ago

Right, I would suggest receivers_called.clear()

(Ha ha, what is this crazy new-fangled API? Oh... Python 3.3 you say. well. huh!)

Would you like to prepare a patch?

Here's a candidate fix: https://github.com/bobince/django/compare/main..signals_from_dead_senders

It raises some questions about what to do when ‘sender’ isn't weakreffable — which admittedly is uncommon, and would already break if ‘use_caching’, but there's no doc that rules it out as far as I can see. In this commit it quietly falls back to making a strong reference, which would _probably_ be better than the buggy behaviour, but is still a change in behaviour.

So would certainly appreciate thoughts from someone familiar with the background of dispatch before PR'ing.

comment:5 by Simon Charette, 4 months ago

In my case this resulted in post-migrate hooks for the wrong apps being occasionally called in migration tests that mutated INSTALLED_APPS causing AppConfig senders to be re-created

Could you provide a minimal project demonstrating the issue in a non-synthetic manner? Before we commit to a solution (which yours appear to be correct from a quick look) we'd want to establish under which organic conditions the problem can be reproduced.

The Django test suite has many instances of @override_settings(INSTALLED_APPS) and doesn't run into this issue and I'm not sure it's fair to expect Django to behave properly under memory pressure circumstances that make id and therefore hash (as the default implementation of __hash__ is id() based) to return the same value for two distinct objects.

I think a possible alternative solution could be to identify under which organic circumstances this problem is likely to happen, document that signal connected with a sender must be disconnected prior to the object being garbage collected, and adjust the infringing cases within the framework itself.

by Sjoerd Job Postmus, 2 months ago

Attachment: minimal.tar.gz added

comment:6 by Sjoerd Job Postmus, 2 months ago

I found this ticket while investigating a problem we are having in our CI, where we have recurring failures when running python manage.py migrate.

In our set up, we have a lot of database migrations. Some of our models inherit from the DirtyFieldsMixin provided by django-dirtyfields.

In DirtyFieldsMixin.__init__, it calls post_save.connect, with sender=self.__class__. (Code: https://github.com/romgar/django-dirtyfields/blob/f48bbe975c627f3f7577f018992a44b1b1cbad8d/src/dirtyfields/dirtyfields.py#L32).

What we're seeing during the migrations, is that reset_state is called with an instance of a model that does *not* inherit from DirtyFieldsMixin. Likely because the class definition is now in the same area of memory as which originally contained the temporary model class of a model that does inherit from DirtyFieldsMixin.

Trimming that down to a 'minimal project demonstrating the issue in a non-synthetic manner' is however hard, because as you said Simon, it has to do also with memory pressure, and it's hard to simulate the memory pressure of a large project while simultaneously exhibiting a minimal example.

I did manage to trim down the workings to a minimal example:

import random

from django.db import migrations


def create_instance_withoutdirty_instance(apps, schema_editor):
    WithoutDirty = apps.get_model("stress", "WithoutDirty")
    WithoutDirty.objects.create()


def clear_apps_cache(apps, schema_editor):
    # Whenever something schema-related changes, the apps-cache gets cleared. To keep
    # this test minimal, we clear the cache manually instead of changing the schema.
    apps.clear_cache()


def init_withdirty_instance(apps, schema_editor):
    WithDirty = apps.get_model("stress", "WithDirty")
    print(f"WithDirty has ID {id(WithDirty)}")
    # Call __init__, don't even bother doing more.
    WithDirty()


def update_withoutdirty_instance(apps, schema_editor):
    WithoutDirty = apps.get_model("stress", "WithoutDirty")
    print(f"WithoutDirty has ID {id(WithoutDirty)}")
    instance = WithoutDirty.objects.get()
    instance.save()


class Migration(migrations.Migration):

    dependencies = [
        ('stress', '0001_initial'),
    ]

    operations = [
        migrations.RunPython(create_instance_withoutdirty_instance),
        *[
            migrations.RunPython(
                random.choice(
                    [
                        clear_apps_cache,
                        init_withdirty_instance,
                        update_withoutdirty_instance,
                    ]
                )
            )
            for _ in range(1000)
        ]
    ]

(see also attached project)

The example works by randomly deciding from three options:

  • clear the apps cache (as if a model was added/altered/deleted)
  • Call WithDirty(), triggering the .connect(...) call.
  • Call WithoutDirty.objects.get().save(), triggering post_save to notify all its receivers.

When it fails, this is because the temporary WithoutDirty class is instantiated in the same location of memory as which previously held a temporary WithDirty class, causing reset_state to be called with the WithoutDirty instance.

During testing, occasionally it would not trigger the bug, but then I'd just delete the db.sqlite3 file and run again.

(I also logged this bug under django-dirtyfields, https://github.com/romgar/django-dirtyfields/issues/233, as it might be realistic to state that the case with the migrations might need to be solved there instead. However, I'm not sure if it's possible to fix this in django-dirtyfields without changes in Django).

Last edited 2 months ago by Sjoerd Job Postmus (previous) (diff)

comment:7 by Simon Charette, 3 weeks ago

Has patch: set
Patch needs improvement: set

Thanks for all the extra details Sjoerd, this is greatly appreciated.

Here's how I think we can resume the problem in a few bullet points

  1. Signal doesn't provide a way to connect to non-weak senders
  2. Signal.receivers doesn't hold a strong reference to senders as it uses id to retrieve them by lookup and under memory pressure id can return the same value for two distinct objects
  3. The migration framework creates and disposes of ModelBase instances (aka Model subclasses) which creates a lot of memory pressure (see #29898) and cycles through sender for all model signals which makes the perfect conditions for id(SomeFakeModel) == id(SomeOtherFakeModel) collisions.

With all that said I can see a few solutions which should be considered.

First making Signal.connect(weak=True) also use weakref for senders like bobince implemented seems like a good step forward. If you could open a PR from your branch that also tests Signal(use_caching=True) (which should work as it already uses a WeakKeyDictionary for its sender cache) we should be land this work independently of the following proposed solutions. I believe we should have weakref.ref(sender) raise a TypeError instead of silencing though as the proposed code now holds a strong reference to sender in receivers which could make things much worst but this can be discussed further during code review.

That should leave us with what should be done with signal receivers connected with weak=False. In the case of django-dirty-fields's usage Sjoerd I think they should switch to weak=True as their reset_state receiver is defined at the module level anyway so it's causing more harm than good. There are albeit rare use cases for weak=False and it's a public API so I wonder if the migration framework should do a better job at clearing after itself by implementing __del__ logic for AppConfig and fake models for the core signals at least.

comment:8 by Lincoln, 5 days ago

Cc: Lincoln added
Note: See TracTickets for help on using tickets.
Back to Top