Opened 18 years ago

Closed 15 years ago

#3951 closed (fixed)

Django's signal framework may register listeners more than once

Reported by: Ben Slavin <benjamin.slavin@…> Owned by: Jacob
Component: Core (Other) Version: dev
Severity: Keywords:
Cc: paul@…, ferringb@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Quick summary

In the current Django setup, it is possible for a signal listener to register itself with the dispatcher (PyDispatcher) more than once.

The result is that the same function can end-up being called twice when a single signal is sent.

Quick example

testproj/testapp/models.py

class A(models.Model):
    a = models.IntegerField(default=1)
def test_function_A(): print "Called listener"
dispatcher.connect(test_function_A, signal=signals.pre_init, sender=A)

Interactive session (./manage.py shell)

>>> from django.dispatch import dispatcher
>>> from django.db.models import signals

>>> from testproj.testapp.models import A
>>> dispatcher.getReceivers(sender=A, signal=signals.pre_init)
[<function test_function_A at 0x1203fb0>]

>>> from testapp.models import A    
>>> dispatcher.getReceivers(sender=A, signal=signals.pre_init)
[<function test_function_A at 0x1203fb0>, <function test_function_A at 0x1206a30>]

>>> A()
Called listener
Called listener
<A: A object>

Deeper discussion

The problem seems to stem from the fact that Django puts itself in Python's path more than once. For example, for the project 'testproj', Python will look in /path/to/project/ and /path/to/project/testproj/. This, both testapp and testproj.testapp are valid.

Though these locations are both valid and reference the same content, they are actually treated as being different. Models seem to be treated as singletons, but other objects do not. I assume that this is because of the way in which models are loaded when Django fires-up.

>>> from testproj.testapp.models import A as full_path_A_class
>>> from testapp.models import A as short_path_A_class

>>> id(full_path_A_class), id(short_path_A_class)
(6797536, 6797536)

>>> from testproj.testapp.models import test_function_A as full_path_A_function
>>> from testapp.models import test_function_A as short_path_A_function

>>> id(full_path_A_function), id(short_path_A_function)
(18898800, 18960624)

Note in the last line of output that two different values are returned (18898800 != 18960624). Thus, the dispatcher.connect function has no way to determine that the listeners are the same.

The code responsible for the registration of listeners in PyDispatch can be found at django/dispatch/dispatcher.py:153-170; however, I'm not sure that we should go hacking on the PyDispatcher code. I'm honestly not sure what the solution to this one should be.

"Disclaimers"

The following things should be noted:

  • I am using trunk for testing; however, I've also tried manually updating to PyDispatcher 2.0.0 and the problem still exists.
  • All of my testing has been on Python 2.5. I assume that older versions will behave similarly.
  • All of my testing has been done in the shell and with the built-in server.
  • It is possible that this issue does not exist in a FCGI/mod_python deployment... I haven't tested those environments.

Change History (16)

comment:1 by James Bennett, 18 years ago

The Python path stuff done by manage.py is a bit of a red herring in this case; it's just as possible to trigger this with a relative import. The problem stems from the fact that in the second import statement in your example, Python looks for the key testapp.models in sys.modules and doesn't find it (because the first import added an entry for testproj.testapp.models), so the module gets initialized again.

We hack around this in the model loading system because it's necessary to insure only one copy of each model class, and do it by checking the source filename of the module, which is a bit more reliable (though not completely so -- you can still trip it up with symlinks if you really want to).

I'm also unsure of the best way to get around this -- hacking in PyDispatcher probably won't be that fun.

comment:2 by Ben Slavin <benjamin.slavin@…>, 18 years ago

On 04/06/07, ubernostrum wrote:

it's just as possible to trigger this with a relative import

I thought this at first, as well... and maybe you're right, but I attempted to test this case as well before 'declaring' the Python path to be the cause.

Here's a simple test case using a relative import where it doesn't happen when using the full path and where it does when using the short path.

Relative import example

testproj/testapp/views.py

from models import *

Interactive session

>>> dispatcher.getReceivers(signal=signals.pre_init, sender=A)
[<function test_function_A at 0x12061b0>]

>>> from testproj.testapp.views import *
>>> dispatcher.getReceivers(signal=signals.pre_init, sender=A)
[<function test_function_A at 0x12061b0>]

>>> from testapp.views import *
>>> dispatcher.getReceivers(signal=signals.pre_init, sender=A)
[<function test_function_A at 0x12061b0>, <function test_function_A at 0x1206af0>]

Discussion

To me, it looks like it's the root of the import tree that matters. If something is referenced as project_name.app_name in an import, any relative imports look like they'll work properly. However, if they're referenced as app_name alone in imports, relative imports seem to cause duplication.

This is at the boundaries of my level of Python sophistication, though... so there might be something I'm overlooking in my analysis.

comment:3 by anonymous, 18 years ago

Triage Stage: UnreviewedAccepted

The simplest way to understand this problem is to realise that it is the import path (the bit after the "from" and before the "import" keywords) that is stored as a key in sys.modules. Different keys in that dictionary imply the classes are considered different by Python.

So we have to re-identify them inside the signal handler code by using external information that might be constant. That is why James is suggesting the source path identification trick we use in loading.py. You could also use things like a hash of the bytecode, but that's a little less stable.

comment:4 by Malcolm Tredinnick, 18 years ago

(Last comment was by me.)

comment:5 by Ben Slavin <benjamin.slavin@…>, 18 years ago

We can certainly try to patch it to use the a work-around similar to that of loading.py. It seems to me that this problem is bigger than PyDispatcher though.

My concern is that Django is patching around this in multiple locations and it's indicative of a large problem. If it was confined to Django's internals, that would be OK in my eyes... but the double-loading potentially means twice the memory, violation of singleton patterns, double-registration, etc. That's a serious problem for complex projects/applications.

What do you guys think? Is this something that's OK to keep working around, or should changing behavior be considered?

comment:6 by James Bennett, 18 years ago

Well, as far as "double loading", this is technically upstream -- everything we're seeing is a logical consequence of Python's (documented) import behavior, so "changing behavior" would really mean "changing Python", which probably won't get very far ;)

We work around this in specific cases, and we may end up needing to do so for the dispatcher, but in the general case I don't think there's anything we can do, because this is how Python's import behavior is "supposed" to work.

comment:7 by Ben Slavin <benjamin.slavin@…>, 18 years ago

I understand that this is standard Python behavior... it's just that Django encourages a setup where this duplication effect is more likely to happen. That is, putting the same content in Python's path twice.

My conjecture regarding changing behavior was more along the lines of changing user behavior, not that of Python (encouraging a design pattern where only project-level references would be used, for example... though this goes against application portability goals). It's a tricky issue, and it's certainly not the fault of Django or its developers... its just something that we should probably try to make sure its a problem that users know to avoid.

A work-around for the dispatcher does seem to be in order in the short term. It looks like upstream for PyDispatcher has been abandoned, so I guess we're on our own.

If there's a particular approach you're interested in pursuing for the dispatcher, let me know and I can take a crack at coming up with a patch.

comment:8 by Malcolm Tredinnick, 18 years ago

Ben, it's standard Python behaviour, so people rapidly become used to it: if you are referencing things by id() or name, use the same import path or find some other way to identify them. This has to be taken into account when pickling as well.

It's not worth making a big deal out of for Django users. We work around it (by using another identification method) in the rare cases we need it to ease the user's burden. You didn't even know it existed in loading.py, showing how little of a burden it puts on your use. We'll do the same thing for signal matching, most likely. It's not something that's going to change very much in upstream Python -- certainly not in Python 2.x, so there's no benefit in looking for any bigger picture. This is just the way life is.

comment:9 by Paul Collier <paul@…>, 18 years ago

Cc: paul@… added

comment:10 by (removed), 17 years ago

Cc: ferringb@… added

comment:11 by anonymous, 16 years ago

I have just been burned by this, and as a novice django user but a fairly experienced python programmer, I take serious issue with the "people rapidly become used to it" attitude. In fact, the comment by mtredinnick on 4/7 above is more infuriating every time I read it.

In my case, I had the signal registration code in a module along with the model to which the signal was attached. That module was being loaded (in development/"runserver" mode) twice: once apparently for validation and once when the module was loaded by the http subsystem. Finding this bug report at least pointed me in the right direction, and I've "fixed" the problem by moving the registration code elsewhere.

This is clearly not a problem with a trivial fix, but I guarantee that people will get hung up on it. The signals architecture, something which could otherwise be a significant win, is far less useful subject to this sort of hidden breakage.

comment:12 by Jacob, 16 years ago

The dispatch_uid argument in the refactored signals (see #6814) will fix this.

comment:13 by Jacob, 16 years ago

Owner: changed from nobody to Jacob
Status: newassigned

comment:14 by Jacob, 16 years ago

Resolution: fixed
Status: assignedclosed

(In [8223]) Major refactoring of django.dispatch with an eye towards speed. The net result is that signals are up to 90% faster.

Though some attempts and backwards-compatibility were made, speed trumped compatibility. Thus, as usual, check BackwardsIncompatibleChanges for the complete list of backwards-incompatible changes.

Thanks to Jeremy Dunck and Keith Busell for the bulk of the work; some ideas from Brian Herring's previous work (refs #4561) were incorporated.

Documentation is, sigh, still forthcoming.

Fixes #6814 and #3951 (with the new dispatch_uid argument to connect).

comment:15 by tagshow, 15 years ago

Resolution: fixed
Status: closedreopened

comment:16 by Russell Keith-Magee, 15 years ago

Resolution: fixed
Status: reopenedclosed

Reverting spam.

Note: See TracTickets for help on using tickets.
Back to Top