Opened 18 years ago
Closed 16 years ago
#3951 closed (fixed)
Django's signal framework may register listeners more than once
Reported by: | Owned by: | Jacob | |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Keywords: | ||
Cc: | paul@…, ferringb@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Quick summary
In the current Django setup, it is possible for a signal listener to register itself with the dispatcher (PyDispatcher) more than once.
The result is that the same function can end-up being called twice when a single signal is sent.
Quick example
testproj/testapp/models.py
class A(models.Model): a = models.IntegerField(default=1) def test_function_A(): print "Called listener" dispatcher.connect(test_function_A, signal=signals.pre_init, sender=A)
Interactive session (./manage.py shell
)
>>> from django.dispatch import dispatcher >>> from django.db.models import signals >>> from testproj.testapp.models import A >>> dispatcher.getReceivers(sender=A, signal=signals.pre_init) [<function test_function_A at 0x1203fb0>] >>> from testapp.models import A >>> dispatcher.getReceivers(sender=A, signal=signals.pre_init) [<function test_function_A at 0x1203fb0>, <function test_function_A at 0x1206a30>] >>> A() Called listener Called listener <A: A object>
Deeper discussion
The problem seems to stem from the fact that Django puts itself in Python's path more than once. For example, for the project 'testproj
', Python will look in /path/to/project/
and /path/to/project/testproj/
. This, both testapp
and testproj.testapp
are valid.
Though these locations are both valid and reference the same content, they are actually treated as being different. Models seem to be treated as singletons, but other objects do not. I assume that this is because of the way in which models are loaded when Django fires-up.
>>> from testproj.testapp.models import A as full_path_A_class >>> from testapp.models import A as short_path_A_class >>> id(full_path_A_class), id(short_path_A_class) (6797536, 6797536) >>> from testproj.testapp.models import test_function_A as full_path_A_function >>> from testapp.models import test_function_A as short_path_A_function >>> id(full_path_A_function), id(short_path_A_function) (18898800, 18960624)
Note in the last line of output that two different values are returned (18898800 != 18960624). Thus, the dispatcher.connect
function has no way to determine that the listeners are the same.
The code responsible for the registration of listeners in PyDispatch can be found at django/dispatch/dispatcher.py:153-170
; however, I'm not sure that we should go hacking on the PyDispatcher code. I'm honestly not sure what the solution to this one should be.
"Disclaimers"
The following things should be noted:
- I am using
trunk
for testing; however, I've also tried manually updating to PyDispatcher 2.0.0 and the problem still exists. - All of my testing has been on Python 2.5. I assume that older versions will behave similarly.
- All of my testing has been done in the shell and with the built-in server.
- It is possible that this issue does not exist in a FCGI/mod_python deployment... I haven't tested those environments.
Change History (16)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
On 04/06/07, ubernostrum wrote:
it's just as possible to trigger this with a relative import
I thought this at first, as well... and maybe you're right, but I attempted to test this case as well before 'declaring' the Python path to be the cause.
Here's a simple test case using a relative import where it doesn't happen when using the full path and where it does when using the short path.
Relative import example
testproj/testapp/views.py
from models import *
Interactive session
>>> dispatcher.getReceivers(signal=signals.pre_init, sender=A) [<function test_function_A at 0x12061b0>] >>> from testproj.testapp.views import * >>> dispatcher.getReceivers(signal=signals.pre_init, sender=A) [<function test_function_A at 0x12061b0>] >>> from testapp.views import * >>> dispatcher.getReceivers(signal=signals.pre_init, sender=A) [<function test_function_A at 0x12061b0>, <function test_function_A at 0x1206af0>]
Discussion
To me, it looks like it's the root of the import tree that matters. If something is referenced as project_name.app_name
in an import, any relative imports look like they'll work properly. However, if they're referenced as app_name
alone in imports, relative imports seem to cause duplication.
This is at the boundaries of my level of Python sophistication, though... so there might be something I'm overlooking in my analysis.
comment:3 by , 18 years ago
Triage Stage: | Unreviewed → Accepted |
---|
The simplest way to understand this problem is to realise that it is the import path (the bit after the "from" and before the "import" keywords) that is stored as a key in sys.modules. Different keys in that dictionary imply the classes are considered different by Python.
So we have to re-identify them inside the signal handler code by using external information that might be constant. That is why James is suggesting the source path identification trick we use in loading.py. You could also use things like a hash of the bytecode, but that's a little less stable.
comment:5 by , 18 years ago
We can certainly try to patch it to use the a work-around similar to that of loading.py
. It seems to me that this problem is bigger than PyDispatcher though.
My concern is that Django is patching around this in multiple locations and it's indicative of a large problem. If it was confined to Django's internals, that would be OK in my eyes... but the double-loading potentially means twice the memory, violation of singleton patterns, double-registration, etc. That's a serious problem for complex projects/applications.
What do you guys think? Is this something that's OK to keep working around, or should changing behavior be considered?
comment:6 by , 18 years ago
Well, as far as "double loading", this is technically upstream -- everything we're seeing is a logical consequence of Python's (documented) import behavior, so "changing behavior" would really mean "changing Python", which probably won't get very far ;)
We work around this in specific cases, and we may end up needing to do so for the dispatcher, but in the general case I don't think there's anything we can do, because this is how Python's import behavior is "supposed" to work.
comment:7 by , 18 years ago
I understand that this is standard Python behavior... it's just that Django encourages a setup where this duplication effect is more likely to happen. That is, putting the same content in Python's path twice.
My conjecture regarding changing behavior was more along the lines of changing user behavior, not that of Python (encouraging a design pattern where only project-level references would be used, for example... though this goes against application portability goals). It's a tricky issue, and it's certainly not the fault of Django or its developers... its just something that we should probably try to make sure its a problem that users know to avoid.
A work-around for the dispatcher does seem to be in order in the short term. It looks like upstream for PyDispatcher has been abandoned, so I guess we're on our own.
If there's a particular approach you're interested in pursuing for the dispatcher, let me know and I can take a crack at coming up with a patch.
comment:8 by , 18 years ago
Ben, it's standard Python behaviour, so people rapidly become used to it: if you are referencing things by id() or name, use the same import path or find some other way to identify them. This has to be taken into account when pickling as well.
It's not worth making a big deal out of for Django users. We work around it (by using another identification method) in the rare cases we need it to ease the user's burden. You didn't even know it existed in loading.py, showing how little of a burden it puts on your use. We'll do the same thing for signal matching, most likely. It's not something that's going to change very much in upstream Python -- certainly not in Python 2.x, so there's no benefit in looking for any bigger picture. This is just the way life is.
comment:9 by , 18 years ago
Cc: | added |
---|
comment:10 by , 18 years ago
Cc: | added |
---|
comment:11 by , 17 years ago
I have just been burned by this, and as a novice django user but a fairly experienced python programmer, I take serious issue with the "people rapidly become used to it" attitude. In fact, the comment by mtredinnick on 4/7 above is more infuriating every time I read it.
In my case, I had the signal registration code in a module along with the model to which the signal was attached. That module was being loaded (in development/"runserver" mode) twice: once apparently for validation and once when the module was loaded by the http subsystem. Finding this bug report at least pointed me in the right direction, and I've "fixed" the problem by moving the registration code elsewhere.
This is clearly not a problem with a trivial fix, but I guarantee that people will get hung up on it. The signals architecture, something which could otherwise be a significant win, is far less useful subject to this sort of hidden breakage.
comment:12 by , 16 years ago
The dispatch_uid
argument in the refactored signals (see #6814) will fix this.
comment:13 by , 16 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:14 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
(In [8223]) Major refactoring of django.dispatch with an eye towards speed. The net result is that signals are up to 90% faster.
Though some attempts and backwards-compatibility were made, speed trumped compatibility. Thus, as usual, check BackwardsIncompatibleChanges for the complete list of backwards-incompatible changes.
Thanks to Jeremy Dunck and Keith Busell for the bulk of the work; some ideas from Brian Herring's previous work (refs #4561) were incorporated.
Documentation is, sigh, still forthcoming.
Fixes #6814 and #3951 (with the new dispatch_uid argument to connect).
comment:15 by , 16 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
The Python path stuff done by
manage.py
is a bit of a red herring in this case; it's just as possible to trigger this with a relative import. The problem stems from the fact that in the second import statement in your example, Python looks for the keytestapp.models
insys.modules
and doesn't find it (because the first import added an entry fortestproj.testapp.models
), so the module gets initialized again.We hack around this in the model loading system because it's necessary to insure only one copy of each model class, and do it by checking the source filename of the module, which is a bit more reliable (though not completely so -- you can still trip it up with symlinks if you really want to).
I'm also unsure of the best way to get around this -- hacking in PyDispatcher probably won't be that fun.