Opened 4 years ago

Closed 2 years ago

#16679 closed Cleanup/optimization (fixed)

Speed up signals by caching the reveicers per sender

Reported by: akaariai Owned by: nobody
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@…, jdunck@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


This is related to ticket #16639, where I tried to speed up db.models.Model __init__ by some kludges in pre_init and post_init signals. Here is another try, this time with a more generic approach.

I have two patches, the first one speeds up just the cases where there are no receivers for the current sender. This is done by caching "no receivers" when send was called and the sender had no receivers. The cache is cleared whenever connect or disconnect is called or whenever a weakref signal is removed by garbage collection. Performance is roughly the same as trunk when there are receivers for the current sender. In the case when there are receivers, but not for the current sender, the performance is much better, as in almost no overhead.

The second patch optimizes things a bit further by not only remembering which senders have receivers, but by also remembering the list of receivers. This will require more memory, but I believe this optimization might be needed for further work (for example deferred models do not currently send signals correctly, and correcting this might mean more work in signal.send()).

There are some problems:

  1. The sender object must be hashable. In the trunk code only __eq__ is assumed.
  2. The caches can leak memory. For example the template_rendered signal has that property.
  3. Code complexity.
  4. More work if signals change constantly.

The first two problems can be solved by adding a kwarg use_caching to Signal.__init__ (default False). There are some signals that will benefit from caching (model signals at least) and some that would leak memory too much with caching (template_rendered, notably). The use_cache kwarg is included in the second patch.

The rough numbers for pre and post second patch for 10000 simple model __init__ (only id) is:
pre patch:

  • no signals at all: 108ms
  • one pre_init signal to other model: 0.171ms
  • one pre_init and one post_init signal to other model: 0.220ms
  • one pre_init signal to initialized model: 0.215ms

post patch 2:

  • no signals at all: 103ms
  • one pre_init signal to other model: 0.115ms
  • one pre_init and one post_init signal to other model: 0.115ms
  • one pre_init signal to initialized model: 0.173ms

In the case when there is one post_init and one pre_init signal in the project, one can save around 50% in the best case.

The numbers are nice, but IMHO the real benefit is that with the second patch, having inherited model signals would be possible without paying a huge performance penalty. Fixing deferred objects not firing signals would also be easier without performance problems.

The patches are still somewhat WIP. But I'd like to open discussion before I polish them and test them more.

Attachments (6)

fast-signals1.diff (5.4 KB) - added by akaariai 4 years ago.
Without sender -> receivers cache
fast-signals2.diff (8.5 KB) - added by akaariai 4 years ago.
sender -> receivers cache (9.4 KB) - added by Suor 3 years ago.
no_cache_fast_signals.diff (6.4 KB) - added by Suor 3 years ago.
no_cache_fast_signals.2.diff (6.3 KB) - added by Suor 3 years ago.
no_cache_fast_signals.3.diff (6.6 KB) - added by Suor 3 years ago.

Download all attachments as: .zip

Change History (23)

Changed 4 years ago by akaariai

Without sender -> receivers cache

Changed 4 years ago by akaariai

sender -> receivers cache

comment:1 Changed 4 years ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

This clearly needs a review by a core developer before going into trunk => marking as DDN.

comment:2 Changed 3 years ago by jacob

  • Triage Stage changed from Design decision needed to Accepted

This seems like a good idea to me - marking accepted.

comment:3 Changed 3 years ago by Suor

Here are another approach, without cache, store receivers by sender instead.

Changed 3 years ago by Suor

Changed 3 years ago by Suor

Changed 3 years ago by Suor

comment:4 Changed 3 years ago by akaariai

I quickly tried your patch, but didn't see performance improvement in this case:

class A(models.Model):

class B(models.Model):

def recv1(*args, **kwargs):

signals.pre_init.connect(recv1, sender=B, weak=False)
signals.post_init.connect(recv1, sender=B, weak=False)

for i in range(0, 1000):

Note that the connected signals for B are slowing down A __init__. This is the case where the cached signals (fast-signals2.diff) give the best performance increase.

comment:5 Changed 3 years ago by Suor

My patch is designed for large amount of signals. Adding some special casing for empty receivers should make it almost as fast as cached signals without actually caching anything. Still storing receivers in proper structure seems cleaner approach than caching.

To make signals even faster we should drop weakrefs, they seem more confusing than useful.

Changed 3 years ago by Suor

comment:6 Changed 3 years ago by Suor

Supplied updated patch which should be faster for no signals case

comment:7 Changed 3 years ago by akaariai

As I see it, your approach is the right one, at least as long as we don't want to support inherited signals. There are some problems when trying to make inherited signals work, though.

The problem for inherited signals is that at the time the signal is registered, we don't necessarily know all of the sender's subclasses. And, when a model is created by the apploading code, we don't know the signals yet. So, we would need to update the signal datastructures when a new signal is connected, disconnected or a new subclass is registered in the app cache. In the two first cases, we should check all the subclasses of the sender, and in the last case we would need to check the parents of the sender.

My approach to that problem is that when a signal is sent, we cache the found results. When a receiver is connected or disconnected we flush all the caches. This would make it perhaps easier to support inherited signals, as there is just one point when we need to check the parents and cache the results based on that. On the other hand, my version is very bad if you do dynamically connect / disconnect signals, as the caches would get flushed all the time and the caching would just increase the overhead, not reduce it. I don't know if dynamical connect is a common use case.

Of course, there is no decision if we actually do want to support inherited signals. They would be useful, as one would usually want to have BaseModel pre_save/post_save sent when InheritedModel is saved.

comment:8 Changed 3 years ago by Suor

Yes, my solution is not suitable for inherited signals. And once we have those it should be discarded. But do we really need them? Implementing as little as possible is generally a good idea.

And, about dynamic creation of model subclasses, I do pretty much of it. And in my experience, signals are usually connected in .contribute_to_class() call which is done for every new model class, thus no need for inherited signals here.

comment:9 Changed 3 years ago by akaariai

  • Patch needs improvement set

Actually now that I think about it, dynamic creation of subclasses will not be a problem for the caching approach. The memory used by the signals cache will be much less than the memory used by the app module cache anyways. For the caching approach the only real killer is dynamic addition / removal of signals. But I think that could be improved somewhat easily.

And I claim that inherited signals would be useful. An external application like Haystack could use them, see this for example: Now, if you save a inherited model, it is not going to do any indexing. They would surely want inherit=True. However if a decision needs to be made between fast __init__ signals and inherited signals, it is the model initialization that is important.

But lets concentrate here on making the signals as fast as possible and nothing else. Even if it is decided that we want inherited signals and thus caching, it would still be a good idea to have proper data structures for signal storage.

The patch seems to fail signals / signals_regress test suites. Some high level comments about what is going on with the datastructures would be useful.

I assume the worst case for your patch, where you have a connected receiver for both sender=None and sender=CurrentSender isn't a common case?

comment:10 Changed 3 years ago by Suor

About failures: it's tests that wrong here, they rely on .receivers property being list of receivers while I use more complex structure. Sure adjusted tests should be added to my patch, I was just giving a try at a moment.

About worst case: Yeah, the worst case is than combine() is needed. It still has linear complexity, not so bad.

About caching: in fast-signals2.diff you use both sender_receivers_cache and sender_no_receivers_cache. The latter seems superfluous you can just store [] in the former, and then use something like:

if not self.receivers:
    return []
if sender in self.sender_receivers_cache:
    receivers = self.receivers_cache[sender]
    receivers = self.receivers_cache[sender] = self._receivers_for_sender(sender)

return deref(receivers)

I also dropped .use_caching property. What can be a reason not to? Just cache always since it don't change any behavior.

comment:11 Changed 3 years ago by Suor

As I understand haystack it creates SearchIndex for each model and that search index connects signals to receivers that depend on that search index, so no use for inherited signals here too. Or maybe I just don't see a big simplifying refactoring here which involves inherited signals?

Last edited 3 years ago by Suor (previous) (diff)

comment:12 Changed 3 years ago by akaariai

The senders_no_receivers cache is there to make the sending marginally faster. But I think if you did on the last line this:

return receivers and deref(receivers) or []

I would be actually as fast as the senders_no_receivers cache version, as you would get rid of that function call. Other than that your code does look much better.

The problem with not having .use_caching property is that if somebody decides to use dynamic classes with the signaling system, you will get really bad memory leak - every dynamic class will be stored in the cache. This is not a problem with model classes, because you will leak the classes anyways in app models cache.

So, what do you think, which way should we go here, caching, your version or both?

The Haystack use case is following:

class NoteIndex(RealtimeSearchIndex):
    # Index definition
site.register(Note, NoteIndex)

Because you use RealtimeSearchIndex, haystack will connect to the signals, and do automatic updates of the index in the signals. Now, if you have SpecialNote(Note) class, and you save a SpecialNote, you will NOT get the index updated, even if you saved a note. Even a ProxyNote(Note) would break your indexing. There are surely other use cases like that. For example you could have an extension "django-audits", where you register an audit_trail for a model, and it will automatically do auditing for you. Now, if you use subclassing (even proxy classes), suddenly your auditing is not there. See the problem?

Although there are other problems for signals, too: they are not sent when you do qs.update, nor are they sent when you do bulk_create. I have some ideas of how there could be fixed, although that would mean first fetching the to-be-updated resultset in qs.update() and then doing the update in one query. So it would mean that if you update a gazillion rows in one go, you would get a LOT worse performance... But having data modification signals that are only sometimes sent will lead to problems for users who think they can audit their data by just signals. Ah well, getting out of this ticket's topic again... :)

comment:13 Changed 3 years ago by Suor

Are there any real world scenario for using numerous thrown out dynamic classes? And while they are thrown out they still live long enough to use signals. Probably the right approach here will be just ignore this case.

About inherited signals. I agree that they could simplify some code (making it more implicit by the way). It will also make signals implementation slower and more complex. It seems that somebody should start another ticket or topic on django-developers to resolve this question. Who knows? Maybe a lot of people want this?

About implementation. It appears to me that my approach is cleaner when sender inheritance is not involved, otherwise one should just go with caching. So this solution should be delayed until we get any decision on previous question.

comment:14 Changed 3 years ago by jdunck

  • Cc jdunck@… added

comment:15 Changed 3 years ago by akaariai

A rebased signals caching patch is available from here.

There is partially overlapping work going on in ticket #18627.

comment:16 Changed 2 years ago by akaariai

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

A rebased and slightly simplified version at

model_init_signals (pre and post init signals defined for model other than initialized, one field for the initialized model):

Running 'model_init_signals' benchmark ...
Min: 0.012126 -> 0.007745: 1.5656x faster
Avg: 0.012178 -> 0.007774: 1.5666x faster
Significant (t=644.833741)
Stddev: 0.00004 -> 0.00002: 2.3318x smaller (N = 34)

Benchmark available at:
(Note that the djangobench used is somewhat hacked version - it discards 2/3 of the results - the highest third and the lowest third - to get consistent results on my laptop).

model_init_self_signals (pre and post init for same model):

Running 'model_init_self_signals' benchmark ...
Min: 0.014637 -> 0.011735: 1.2473x faster
Avg: 0.014675 -> 0.011762: 1.2477x faster
Significant (t=530.411861)
Stddev: 0.00002 -> 0.00002: 1.2429x smaller (N = 34)

query_all (no signals at all, 1000 objects from DB)

Running 'query_all' benchmark ...
Min: 0.006779 -> 0.006783: 1.0006x slower
Avg: 0.006819 -> 0.006825: 1.0010x slower
Not significant
Stddev: 0.00003 -> 0.00003: 1.1693x larger (N = 34)

All tests pass and I don't see any reason to not add the cache. So, will commit tonight if no objections.

comment:17 Changed 2 years ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 704ee33f503c96b96c2682b946a11b3b42318ba7:

Fixed #16679 -- Use caching to speed up signal sending

Note: See TracTickets for help on using tickets.
Back to Top