Opened 4 years ago

Closed 4 years ago

#18627 closed Cleanup/optimization (wontfix)

Remove django-core uses of Model pre_init and post_init

Reported by: Jeremy Dunck Owned by: Jeremy Dunck
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


GenericForeignKey and ImageField use pre_init and post_init - signals which are fired on each model construction.

The signal handling code is heavy relative to plain function dispatch, and is optimized for the zero-handler case. It's unfortunate that Django ships with these signals being used, as it could instead use an internal hook for the needed initialization, and this approach would be faster.

I'm working on a patch to switch to a more direct hook (which any field could use instead of pre/post_init).

See also:!search/pre_init$2Fpost_init$20vs.$20performance/django-developers/GNQsvs8X9Gc/cegjyELUZt8J

(Marking triage accepted since discussed and +1'd by some core folks.)

Change History (8)

comment:1 Changed 4 years ago by Jeremy Dunck

Has patch: set

comment:2 Changed 4 years ago by Jeremy Dunck

Owner: changed from nobody to Jeremy Dunck
Status: newassigned

comment:3 Changed 4 years ago by Aymeric Augustin

A quick summary of the context:

  • GenericForeignKey uses pre_init since day 1, see [bca5327b21]. This isn't rock-solid — for instance, setting a GFK to an unsaved model instance doesn't work, even if you save the related model instance first (there was a ticket about this recently).
  • ImageField uses post_init since #11196, see [d89ba464dd]. That was three years ago.

The proposed patch re-implements a signal-like pattern, but specific to model fields. Through their contribute_to_class methods, fields can register additional pre_init or post_init logic. Unlike the global pre_init and post_init signals, this stays within each model's boundaries, hence the speed optimization.

I'm moderately enthusiastic about this idea. Speeding-up model creation is a worthy goal, but aren't we trading a short term benefit for more technical debt? With this change, there will be two competing pre/post_init mechanisms for models. Django (both core and contrib) will user the faster one.

Do you plan to turn this into a public, documented API? The model API is a foggy area: the docs encourage people to write their own fields (and we've repeatedly waved off JsonField and UuidField for this reason) but they don't describe very precisely what is kosher and what isn't. If we introduce this mechanism, people will use it in their own fields. So we're making this area even more difficult to clean up.

Have you tried to implement the parts of GenericForeignKey and ImageField without relying on a model-level hook? Basically, the problem is that a field must interact with other fields of the same model. It may be possible to handle this within the field itself. I haven't said it's easy, but if it was possible, it would be much cleaner, and probably safer in the long term.

That's a -0; it could be turned into a +0 if alternative solutions prove to be worse.

Last edited 4 years ago by Aymeric Augustin (previous) (diff)

comment:4 Changed 4 years ago by Anssi Kääriäinen

If this would be easy to do in the field itself that would be nice. Unfortunately any approach where a method is called for each field on model init is immediately a no-go, as the performance hit is substantial.

What is done in the ticket is basically caching the information which fields has a method that do something.

If there is a way to do field on-init calls with properties or something similar, then that would be neat. Still, if I understand correctly one can't do the post_init stuff at assignment time.

It is true the current approach is a little ugly. One the other hand model __init__ speed is important.

With carefully crafted signal receiver caching we can actually get same performance (if not even little better) than given by this idea, but I haven't seen too much interest in #16679 from other core devs... In addition, #16679 or something similar is needed for efficient inherited signals.

comment:5 Changed 4 years ago by Anssi Kääriäinen

I did some work to benchmark the changes. Speed difference between HEAD and init hooks approach is 1.6x for a project which has both pre and post init signals from fields (replaced with hooks in init hooks approach). The receiver caching approach gives pretty much the same result.

To benchmark this I added three new benchmarks to djangobench, available here.

A little bit cleaned up init hooks approach is available here. The cleanup is mainly there to remove any possible interference from the testing hacks. Finally, a rebased version of #16679's patch is available here.

comment:6 Changed 4 years ago by Anssi Kääriäinen

I wonder if we want to remove the use of signals now that signal caching is in master. There isn't any compelling speed reason to do so, but init hooks could be a little cleaner coding style than signals.

Any opinions?

comment:7 Changed 4 years ago by Jeremy Dunck

If starting from scratch, I would definitely prefer hooks over signals, but given backwards-compat concerns, I doubt we'll ever fully kill these signals. As such, I agree, caching probably covers the gain here so we should kill this ticket.

comment:8 Changed 4 years ago by Anssi Kääriäinen

Resolution: wontfix
Status: assignedclosed

OK, I think it is time to close this ticket.

Note: See TracTickets for help on using tickets.
Back to Top