Opened 19 months ago

Closed 12 days ago

#19501 closed Cleanup/optimization (fixed)

Allow for fast path model loading

Reported by: akaariai Owned by: akaariai
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have been playing around with the idea of pushing model construction into a class method of the model. This allows very fast reads from the database: around 2x faster for a normal model and 4x faster for a deferred model when you override from_db() so that it constructs the model directly via the model's __dict__ and doesn't send any signals. With signal sending the speed difference is 1.2x for a normal model and ~3x for a deferred model.

Of course, you can't do that by default. It breaks multiple things: __setattr__ isn't called (so descriptors don't work either), an overridden __init__ isn't called, and of course no signals are sent. So, the default would need to remain the good old __init__ way, or we would need to make a really big backwards compatibility break.

Still, in most projects where model init is a performance bottleneck, it will be a bottleneck for just one or two models. 90%+ of models need neither the signals nor the __init__ call, nor do they have any setter of interest in the load-from-DB case. For these models it would be trivial to create a "fast path readonly proxy" for the performance bottleneck cases.

So, by adding the from_db() method we could allow really fast initialization for those who need it. The cost is around a 5% slowdown for the default __init__ case.
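The mechanism behind this fast path can be sketched in plain Python (a stand-alone illustration, not the ticket's actual patch; `from_db_fast` is a hypothetical name): allocating a bare instance and filling its __dict__ directly skips __init__, __setattr__, and signal dispatch entirely.

```python
class Empty(object):
    """Bare class used to allocate an instance without running __init__."""
    pass


class Book(object):
    def __init__(self, id=None, title=""):
        # The normal path: this never runs on the fast path below.
        self.id = id
        self.title = title

    @classmethod
    def from_db_fast(cls, field_names, values):
        # Allocate without __init__, then swap the class and fill the dict.
        new = Empty()
        new.__class__ = cls
        new.__dict__ = dict(zip(field_names, values))
        return new
```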

Actually, if you are not going to write the data back to the DB you can throw away model._state, too. Then you get results like this:

Running 'model_init_signals' benchmark ...
Min: 0.007368 -> 0.002830: 2.6035x faster
Avg: 0.007430 -> 0.002847: 2.6100x faster
Significant (t=347.070697)
Stddev: 0.00006 -> 0.00001: 5.6553x smaller (N = 18)

Running 'query_all' benchmark ...
Min: 0.006640 -> 0.002699: 2.4600x faster
Avg: 0.006684 -> 0.002713: 2.4636x faster
Significant (t=687.137094)
Stddev: 0.00002 -> 0.00001: 2.3581x smaller (N = 18)

Running 'query_defer' benchmark ...
Min: 0.016939 -> 0.003011: 5.6257x faster
Avg: 0.017019 -> 0.003028: 5.6211x faster
Significant (t=1300.019953)
Stddev: 0.00004 -> 0.00001: 3.9626x smaller (N = 18)

I think the 5% slowdown for the default __init__ path is worth it so that those who really need speed can achieve the performance shown above.

A really barebones patch is at: https://github.com/akaariai/django/compare/fast_init - this can be used to run benchmarks but can't be included in Django as-is: it doesn't have the safe __init__ as the default.

Attachments (1)

fast_init.diff (14.6 KB) - added by akaariai 19 months ago.


Change History (12)

comment:1 Changed 19 months ago by apollo13

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Accepting the general idea here; can't comment much on the code yet. Can we somehow automatically decide whether fast-path loading can be used?

comment:2 Changed 19 months ago by akaariai

For the model._state: no. The model._state is used at least when saving and when validating ModelForms, and we can't know beforehand whether that is going to happen.

For signal sending the answer is yes, though we need to do this on a per-queryset basis, as the connected signals could change between querysets.

For updating the __dict__ directly, I think we could do this if there is no __setattr__ defined and there are no descriptors defined for the field attnames.

And if there is an overridden __init__, then we must call it in any case.

Likely the best approach is to have just one "use_fastpath" flag on from_db(), which we would set to True when all of the above conditions allow fast pathing.

I think this would work: in iterator()'s setup code we would check whether the fast path is possible (model._meta.can_use_fastpath(init_list)), then pass the result to from_db().

Those who want to squeeze out the model._state overhead would need to override the (non-public) from_db() method.
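The conditions above can be summarized in a hypothetical helper. This is an illustrative sketch, not Django API: `can_use_fastpath`, `BaseModel`, and the `has_init_signal_receivers` flag are all made-up names standing in for the real per-queryset checks.

```python
import inspect


class BaseModel(object):
    """Plain-Python stand-in for the stock model base class."""
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


def can_use_fastpath(model_cls, attnames, has_init_signal_receivers=False):
    """Fast-path loading is only safe when nothing can observe the
    skipped __init__ path."""
    if has_init_signal_receivers:
        # pre_init/post_init receivers must see every instantiation.
        return False
    if model_cls.__init__ is not BaseModel.__init__:
        # An overridden __init__ must always be called.
        return False
    if model_cls.__setattr__ is not object.__setattr__:
        # A custom __setattr__ must see every attribute assignment.
        return False
    for name in attnames:
        attr = inspect.getattr_static(model_cls, name, None)
        if hasattr(attr, "__set__"):
            # A data descriptor (e.g. a property) intercepts assignment.
            return False
    return True
```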

comment:3 Changed 19 months ago by akaariai

  • Owner changed from nobody to akaariai
  • Patch needs improvement set

I have code that proves it is possible to detect the fast-path init conditions. Attached as a patch since GitHub is down. The code should be safe. Well, there are some crazy use cases whose behavior would change, but they are crazy enough to say "who cares".

For example:

for obj in qs:
    if somecond:
        signals.pre_init.connect(somesignal, sender=obj.__class__)

this would not work like it did before: the signals would not be sent inside the loop. Everybody crazy enough to have this kind of code deserves what is coming for them :)

For performance numbers:

Running 'query_all' benchmark ...
Min: 0.006885 -> 0.003727: 1.8474x faster
Avg: 0.006929 -> 0.003766: 1.8398x faster
Significant (t=299.109631)
Stddev: 0.00003 -> 0.00003: 1.1666x larger (N = 18)

Running 'query_all_multifield' benchmark ...
Min: 0.012577 -> 0.007315: 1.7193x faster
Avg: 0.012625 -> 0.007391: 1.7081x faster
Significant (t=295.886261)
Stddev: 0.00005 -> 0.00005: 1.0607x larger (N = 18)

Running 'query_defer' benchmark ...
Min: 0.016842 -> 0.004038: 4.1708x faster
Avg: 0.016909 -> 0.004098: 4.1263x faster
Significant (t=841.877024)
Stddev: 0.00005 -> 0.00005: 1.0069x smaller (N = 18)

Running 'query_get' benchmark ...
Min: 0.024566 -> 0.024049: 1.0215x faster
Avg: 0.024594 -> 0.024079: 1.0214x faster
Significant (t=83.266099)
Stddev: 0.00002 -> 0.00002: 1.1787x larger (N = 18)

Running 'query_raw' benchmark ...
Min: 0.013461 -> 0.007918: 1.7000x faster
Avg: 0.013510 -> 0.007957: 1.6979x faster
Significant (t=439.052454)
Stddev: 0.00004 -> 0.00003: 1.2919x smaller (N = 18)

Running 'query_raw_deferred' benchmark ...
Min: 0.015671 -> 0.003356: 4.6696x faster
Avg: 0.015714 -> 0.003391: 4.6345x faster
Significant (t=1321.202256)
Stddev: 0.00003 -> 0.00002: 1.2602x smaller (N = 18)

Running 'model_init_self_signals' benchmark ...
Min: 0.011877 -> 0.012277: 1.0337x slower
Avg: 0.011926 -> 0.012312: 1.0323x slower
Significant (t=-38.998385)
Stddev: 0.00003 -> 0.00003: 1.1830x smaller (N = 18)

What can be learned from this? Well, init is faster, except for a minor slowdown in model_init_self_signals. Query.get() isn't changed much. Deferred model loading is now clearly faster than normal loading; it was slower(!) before. query_defer uses the same model definition as query_all_multifield.

For ultimate readonly speed one can change from_db(). Using query_all as an example:

models.py:
from django.db import models
from django.db.models.base import Empty

class Book(models.Model):
    title = models.CharField(max_length=100)

    @classmethod
    def from_db(cls, using, values, attnames, init_with_args, can_fast_path):
        new = Empty()
        new.__class__ = cls
        new.id, new.title = values
        return new

This from_db() isn't safe for deferred loading (init_with_args tells us whether all of the model's field values are present; maybe I should rename that...), and it always fast-paths. One can't use these models in form validation, nor when writing (no _state). However, the speed difference is major:

Running 'query_all' benchmark ...
Min: 0.006691 -> 0.001791: 3.7359x faster
Avg: 0.006725 -> 0.001800: 3.7369x faster
Significant (t=794.562242)
Stddev: 0.00002 -> 0.00001: 3.0487x smaller (N = 18)

I would like opinions on whether we want the added complexity. I know I am biased towards optimizing over code clarity. Still, in my opinion the code is actually pretty clear, so this isn't a lot worse than before.

The patch needs some more tests (we should test can_fast_path() directly, at least). It might be wise to move can_fast_path() to Model as _can_fast_path() to avoid the circular import problems. Also, from_db() would likely need to be changed to _from_db(), as it isn't a good idea to make it public API. I think it falls into "semi-public" API territory where we don't change it without a good reason, but if we *do* have a reason then we can change the signature. And we can always make it public later if we wish.

Changed 19 months ago by akaariai

comment:4 Changed 17 months ago by akaariai

I did some work on the patch. Custom fields using SubFieldBase do not use __set__; instead they use __get__. In addition, I verified that nothing in Django relies on a custom __init__ or on the model __init__ signals; likewise, nothing in Django relies on __setattr__.

Easily the most common action on load is converting a field value from DB format to Python format, but that is exactly what SubFieldBase is meant for. There might be other use cases, maybe something like calculating a value in __init__. This seems fragile: if you then change the obj.somefield value and that field is used in the calculation, your code will break. Thus, a property seems a much better way to handle this particular use case.
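The fragility argument can be seen in a plain-Python sketch (a hypothetical class, not tied to Django): a value computed once in __init__ goes stale when its inputs change, while a property is recomputed on every access.

```python
class Rectangle(object):
    def __init__(self, width, height):
        self.width = width
        self.height = height
        # Fragile alternative: caching self.area = width * height here
        # would go stale as soon as width or height is reassigned.

    @property
    def area(self):
        # Recomputed on every access, so it stays correct even after
        # width or height change.
        return self.width * self.height
```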

Django's code base shows that one usually wants to differentiate between __init__ in Python and load from DB; for example, the init signals in Django act only when initializing an object in Python, but do nothing in the load-from-DB case. Overriding __init__ is also hard for this reason: you don't know if the object is being loaded from the DB or if this is a "plain" init.
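A from_db() hook makes that distinction explicit. A plain-Python sketch, using the signature from the final patch in comment 9 (the `loaded_from_db` flag is an illustrative name, not Django API):

```python
class Book(object):
    """Plain-Python stand-in for a model with a from_db() hook."""

    def __init__(self, id=None, title=""):
        self.id = id
        self.title = title
        self.loaded_from_db = False   # plain Python __init__

    @classmethod
    def from_db(cls, db, field_names, values):
        new = cls(**dict(zip(field_names, values)))
        new.loaded_from_db = True     # only set on the DB-load path
        return new
```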

The performance numbers are nice:

Running 'query_all' benchmark ...
Min: 0.007524 -> 0.004527: 1.6620x faster
Avg: 0.007551 -> 0.004580: 1.6486x faster
Significant (t=312.956724)
Stddev: 0.00002 -> 0.00003: 1.4246x larger (N = 18)

Running 'query_defer' benchmark ...
Min: 0.017146 -> 0.004315: 3.9736x faster
Avg: 0.017218 -> 0.004349: 3.9587x faster
Significant (t=896.296579)
Stddev: 0.00005 -> 0.00003: 1.6903x smaller (N = 18)

If you add the patch for removing the chunked iterators (#18702), then you get this:

Running 'query_all' benchmark ...
Min: 0.007714 -> 0.004020: 1.9189x faster
Avg: 0.007756 -> 0.004064: 1.9082x faster
Significant (t=341.747393)
Stddev: 0.00003 -> 0.00003: 1.1788x larger (N = 18)

Running 'query_defer' benchmark ...
Min: 0.017498 -> 0.004134: 4.2325x faster
Avg: 0.017535 -> 0.004173: 4.2022x faster
Significant (t=1455.612646)
Stddev: 0.00002 -> 0.00003: 1.2341x larger (N = 18)

So, even more gain with the simplified iterator protocol.

Last, for those who need absolute speed for read-only data, alter from_db() like this (for query_all):

@classmethod
def from_db(cls, using, values, field_names, init_with_args, can_fast_init):
    new = Empty()
    new.__class__ = cls
    new.id, new.title = values
    return new


Running 'query_all' benchmark ...
Min: 0.007590 -> 0.001996: 3.8026x faster
Avg: 0.007651 -> 0.002014: 3.7987x faster
Significant (t=457.077059)
Stddev: 0.00005 -> 0.00001: 4.4830x smaller (N = 18)

Now, such a model will only work in read-only cases (no model._state), and won't work with .defer() or .only() (all fields must always be present), but if you happen to need speed for a read-only case, you can get it.

The branches are here: https://github.com/akaariai/django/compare/fast_init_test and https://github.com/akaariai/django/compare/fast_init_and_nonchunked (with the latest patch from #18702 added).

There surely is code that will break if this is committed as-is. So I wonder whether this should be opt-in or opt-out via a model._meta flag, or whether loading via __init__ could be deprecated entirely.

EDIT: Some typo fixes, some clarified wording...

Last edited 17 months ago by akaariai (previous) (diff)

comment:5 Changed 14 months ago by anonymous

Sort of off-topic, but how do you run these benchmarks? Is there a tutorial online for testing different branches like this? This would go a long way towards improving performance in my projects. Come to think of it, it could even be included in a future "Optimizing Django" section in the docs.

comment:6 Changed 4 weeks ago by akaariai

See PR https://github.com/django/django/pull/2802 for a basic implementation.

The cost of this is a 10% slower query_all benchmark, and an around 15% slower query_select_related benchmark (these fetch a large number of objects and do nothing else). OTOH, using the new hook it is possible to speed up model loading dramatically (query_all is 1.7x faster with direct __dict__ assignment).

My feeling is that if you have a bottleneck due to model instantiation speed, then you want this patch so that you can optimize the way your models are loaded. If you don't have a bottleneck in model instantiation, then you don't care about the 10% in any case.

comment:7 Changed 4 weeks ago by akaariai

  • Has patch set
  • Patch needs improvement unset

comment:8 Changed 3 weeks ago by timo

  • Patch needs improvement set

Patch has some review comments.

comment:9 Changed 2 weeks ago by akaariai

  • Patch needs improvement unset

I updated the code and docs based on the review comments. The code was mistakenly creating a new model._state in the from_db() method, when just updating the attributes of the existing ._state instance is enough. The slowdown is now (using djangobench):

Running 'query_select_related' benchmark ...
Min: 0.054252 -> 0.058567: 1.0795x slower
Avg: 0.055689 -> 0.059967: 1.0768x slower
Significant (t=-8.076647)
Stddev: 0.00271 -> 0.00258: 1.0494x smaller (N = 50)

Running 'query_all' benchmark ...
Min: 0.021309 -> 0.022659: 1.0633x slower
Avg: 0.022763 -> 0.024240: 1.0649x slower
Significant (t=-2.598991)
Stddev: 0.00285 -> 0.00283: 1.0042x smaller (N = 50)

Running 'query_raw_deferred' benchmark ...
Min: 0.016543 -> 0.017822: 1.0773x slower
Avg: 0.017149 -> 0.018389: 1.0723x slower
Significant (t=-4.172737)
Stddev: 0.00150 -> 0.00147: 1.0174x smaller (N = 50)

So, the code itself causes only a minor slowdown. The benchmarks in the comments above and in the ticket's description test various fast-path implementations. Basically, one can do the following for their model:

from django.db import models
from django.db.models.base import ModelState

class Empty(object):
    pass

class MyModel(models.Model):
    ...

    @classmethod
    def from_db(cls, db, field_names, values):
        empty = Empty()
        empty.__class__ = cls
        empty.__dict__ = dict(zip(field_names, values))
        empty._state = ModelState()
        empty._state.adding = False
        return empty

and this results in around 2x better performance for the query_all benchmark, and a 4x speedup for deferred model loading (the query_defer benchmark).

This ticket discussed various ways to apply the above optimization automatically; it is possible when the model doesn't have a customized __init__, doesn't have pre/post init signals, and doesn't require __setattr__ to be called. Detecting this proved too complex, so the automation is no longer included. However, users who require absolute performance can apply the optimization manually.
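Beyond raw speed, the closing commit mentions dirty-field tracking as another use of the hook. A plain-Python sketch of that idea (all names here are illustrative, not Django API): record the values that came from the database, then diff against them later.

```python
class DirtyFieldsModel(object):
    """Plain-Python sketch of dirty-field tracking via a from_db() hook:
    remember the loaded values so save() could later issue an automatic
    update_fields containing only what changed."""

    fields = ("id", "title")

    def __init__(self, **kwargs):
        for name in self.fields:
            setattr(self, name, kwargs.get(name))
        self._loaded_values = {}

    @classmethod
    def from_db(cls, db, field_names, values):
        instance = cls(**dict(zip(field_names, values)))
        # Remember what came from the database so we can diff later.
        instance._loaded_values = dict(zip(field_names, values))
        return instance

    def dirty_fields(self):
        return [name for name in self.fields
                if getattr(self, name) != self._loaded_values.get(name)]
```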

comment:10 Changed 13 days ago by timo

  • Triage Stage changed from Accepted to Ready for checkin

comment:11 Changed 12 days ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 0b6f05ede648ce62a5c91c7c38a0a362711f0656:

Fixed #19501 -- added Model.from_db() method

The Model.from_db() is intended to be used in cases where customization
of model loading is needed. Reasons can be performance, or adding custom
behavior to the model (for example "dirty field tracking" to issue
automatic update_fields when saving models).

A big thank you to Tim Graham for the review!
