Opened 12 years ago
Closed 11 years ago
#19501 closed Cleanup/optimization (fixed)
Allow for fast path model loading
Reported by: | Anssi Kääriäinen | Owned by: | Anssi Kääriäinen |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have been playing around with the idea of pushing the model construction to a class method of the model. This will allow very fast reads from database, around 2x faster for normal model, 4x faster for deferred model when you override the from_db() so that it constructs the model directly using the model.dict and doesn't send any signals. With signal sending the speed different is 1.2x for normal model, ~3x for deferred model.
Of course, you can't do that by default. This breaks multiple things: setattr not called (and descriptors don't work either), overridden init not called and of course no signals sent. So, the default would need to be old good __init__
way, or we would need to make a really big backwards compatibility break.
Still, in most projects where model init is a performance bottleneck it will be for just one or two models. 90%+ of models do not need the signals, init call nor do they have any setter of interest in the from DB case. For these models it would be trivial to create a "fast path readonly proxy" for the performance bottleneck cases.
So, by adding the from_db method we could allow for really fast init for those who need it. The cost is around 5% slower for the default __init__
case.
Actually, if you are not going to write the data back to the DB you can throw away the model._state, too. Then you get this kind of results:
Running 'model_init_signals' benchmark ... Min: 0.007368 -> 0.002830: 2.6035x faster Avg: 0.007430 -> 0.002847: 2.6100x faster Significant (t=347.070697) Stddev: 0.00006 -> 0.00001: 5.6553x smaller (N = 18) Running 'query_all' benchmark ... Min: 0.006640 -> 0.002699: 2.4600x faster Avg: 0.006684 -> 0.002713: 2.4636x faster Significant (t=687.137094) Stddev: 0.00002 -> 0.00001: 2.3581x smaller (N = 18) Running 'query_defer' benchmark ... Min: 0.016939 -> 0.003011: 5.6257x faster Avg: 0.017019 -> 0.003028: 5.6211x faster Significant (t=1300.019953) Stddev: 0.00004 -> 0.00001: 3.9626x smaller (N = 18)
I think the 5% slowdown for the default __init__
path is worth it so that those who really need speed can achieve the above shown performance.
A really barebones patch at: https://github.com/akaariai/django/compare/fast_init - this can be used to run benchmarks but can't be included in Django as is - it doesn't have the safe __init__
as the default.
Attachments (1)
Change History (12)
comment:1 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 12 years ago
For the model._state - no. The model._state is used at least when saving and when validating ModelForms and we can't know if this is going to happen beforehand.
For signal sending the answer is yes, though we need to do this per-queryset basis as between querysets the signals could be changed.
For updating the dict directly I think we could do this if there are no __setattr__
defined, and there are no descriptors defined for the field attnames.
And, if there is an overridden __init__
, then we must call that in any case.
Likely the best approach is to have just one "use_fastpath" flag to from_db() which we would set to True in case all of the above conditions allow for fast pathing.
I think this would work: in the iterator()'s setup code we would check if use_fastpath is possible (model._meta.can_use_fastpath(init_list), then pass the result to from_db().
Those who want to squeeze the model._state overhead out would need to override the (non public) from_db() method.
comment:3 by , 12 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | set |
I have code that proves that it is possible to detect fast_path init conditions. Attached as patch as github is down. The code should be safe. Well, there are some crazy use cases which would change but they are crazy enough to say "who cares".
For example:
for obj in qs: if somecond: signals.pre_init.connect(somesignal, sender=obj.__class__)
this would not work like it did before, signals would not be send inside the loop. Everybody crazy enough to have this kind of code deserves what is coming for them :)
For performance numbers:
Running 'query_all' benchmark ... Min: 0.006885 -> 0.003727: 1.8474x faster Avg: 0.006929 -> 0.003766: 1.8398x faster Significant (t=299.109631) Stddev: 0.00003 -> 0.00003: 1.1666x larger (N = 18) Running 'query_all_multifield' benchmark ... Min: 0.012577 -> 0.007315: 1.7193x faster Avg: 0.012625 -> 0.007391: 1.7081x faster Significant (t=295.886261) Stddev: 0.00005 -> 0.00005: 1.0607x larger (N = 18) Running 'query_defer' benchmark ... Min: 0.016842 -> 0.004038: 4.1708x faster Avg: 0.016909 -> 0.004098: 4.1263x faster Significant (t=841.877024) Stddev: 0.00005 -> 0.00005: 1.0069x smaller (N = 18) Running 'query_get' benchmark ... Min: 0.024566 -> 0.024049: 1.0215x faster Avg: 0.024594 -> 0.024079: 1.0214x faster Significant (t=83.266099) Stddev: 0.00002 -> 0.00002: 1.1787x larger (N = 18) Running 'query_raw' benchmark ... Min: 0.013461 -> 0.007918: 1.7000x faster Avg: 0.013510 -> 0.007957: 1.6979x faster Significant (t=439.052454) Stddev: 0.00004 -> 0.00003: 1.2919x smaller (N = 18) Running 'query_raw_deferred' benchmark ... Min: 0.015671 -> 0.003356: 4.6696x faster Avg: 0.015714 -> 0.003391: 4.6345x faster Significant (t=1321.202256) Stddev: 0.00003 -> 0.00002: 1.2602x smaller (N = 18) Running 'model_init_self_signals' benchmark ... Min: 0.011877 -> 0.012277: 1.0337x slower Avg: 0.011926 -> 0.012312: 1.0323x slower Significant (t=-38.998385) Stddev: 0.00003 -> 0.00003: 1.1830x smaller (N = 18)
What can be learned from this? Well, init is faster, except a minor slowdown in model_init_self_signals. Query.get() isn't changed much. Deferred model loading is now clearly faster than normal loading. It was slower(!) before. query_defer uses same model def than query_all_multifield.
For ultimate readonly speed one can change the from_db. query_all as example:
models.py: from django.db import models from django.db.models.base import Empty class Book(models.Model): title = models.CharField(max_length=100) @classmethod def from_db(cls, using, values, attnames, init_with_args, can_fast_path): new = Empty() new.__class__ = cls new.id, new.title = values return new
the from_db isn't safe for deferred loading (init_with_args tells us if all model's fields values are present, maybe I should rename that...), and it always fast paths. One can't use these models in form validation nor can one use these when writing (no _state). However, the speed difference is major:
Running 'query_all' benchmark ... Min: 0.006691 -> 0.001791: 3.7359x faster Avg: 0.006725 -> 0.001800: 3.7369x faster Significant (t=794.562242) Stddev: 0.00002 -> 0.00001: 3.0487x smaller (N = 18)
I would like opinions if we want the added complexity. I know I am biased towards optimizing vs code clarity. Still, in my opinion the code is actually pretty clear, so this isn't a lot worse than before.
The patch needs some more tests (we should test directly the can_fast_path() at least). It might be wise to move the can_fast_path() to Model as _can_fast_path() to avoid the circular import problems. Also, from_db() would likely need to be change to _from_db(), as it isn't a good idea to make it public API. I think it falls into "semi-public" api territory where we don't change it if we don't have a good reason, but if we *do* have a reason then we can change the signature. And, we can always make it public later if we wish so.
by , 12 years ago
Attachment: | fast_init.diff added |
---|
comment:4 by , 12 years ago
I did some work on the patch. Custom fields using SubFieldBase do not use __set__
, instead they use __get__
. In addition I verified that nothing in Django relies on custom __init__
or model __init__
signals. Still, nothing in Django relies on __setattr__
.
The easily most common action on load is to convert field value from DB format to Python format. But this is exactly what SubFieldBase is meant for. There might be other use cases, maybe something like calculating a value on __init__
. This seems fragile - if you then go and change obj.somefield value and that field is used in the calculation your code will break. Thus, a property seems a lot better way for this particular use case.
Django's code base shows that one usually wants to differentiate __init__
in Python and load from DB, for example the init signals in Django act only when initializing object, but they do nothing for load from db case. Overriding __init__
is also hard for this reason, you don't know if the object is loaded from DB or if this is a "plain" init.
The performance numbers are nice:
Running 'query_all' benchmark ... Min: 0.007524 -> 0.004527: 1.6620x faster Avg: 0.007551 -> 0.004580: 1.6486x faster Significant (t=312.956724) Stddev: 0.00002 -> 0.00003: 1.4246x larger (N = 18) Running 'query_defer' benchmark ... Min: 0.017146 -> 0.004315: 3.9736x faster Avg: 0.017218 -> 0.004349: 3.9587x faster Significant (t=896.296579) Stddev: 0.00005 -> 0.00003: 1.6903x smaller (N = 18)
If you add in the patch for removing the chunked iterators (#18702), then you get this:
Running 'query_all' benchmark ... Min: 0.007714 -> 0.004020: 1.9189x faster Avg: 0.007756 -> 0.004064: 1.9082x faster Significant (t=341.747393) Stddev: 0.00003 -> 0.00003: 1.1788x larger (N = 18) Running 'query_defer' benchmark ... Min: 0.017498 -> 0.004134: 4.2325x faster Avg: 0.017535 -> 0.004173: 4.2022x faster Significant (t=1455.612646) Stddev: 0.00002 -> 0.00003: 1.2341x larger (N = 18)
So, even more gain with simplified iterator protocol.
Last, for those who need absolute speed for read-only data, alter the from_db() to this (for query_all):
@classmethod def from_db(cls, using, values, field_names, init_with_args, can_fast_init): new = Empty() new.__class__ = cls new.id, new.title = values Running 'query_all' benchmark ... Min: 0.007590 -> 0.001996: 3.8026x faster Avg: 0.007651 -> 0.002014: 3.7987x faster Significant (t=457.077059) Stddev: 0.00005 -> 0.00001: 4.4830x smaller (N = 18)
Now, such a model will only work in read-only cases (no model._state), and wont work with .defer() or .only() (all fields must always be present), but if you happen to need speed for read-only case, you can get it.
The branches are here: https://github.com/akaariai/django/compare/fast_init_test and https://github.com/akaariai/django/compare/fast_init_and_nonchunked (with the latest patch from #18702 added).
There surely is code that will break if this is committed as-is. So, I wonder if this should be opt-in or opt-out with model._meta flag, or if the whole load using __init__
could be deprecated totally.
EDIT: Some typo fixes, some clarified wording...
comment:5 by , 12 years ago
Sort of off-topic, but how do you run these benchmarks? Is there a tutorial online for testing different branches like this? This would go a long way towards improving performance in my projects. Come to think of it, it could even be included in a future "Optimizing Django" section in the docs.
comment:6 by , 11 years ago
See PR https://github.com/django/django/pull/2802 for a basic implementation.
The cost of this is 10% slower query_all benchmark, and around 15% slower query_select_related benchmark (these fetch large amount of objects and do nothing else). OTOH, using the new hook it will be possible to speed up model loading dramatically (query_all is 1.7x faster with direct dict assignment).
My feeling is that if you have a bottleneck due to model instantiation speed, then you want this patch so that you can optimize the way your models are loaded. If you don't have a bottleneck in model instantiation, then you don't care about the 10% in any case.
comment:7 by , 11 years ago
Has patch: | set |
---|---|
Patch needs improvement: | unset |
comment:9 by , 11 years ago
Patch needs improvement: | unset |
---|
I updated the code and docs based on review comments. The code was mistakenly creating a new model._state in from_db() method, when just updating the attributes of the existing ._state instance is enough. Now the slowdown (using djangobench) is:
Running 'query_select_related' benchmark ... Min: 0.054252 -> 0.058567: 1.0795x slower Avg: 0.055689 -> 0.059967: 1.0768x slower Significant (t=-8.076647) Stddev: 0.00271 -> 0.00258: 1.0494x smaller (N = 50) Running 'query_all' benchmark ... Min: 0.021309 -> 0.022659: 1.0633x slower Avg: 0.022763 -> 0.024240: 1.0649x slower Significant (t=-2.598991) Stddev: 0.00285 -> 0.00283: 1.0042x smaller (N = 50) Running 'query_raw_deferred' benchmark ... Min: 0.016543 -> 0.017822: 1.0773x slower Avg: 0.017149 -> 0.018389: 1.0723x slower Significant (t=-4.172737) Stddev: 0.00150 -> 0.00147: 1.0174x smaller (N = 50)
So, the code itself causes a minor slowdown. The benchmarks done in above comments and the ticket's description test various fast-path implementations. Basically, one can do the following for their model:
class Empty(object): pass class MyModel(models.Model): ... @classmethod def from_db(cls, db, field_names, values): empty = Empty() empty.__class__ = cls empty.__dict__ = zip(field_names, values) empty._state = ModelState() empty._state.adding = False return empty
and this results in around 2x better performance for query_all benchmark, and 4x speedup for deferred model loading (query_defer benchmark).
This ticket discussed various ways to do automatically the above optimization - it is possible to do if the model doesn't have customized init
, doesn't have pre/post init signals, and calling setattr isn't required for the model. Detecting this seems too complex, so the automation isn't there any more. However, users who require absolute performance can do the optimization manually if needed.
comment:10 by , 11 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:11 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Accepting the general idea here, can't comment much on the code yet. Can we somehow automatically decide if fast-path loading can be used?