#14700 closed (fixed)
Speed up RawQuerySet iterator
Reported by: | Anssi Kääriäinen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | | Keywords: | rawqueryset, iterator, performance |
Cc: | Marti Raudsepp | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Currently RawQuerySet uses a lot of cycles repeating the same calculations inside the iterator loop. The attached patch corrects this problem with the following results, using the test found in #14697 (with Test2.objects.all() replaced by Test2.objects.raw('select * from test__test2')):
Before patch: 0.9 seconds to 1.0 seconds.
After patch: 0.195 seconds to 0.205 seconds. (This is just slightly faster than using Test2.objects.all() with #14697 applied.)
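The kind of change described here can be illustrated with a small, Django-free sketch. It is not taken from the attached patch: `Row`, `iter_slow`, `iter_fast` and the column mapping are hypothetical names, and the only point is that work depending solely on the column list is moved out of the per-row loop.

```python
class Row:
    """Hypothetical stand-in for a model instance (not Django's Model)."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


def iter_slow(columns, rows, column_to_attr):
    """Recomputes the column -> attribute name list for every row."""
    for row in rows:
        # Loop-invariant work: the result is identical on every pass.
        attnames = [column_to_attr.get(c, c) for c in columns]
        yield Row(**dict(zip(attnames, row)))


def iter_fast(columns, rows, column_to_attr):
    """Computes the attribute name list once, before the loop."""
    attnames = [column_to_attr.get(c, c) for c in columns]
    for row in rows:
        yield Row(**dict(zip(attnames, row)))


columns = ["id", "full_name"]
rows = [(1, "a"), (2, "b")]
mapping = {"full_name": "name"}  # assumed column -> attribute translation

slow = [vars(r) for r in iter_slow(columns, rows, mapping)]
fast = [vars(r) for r in iter_fast(columns, rows, mapping)]
```

Both generators build the same objects; the second simply avoids redoing the name resolution for each row.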
There is another patch, which is unfortunately most likely backwards incompatible. The idea of the other patch is to speed up model instance initialization by passing in a dict (attname -> val) containing all the values needed, and then using self.__dict__.update(attname, val). This, however, adds a new keyword argument to __init__. Before the patch, all the kwargs were attr -> value, but after it there can be a kwarg '_use_dict', which contains a dict of attname -> val. The patch cuts off another 0.06 seconds (or 30%) from the test case, leaving around 0.14 seconds. There might be some other incompatibilities, too... The same performance increase could be achieved for the standard QuerySet iterator using the same hack.
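As a rough illustration of the idea (not the contents of patch_obj_creation.diff), the sketch below uses a plain Python class. `Plain` is a hypothetical class; `_use_dict` is the keyword name mentioned above, but how the real patch handles it inside Model.__init__ is an assumption here.

```python
class Plain:
    """Hypothetical class used only to show the bulk-update fast path."""

    def __init__(self, _use_dict=None, **kwargs):
        if _use_dict is not None:
            # Fast path: one bulk update of the instance dict,
            # instead of one setattr() call per field.
            self.__dict__.update(_use_dict)
        # Ordinary path: every remaining kwarg is attr -> value.
        for attname, val in kwargs.items():
            setattr(self, attname, val)


a = Plain(id=1, name="x")                    # per-attribute setattr
b = Plain(_use_dict={"id": 1, "name": "x"})  # single dict update
```

Writing to `__dict__` directly skips any custom `__setattr__` and any data descriptors on the class, which is one general way such a shortcut can change behaviour. A field that happened to be named `_use_dict` could also no longer be passed as an ordinary kwarg.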
Just fetching the data from the db, creating 10000 raw objects and updating 10 attributes for each of those objects takes 0.1 seconds. Hence there is just 0.04 seconds, or 40%, overhead left when using the second patch, and 100% overhead when using the first patch.
I will try to find the time to write a django-bench benchmark for this case.
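The overhead percentages can be checked with a few lines of arithmetic. This sketch assumes a baseline of 0.1 seconds, which is the value the quoted 40% and 100% figures imply, together with the roughly 0.2 and 0.14 second timings reported for the two patches. `overhead` is a hypothetical helper.

```python
def overhead(total, baseline):
    """Extra time relative to the baseline, as a fraction of the baseline."""
    return (total - baseline) / baseline


baseline = 0.10      # assumed: fetch rows, build raw objects, set attributes
first_patch = 0.20   # approximate timing with the first patch
second_patch = 0.14  # approximate timing with the second patch

first_overhead = round(overhead(first_patch, baseline), 2)
second_overhead = round(overhead(second_patch, baseline), 2)
```

With these inputs the first patch leaves about 100% overhead over the baseline and the second about 40%.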
Attachments (3)
Change History (14)
by , 14 years ago
Attachment: | patch_obj_creation.diff added |
---|
comment:1 by , 14 years ago
Ok, some django-bench benchmark results:
    query_raw: fetch 1000 objects with 11 fields.
    Running 'query_raw' benchmark ...
    Min: 0.070000 -> 0.010000: 7.0000x faster
    Avg: 0.077600 -> 0.019200: 4.0417x faster
    Significant (t=80.796194)
    Stddev: 0.00431 -> 0.00274: 1.5742x smaller (N = 50)

    query_raw_deferred: fetch 1000 objects having 11 fields, but get only the pk from db
    Running 'query_raw_deferred' benchmark ...
    Min: 0.300000 -> 0.020000: 15.0000x faster
    Avg: 0.305200 -> 0.020200: 15.1089x faster
    Significant (t=358.775813)
    Stddev: 0.00544 -> 0.00141: 3.8439x smaller (N = 50)
The attached tar.gz is the same as for #14697.
by , 14 years ago
Attachment: | iterator_benchmarks.tar.gz added |
---|
comment:2 by , 14 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
The most recent patch is very broken. It doesn't apply to trunk due to [14613], but even after fixing that up, more importantly, it produces a NameError:
NameError: global name 'need_resolv_columns' is not defined
Just looking at the patch makes it clear that this is going to happen.
Accepted on the basis that you obviously have a patch that works and there is definitely duplicate work being done that can be eliminated.
by , 14 years ago
Attachment: | patch.diff added |
---|
comment:3 by , 14 years ago
Ok, now the patch should apply to trunk and actually work.
The code isn't as clean as I would like it to be, but I don't know how to clean it up more. I do like what the code is actually doing, so the logic should not be a problem...
comment:5 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:6 by , 14 years ago
comment:7 by , 14 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
This patch causes a regression for us. All raw queries seem to be executed twice!
This is using Django SVN trunk revision 14778, with an empty project:
    [marti@wrx]% ./manage.py shell
    Python 2.7.1 (r271:86832, Dec 2 2010, 03:01:28)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.10.1 -- An enhanced Interactive Python.
    ? -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help -> Python's own help system.
    object? -> Details about 'object'. ?object also works, ?? prints more.

    In [1]: from django.db import connection

    In [2]: from django.contrib.auth.models import User

    In [3]: connection.queries
    Out[3]: []

    In [4]: list(User.objects.raw('select * from auth_user'))
    Out[4]: [<User: some_user>]

    In [5]: connection.queries
    Out[5]:
    [{'sql': u'select * from auth_user', 'time': '0.000'},
     {'sql': u'select * from auth_user', 'time': '0.000'}]
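The check done by hand in the session above (inspect the query log, iterate once, inspect the log again) can be turned into an assertion. The sketch below is a stand-in that uses the standard library's sqlite3 module and its statement trace callback rather than Django's connection.queries; the table contents are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("INSERT INTO auth_user (username) VALUES ('some_user')")
conn.commit()

executed = []  # plays the role of the query log
conn.set_trace_callback(executed.append)  # records each statement as it runs

# Iterate over the result exactly once.
rows = list(conn.execute("select * from auth_user"))

# Only SELECT statements matter for this check.
selects = [sql for sql in executed if sql.lower().startswith("select")]
assert len(selects) == 1, selects
```

In a Django test case the same expectation could be written with `assertNumQueries(1)` around the `list(User.objects.raw(...))` call, which would catch a raw query being executed twice.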
comment:8 by , 14 years ago
Cc: | added |
---|
comment:9 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Probably backwards incompatible optimization to model instance initialization