Speed up RawQuerySet iterator
|Reported by:||Anssi Kääriäinen||Owned by:||nobody|
|Component:||Database layer (models, ORM)||Version:||master|
|Severity:||Keywords:||rawqueryset, iterator, performance|
|Cc:||Marti Raudsepp||Triage Stage:||Accepted|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||yes|
RawQuerySet spends a lot of cycles repeating the same calculations inside the iterator loop. The attached patch fixes this, with the following results, using the test found in #14697 (with Test2.objects.all() replaced by Test2.objects.raw('select * from test
Before patch: 0.9 seconds to 1.0 seconds.
After patch: 0.195 seconds to 0.205 seconds. (This is just slightly faster than using Test2.objects.all() with #14697 applied.)
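As an illustration of the kind of change described above, here is a minimal plain-Python sketch of moving a per-query calculation out of the per-row loop. It is not the attached patch and does not use Django: the Row class, the two iterate_* functions and the column_to_attr mapping are hypothetical names chosen for this example.

```python
class Row:
    """Hypothetical stand-in for a model class; not Django's Model."""

    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, name, value)


def iterate_slow(columns, rows, column_to_attr):
    # Recomputes the column -> attribute name list for every row,
    # although the result depends only on the query, not on the row.
    for row in rows:
        attnames = [column_to_attr.get(col, col) for col in columns]
        yield Row(**dict(zip(attnames, row)))


def iterate_fast(columns, rows, column_to_attr):
    # Same output, but the name list is computed once, before the loop.
    attnames = [column_to_attr.get(col, col) for col in columns]
    for row in rows:
        yield Row(**dict(zip(attnames, row)))
```

Both generators yield objects with identical attributes; only the amount of repeated work per row differs.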
There is another patch, which is unfortunately most likely backwards incompatible. The idea of the other patch is to speed up model instance initialization by passing in a dict (attname -> val) containing all the values needed, and then using self.__dict__.update(attname, val). This, however, adds a new keyword argument to __init__. Before the patch, all kwargs were attr -> value, but after it there can be a kwarg '_use_dict', which contains a dict of attname -> val. The patch cuts another 0.06 seconds (or 30%) off the test case, leaving around 0.14 seconds. There might be some other incompatibilities, too... The same performance gain could be achieved for the standard QuerySet iterator using the same hack.
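The dict-based initialization idea can be sketched roughly as follows. This is a plain-Python illustration under assumptions, not the second patch: SlowInit and FastInit are hypothetical classes, and the keyword name _use_dict is taken from the description above (its exact spelling in the patch may differ).

```python
class SlowInit:
    """Hypothetical class: one setattr call per keyword argument."""

    def __init__(self, **kwargs):
        for attname, val in kwargs.items():
            setattr(self, attname, val)


class FastInit:
    """Hypothetical class: optional fast path taking a prepared dict."""

    def __init__(self, _use_dict=None, **kwargs):
        if _use_dict is not None:
            # A single bulk update of the instance dict. This bypasses
            # setattr, so properties and descriptors are not triggered.
            self.__dict__.update(_use_dict)
            return
        for attname, val in kwargs.items():
            setattr(self, attname, val)


values = {"id": 1, "name": "example"}
a = SlowInit(**values)
b = FastInit(_use_dict=values)
```

The sketch also shows where the incompatibility comes from: once _use_dict is a reserved keyword, an attribute with that name can no longer be passed as an ordinary kwarg.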
Just fetching the data from the db, creating 10000 raw objects and updating 10 attributes on each of them takes 0.1 seconds. Hence there is just 0.04 seconds, or 40%, overhead left when using the second patch, and 100% overhead when using the first patch.
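A baseline of this kind could be measured along the following lines. This is only a sketch: it leaves out the database fetch, and RawObject, build_raw_objects and the field names are hypothetical, so the timing it prints is not comparable with the figures reported here.

```python
import timeit


class RawObject:
    """Hypothetical bare object with no model machinery."""


def build_raw_objects(rows, attnames):
    # Create one bare object per row and set its attributes in bulk.
    objs = []
    for row in rows:
        obj = RawObject()
        obj.__dict__.update(zip(attnames, row))
        objs.append(obj)
    return objs


# 10000 rows with 10 attributes each, mirroring the sizes in the text.
attnames = [f"field{i}" for i in range(10)]
rows = [tuple(range(10)) for _ in range(10000)]

if __name__ == "__main__":
    elapsed = timeit.timeit(lambda: build_raw_objects(rows, attnames), number=1)
    print(f"baseline: {elapsed:.3f} s")
```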
I will try to find the time to write a django-bench benchmark for this case.