Opened 13 years ago

Closed 13 years ago

Last modified 12 years ago

#14700 closed (fixed)

Speed up RawQuerySet iterator

Reported by: Anssi Kääriäinen Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Keywords: rawqueryset, iterator, performance
Cc: Marti Raudsepp Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Currently RawQuerySet spends a lot of cycles repeating the same calculations inside the iterator loop. The attached patch corrects this problem with the following results, using the test found in #14697 (with Test2.objects.all() replaced by Test2.objects.raw('select * from test__test2')):

Before patch: 0.9 seconds to 1.0 seconds.

After patch: 0.195 seconds to 0.205 seconds. (This is just slightly faster than using Test2.objects.all() with #14697 applied.)

There is another patch, which is unfortunately most likely backwards incompatible. The idea of the other patch is to speed up model instance initialization by passing in a dict (attname -> val) containing all the values needed, and then applying it in one call with self.__dict__.update(). This, however, adds a new keyword argument to __init__. Before the patch, all the kwargs were attr -> value, but after it there can be a kwarg '_use_dict', which contains a dict of attname -> val. The patch cuts another 0.06 seconds (or 30%) from the test case, leaving around 0.14 seconds. There might be some other incompatibilities, too... The same performance increase could be achieved for the standard QuerySet iterator using the same hack.

Just fetching the data from the db, creating 10000 raw objects and updating 10 attributes for each of those objects takes 0.1 seconds. Hence there is just 0.04 seconds, or 40%, overhead left when using the second patch, and 100% overhead when using the first patch.
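
For illustration only, the dict-based initialization idea described above might look roughly like the sketch below. The classes Plain and DictInit are hypothetical stand-ins and are not Django's Model class. Only the '_use_dict' keyword name comes from the description; everything else is an assumption.

```python
class Plain(object):
    """Hypothetical stand-in for a model; not Django's Model class."""

    def __init__(self, **kwargs):
        # Conventional path: one setattr call per attribute.
        for name, value in kwargs.items():
            setattr(self, name, value)


class DictInit(object):
    """Same stand-in with an assumed '_use_dict' fast path."""

    def __init__(self, _use_dict=None, **kwargs):
        if _use_dict is not None:
            # Fast path: copy every attname -> val pair in a single call.
            self.__dict__.update(_use_dict)
            return
        for name, value in kwargs.items():
            setattr(self, name, value)


row = {"id": 1, "name": "a", "flag": True}
slow = Plain(**row)
fast = DictInit(_use_dict=row)
```

Both objects end up with the same attributes. The backwards-compatibility concern is that '_use_dict' is now a reserved keyword that can no longer be treated as an ordinary attr -> value pair.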

I will try to find the time to write a django-bench benchmark for this case.

Attachments (3)

patch_obj_creation.diff (5.7 KB ) - added by Anssi Kääriäinen 13 years ago.
Probably backwards incompatible optimization to model instance initialization
iterator_benchmarks.tar.gz (1.6 KB ) - added by Anssi Kääriäinen 13 years ago.
patch.diff (5.2 KB ) - added by Anssi Kääriäinen 13 years ago.


Change History (14)

by Anssi Kääriäinen, 13 years ago

Attachment: patch_obj_creation.diff added

Probably backwards incompatible optimization to model instance initialization

comment:1 by Anssi Kääriäinen, 13 years ago

Ok, some django-bench benchmark results:

query_raw: fetch 1000 objects with 11 fields.
Running 'query_raw' benchmark ...
Min: 0.070000 -> 0.010000: 7.0000x faster
Avg: 0.077600 -> 0.019200: 4.0417x faster
Significant (t=80.796194)
Stddev: 0.00431 -> 0.00274: 1.5742x smaller (N = 50)

query_raw_deferred: fetch 1000 objects having 11 fields, but get only the pk from db
Running 'query_raw_deferred' benchmark ...
Min: 0.300000 -> 0.020000: 15.0000x faster
Avg: 0.305200 -> 0.020200: 15.1089x faster
Significant (t=358.775813)
Stddev: 0.00544 -> 0.00141: 3.8439x smaller (N = 50)

The attached tar.gz is the same as for #14697.

by Anssi Kääriäinen, 13 years ago

Attachment: iterator_benchmarks.tar.gz added

comment:2 by Luke Plant, 13 years ago

Patch needs improvement: set
Triage Stage: Unreviewed → Accepted

The most recent patch is very broken. It doesn't apply to trunk due to [14613], and more importantly, even after fixing that up, it produces a NameError:

NameError: global name 'need_resolv_columns' is not defined

Just looking at the patch makes it clear that this is going to happen.

Accepted on the basis that you obviously have a patch that works and there is definitely duplicate work being done that can be eliminated.

by Anssi Kääriäinen, 13 years ago

Attachment: patch.diff added

comment:3 by Anssi Kääriäinen, 13 years ago

Ok, now the patch should apply to trunk and actually work.

The code isn't as clean as I would like it to be, but I don't know how to clean it up more. I do like what the code is actually doing, so the logic should not be a problem...

comment:4 by Luke Plant, 13 years ago

Thanks, this is really great work. I'll commit shortly.

comment:5 by Luke Plant, 13 years ago

Resolution: fixed
Status: new → closed

(In [14692]) Fixed #14700 - speed up RawQuerySet iterator.

This moves constant work out of the loop, and uses the much faster *args
based model instantiation where possible, to produce very large speed ups.

Thanks to akaariai for the report and patch.
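
As a rough illustration of the two techniques named in the commit message (hoisting constant work out of the loop and instantiating with *args), here is a minimal sketch. It is not the committed code: Row, iterate_slow and iterate_fast are hypothetical names, and the field list is invented for the example.

```python
class Row(object):
    """Hypothetical stand-in for a model with three fields."""

    fields = ("id", "name", "flag")

    def __init__(self, *args, **kwargs):
        # Positional values are matched to fields in declaration order.
        for name, value in zip(self.fields, args):
            setattr(self, name, value)
        for name, value in kwargs.items():
            setattr(self, name, value)


def iterate_slow(columns, rows):
    for values in rows:
        # The column lookup is repeated for every single row.
        kwargs = dict((col, values[columns.index(col)]) for col in columns)
        yield Row(**kwargs)


def iterate_fast(columns, rows):
    # Computed once: where each field sits in a result row.
    positions = [columns.index(name) for name in Row.fields]
    for values in rows:
        yield Row(*[values[pos] for pos in positions])


columns = ["name", "id", "flag"]
rows = [("a", 1, True), ("b", 2, False)]
slow = [obj.__dict__ for obj in iterate_slow(columns, rows)]
fast = [obj.__dict__ for obj in iterate_fast(columns, rows)]
```

Both generators build equivalent objects; the second does the column-to-field mapping once instead of once per row. The sketch assumes every field is present in the result, which is the "where possible" case for positional instantiation.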

comment:6 by Luke Plant, 13 years ago

(In [14693]) [1.2.X] Fixed #14700 - speed up RawQuerySet iterator.

This moves constant work out of the loop, and uses the much faster *args
based model instantiation where possible, to produce very large speed ups.

Thanks to akaariai for the report and patch.

Backport of [14692] from trunk.

comment:7 by Marti Raudsepp, 13 years ago

Resolution: fixed
Status: closed → reopened

This patch causes a regression for us. All raw queries seem to be executed twice!

This is using Django SVN trunk revision 14778, with an empty project:

[marti@wrx]% ./manage.py shell
Python 2.7.1 (r271:86832, Dec  2 2010, 03:01:28) 
Type "copyright", "credits" or "license" for more information.

IPython 0.10.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: from django.db import connection

In [2]: from django.contrib.auth.models import User

In [3]: connection.queries
Out[3]: []

In [4]: list(User.objects.raw('select * from auth_user'))
Out[4]: [<User: some_user>]

In [5]: connection.queries
Out[5]: 
[{'sql': u'select * from auth_user', 'time': '0.000'},
 {'sql': u'select * from auth_user', 'time': '0.000'}]

comment:8 by Marti Raudsepp, 13 years ago

Cc: Marti Raudsepp added

comment:9 by Alex Gaynor, 13 years ago

Resolution: fixed
Status: reopened → closed

(In [14785]) Fixed #14700 -- ensure that a raw query is only executed once per iteration.
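
The ticket does not spell out the cause of the double execution. The sketch below shows one way such a regression can arise, and one way to avoid it, under the assumption that some metadata (here, column names) is only available after the query has run. LazyQuery and its methods are hypothetical and do not mirror Django's RawQuery API.

```python
class LazyQuery(object):
    """Hypothetical lazy query; counts executions instead of using a database."""

    def __init__(self, rows):
        self._rows = rows
        self._cursor = None
        self.executions = 0

    def _execute(self):
        self.executions += 1
        self._cursor = iter(self._rows)

    @property
    def columns(self):
        # Column names are only known once the query has run.
        if self._cursor is None:
            self._execute()
        return ("id", "name")

    def iter_twice(self):
        # Always re-executes, even if `columns` already ran the query.
        self._execute()
        return self._cursor

    def iter_once(self):
        # Reuses the cursor opened by `columns` when there is one.
        if self._cursor is None:
            self._execute()
        cursor, self._cursor = self._cursor, None
        return cursor


buggy = LazyQuery([(1, "a")])
buggy.columns
buggy_rows = list(buggy.iter_twice())

fixed = LazyQuery([(1, "a")])
fixed.columns
fixed_rows = list(fixed.iter_once())
```

In this sketch, buggy.executions ends at 2 while fixed.executions ends at 1, with identical rows returned. A second pass over `fixed` would execute the query again, which matches "once per iteration" rather than "once ever".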

comment:10 by Alex Gaynor, 13 years ago

(In [14786]) [1.2.X] Fixed #14700 -- ensure that a raw query is only executed once per iteration. Backport of [14785].

comment:11 by Jacob, 12 years ago

milestone: 1.3

Milestone 1.3 deleted
