Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#14700 closed (fixed)

Speed up RawQuerySet iterator

Reported by: akaariai Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords: rawqueryset, iterator, performance
Cc: intgr Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

Currently RawQuerySet spends a lot of cycles repeating the same calculations inside the iterator loop. The attached patch corrects this problem with the following results, using the test found in #14697 (with Test2.objects.all() replaced by Test2.objects.raw('select * from test__test2')):

Before patch: 0.9 seconds to 1.0 seconds.

After patch: 0.195 seconds to 0.205 seconds. (This is just slightly faster than using Test2.objects.all() with #14697 applied.)
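To illustrate the kind of change described above, here is a minimal sketch of moving per-query work out of a per-row loop. It is not the attached patch: Record, COLUMNS, FIELD_FOR_COLUMN and both iterate_* functions are hypothetical names made up for this example, and no Django code is involved.

```python
# Hypothetical stand-ins; none of these names come from Django or the patch.
COLUMNS = ["id", "name", "email"]
FIELD_FOR_COLUMN = {"id": "pk", "name": "full_name", "email": "email_address"}


class Record:
    """Plain object that stores whatever keyword arguments it is given."""

    def __init__(self, **kwargs):
        for attname, value in kwargs.items():
            setattr(self, attname, value)


def iterate_naive(rows):
    # Rebuilds the column -> attribute name list for every single row.
    for row in rows:
        attnames = [FIELD_FOR_COLUMN[col] for col in COLUMNS]
        yield Record(**dict(zip(attnames, row)))


def iterate_hoisted(rows):
    # The name list depends only on the query, so build it once up front.
    attnames = [FIELD_FOR_COLUMN[col] for col in COLUMNS]
    for row in rows:
        yield Record(**dict(zip(attnames, row)))
```

Both generators yield equivalent objects; the second one simply does less work per row, which is where the savings would come from on large result sets.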

There is another patch, which is unfortunately most likely backwards incompatible. The idea of that patch is to speed up model instance initialization by passing in a dict (attname -> val) containing all the values needed, and then calling self.__dict__.update() with that dict. This, however, adds a new keyword argument to __init__. Before the patch, all the kwargs were attr -> value, but after it there can be a kwarg '_use_dict', which contains a dict of attname -> val. The patch cuts another 0.06 seconds (or 30%) off the test case, leaving around 0.14 seconds. There might be some other incompatibilities, too... The same performance increase could be achieved for the standard QuerySet iterator using the same hack.
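A rough sketch of the '_use_dict' idea on a plain class, to show the shape of the fast path. Item is a hypothetical class, not a Django model, and this is not the code from patch_obj_creation.diff; only the keyword name '_use_dict' is taken from the description above.

```python
class Item:
    """Hypothetical plain class, not a Django model."""

    def __init__(self, _use_dict=None, **kwargs):
        if _use_dict is not None:
            # Fast path: copy every attribute into the instance in one call.
            self.__dict__.update(_use_dict)
            return
        # Regular path: one setattr() call per keyword argument.
        for attname, value in kwargs.items():
            setattr(self, attname, value)


fast = Item(_use_dict={"id": 1, "name": "example"})
slow = Item(id=1, name="example")
```

Two things this sketch makes visible: a class that already accepts a keyword called '_use_dict' would now behave differently, and writing to __dict__ directly skips __setattr__ and any property setters, which is one way such a change could turn out to be incompatible.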

Just fetching the data from the db, creating 10000 raw objects and updating 10 attributes on each of those objects takes 0.1 seconds. Hence there is just 0.04 seconds, or 40%, overhead left when using the second patch, and 100% overhead when using the first patch.

I will try to find the time to write a django-bench benchmark for this case.

Attachments (3)

patch_obj_creation.diff (5.7 KB) - added by akaariai 5 years ago.
Probably backwards incompatible optimization to model instance initialization
iterator_benchmarks.tar.gz (1.6 KB) - added by akaariai 5 years ago.
patch.diff (5.2 KB) - added by akaariai 4 years ago.


Change History (14)

Changed 5 years ago by akaariai

Probably backwards incompatible optimization to model instance initialization

comment:1 Changed 5 years ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Ok, some django-bench benchmark results:

query_raw: fetch 1000 objects with 11 fields.
Running 'query_raw' benchmark ...
Min: 0.070000 -> 0.010000: 7.0000x faster
Avg: 0.077600 -> 0.019200: 4.0417x faster
Significant (t=80.796194)
Stddev: 0.00431 -> 0.00274: 1.5742x smaller (N = 50)

query_raw_deferred: fetch 1000 objects having 11 fields, but get only the pk from db
Running 'query_raw_deferred' benchmark ...
Min: 0.300000 -> 0.020000: 15.0000x faster
Avg: 0.305200 -> 0.020200: 15.1089x faster
Significant (t=358.775813)
Stddev: 0.00544 -> 0.00141: 3.8439x smaller (N = 50)

The attached tar.gz is the same as for #14697.
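For reading the numbers above: the "x faster" figures are consistent with dividing the old time by the new time. A small check, assuming that is how they were derived (speedup is a hypothetical helper, not part of django-bench):

```python
def speedup(before, after):
    # Ratio of old time to new time; above 1.0 means the new code is faster.
    return before / after


# Timings copied from the benchmark output above.
query_raw_avg = speedup(0.077600, 0.019200)
query_raw_deferred_avg = speedup(0.305200, 0.020200)

print("query_raw avg: %.4fx faster" % query_raw_avg)
print("query_raw_deferred avg: %.4fx faster" % query_raw_deferred_avg)
```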

Changed 5 years ago by akaariai

comment:2 Changed 4 years ago by lukeplant

  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

The most recent patch is very broken. It doesn't apply to trunk due to [14613], but even after fixing that up, more importantly it produces a NameError:

NameError: global name 'need_resolv_columns' is not defined

Just looking at the patch makes it clear that this is going to happen.

Accepted on the basis that you obviously have a patch that works and there is definitely duplicate work being done that can be eliminated.

Changed 4 years ago by akaariai

comment:3 Changed 4 years ago by akaariai

Ok, now the patch should apply to trunk and actually work.

The code isn't as clean as I would like it to be, but I don't know how to clean it up more. I do like what the code is actually doing, so the logic should not be a problem...

comment:4 Changed 4 years ago by lukeplant

Thanks, this is really great work. I'll commit shortly.

comment:5 Changed 4 years ago by lukeplant

  • Resolution set to fixed
  • Status changed from new to closed

(In [14692]) Fixed #14700 - speed up RawQuerySet iterator.

This moves constant work out of the loop, and uses the much faster *args
based model instantiation where possible, to produce very large speed ups.

Thanks to akaariai for the report and patch.
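A minimal sketch of what "*args based instantiation" can look like, assuming a class whose constructor matches positional values to fields by order. Person, _fields and build_positional are hypothetical names for illustration and are not taken from [14692].

```python
class Person:
    # Hypothetical class; _fields plays the role of a model's ordered field list.
    _fields = ("id", "name", "email")

    def __init__(self, *args, **kwargs):
        # Positional values are matched to fields purely by order.
        for attname, value in zip(self._fields, args):
            setattr(self, attname, value)
        # Keyword values carry a name with every value.
        for attname, value in kwargs.items():
            setattr(self, attname, value)


def build_positional(columns, rows):
    # Work out once where each field's value sits in a result row.
    positions = [columns.index(field) for field in Person._fields]
    for row in rows:
        # Reorder the row into field order and pass it positionally.
        yield Person(*[row[i] for i in positions])
```

The column order of the query does not have to match the field order; the positions list is computed once per query and reused for every row.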

comment:6 Changed 4 years ago by lukeplant

(In [14693]) [1.2.X] Fixed #14700 - speed up RawQuerySet iterator.

This moves constant work out of the loop, and uses the much faster *args
based model instantiation where possible, to produce very large speed ups.

Thanks to akaariai for the report and patch.

Backport of [14692] from trunk.

comment:7 Changed 4 years ago by intgr

  • Resolution fixed deleted
  • Status changed from closed to reopened

This patch causes a regression for us. All raw queries seem to be executed twice!

This is using Django SVN trunk revision 14778, with an empty project:

[marti@wrx]% ./manage.py shell
Python 2.7.1 (r271:86832, Dec  2 2010, 03:01:28) 
Type "copyright", "credits" or "license" for more information.

IPython 0.10.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object'. ?object also works, ?? prints more.

In [1]: from django.db import connection

In [2]: from django.contrib.auth.models import User

In [3]: connection.queries
Out[3]: []

In [4]: list(User.objects.raw('select * from auth_user'))
Out[4]: [<User: some_user>]

In [5]: connection.queries
Out[5]: 
[{'sql': u'select * from auth_user', 'time': '0.000'},
 {'sql': u'select * from auth_user', 'time': '0.000'}]

comment:8 Changed 4 years ago by intgr

  • Cc intgr added

comment:9 Changed 4 years ago by Alex

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [14785]) Fixed #14700 -- ensure that a raw query is only executed once per iteration.
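One way a query can end up running twice per iteration, and the shape of a fix, sketched with a fake query object that counts its executions. This is an illustration only, not the code from [14785]; FakeRawQuery, iterate_twice and iterate_once are hypothetical names.

```python
class FakeRawQuery:
    """Hypothetical stand-in for a raw query; counts how often it runs."""

    def __init__(self, rows):
        self._rows = rows
        self.executions = 0

    def execute(self):
        # Each call represents one round trip to the database.
        self.executions += 1
        return ["id", "name"], iter(self._rows)


def iterate_twice(query):
    columns, _ = query.execute()  # runs the query just to get column names
    _, cursor = query.execute()   # runs it again to get the rows
    for row in cursor:
        yield dict(zip(columns, row))


def iterate_once(query):
    columns, cursor = query.execute()  # one execution supplies both
    for row in cursor:
        yield dict(zip(columns, row))
```

Both generators produce the same rows, so the duplicate execution would only show up in something like connection.queries, as in the session above.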

comment:10 Changed 4 years ago by Alex

(In [14786]) [1.2.X] Fixed #14700 -- ensure that a raw query is only executed once per iteration. Backport of [14785].

comment:11 Changed 4 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted
