Opened 7 years ago
Last modified 3 years ago
#28455 assigned Cleanup/optimization
Create "inplace" QuerySets to speed up certain operations
Reported by: | Anssi Kääriäinen | Owned by: | Keryn Knight |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Part two of #20880: Inplace querysets work precisely like full-clone querysets. A queryset is called inplace if it doesn't do cloning at all between operations. This again allows speeding up certain internal operations like model.save() and prefetch_related(), and also allows users to optimise heavy-to-create querysets by avoiding clone() overhead.
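To make the idea concrete, here's a toy sketch of the difference between clone-per-operation and inplace behaviour. This is not Django code: `ToyQuerySet`, its `_cloning` flag, and its methods are invented for illustration only.

```python
import copy

class ToyQuerySet:
    """Toy model of the proposal: `_clone()` either copies or returns self."""
    def __init__(self):
        self.filters = []
        self._cloning = True  # False => "inplace" mode: operations mutate self

    def _clone(self):
        return copy.deepcopy(self) if self._cloning else self

    def filter(self, **kwargs):
        qs = self._clone()
        qs.filters.append(kwargs)
        return qs

qs = ToyQuerySet()
qs2 = qs.filter(x=1)
# Cloning mode: the original queryset is untouched.
assert qs.filters == [] and qs2.filters == [{'x': 1}]

qs._cloning = False
qs3 = qs.filter(y=2).filter(z=3)
# Inplace mode: every operation returns (and mutates) the same object,
# skipping the copy overhead entirely.
assert qs3 is qs and qs.filters == [{'y': 2}, {'z': 3}]
```

The win is that the (relatively expensive) copy of the queryset and its internal query state simply never happens for chains known to be safe.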
Change History (5)
comment:1 by , 7 years ago
comment:2 by , 5 years ago
Cc: | added |
---|
comment:3 by , 3 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I don't think I'm stepping on anyone's toes at this juncture, but do shout if so!
I'd accidentally been wading into similar waters when I stumbled across this ticket, which has ended up at roughly the same place/conclusions, and I have a rough sketch based on the prior branches for an MVP to resume discussion on this. I'll tidy up what I have, try and gather some simple benchmarks, and throw it into the PR queue for sanity checking.
I guarantee there's more nuance to this than I've anticipated so far, but we've got to restart somewhere, right? If it works out, all the better :)
comment:4 by , 3 years ago
PR is https://github.com/django/django/pull/14675
Based heavily on Anssi's original implementation idea. This introduces two ways of opting out of cloning, both ostensibly private API. First and primarily, a context manager which temporarily makes the current QuerySet (and, implicitly, its Query) mutable:
```python
with User.objects.all()._avoid_cloning() as queryset1:
    queryset1.filter(x=1).exclude(y=2).filter(z__in=[1, 2, 3])

# OR

queryset2 = User.objects.all()
with queryset2._avoid_cloning():
    queryset2.filter(x=1).exclude(y=2).filter(z__in=[1, 2, 3])
```
And secondly, an imperative form, for use cases where there might be multiple filters across a long list of lines and it's not desirable to muddy the VCS history and/or the indent depth by using the context manager:
```python
queryset = User.objects.all()._disable_cloning()
queryset.filter(x=1).exclude(y=2).filter(z__in=[1, 2, 3])
queryset._enable_cloning()
queryset2 = queryset.filter(abc='test')  # This should be a new clone, as usual.
```
There's currently no support for any sort of nesting or enforced balancing (as with the original implementation, as I understand it), so `._disable_cloning()._disable_cloning()._disable_cloning()` would be undone by a single `._enable_cloning()` call, because it's a toggle rather than a stack of pushes/pops; hence the preference for the context manager.
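For illustration only (this is not part of the patch): a counter rather than a boolean would make disable/enable calls nest safely, since cloning resumes only once every disable has been balanced by an enable. A minimal sketch, with all names invented:

```python
class CloneControl:
    """Counter-based nesting: cloning only resumes when every
    disable has been matched by an enable, unlike a boolean toggle."""
    def __init__(self):
        self._depth = 0

    def disable_cloning(self):
        self._depth += 1

    def enable_cloning(self):
        self._depth = max(0, self._depth - 1)

    @property
    def cloning(self):
        return self._depth == 0

ctl = CloneControl()
ctl.disable_cloning()
ctl.disable_cloning()
ctl.enable_cloning()
assert not ctl.cloning   # still disabled: one disable remains unmatched
ctl.enable_cloning()
assert ctl.cloning       # balanced again, cloning resumes
```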
Given the following SQLite data:
```python
In [1]: from django.contrib.auth.models import User, Group, Permission

In [2]: Group.objects.count()
Out[2]: 101

In [3]: User.objects.count()
Out[3]: 101

In [4]: Permission.objects.count()
Out[4]: 24

In [5]: User.user_permissions.through.count()
Out[5]: 0

In [6]: for user in User.objects.all():
   ...:     print(f'{user.groups.count()}, {user.groups.first().pk}, {user.user_permissions.count()}')
1, 1, 0
1, 2, 0
1, 3, 0
1, 4, 0
1, 5, 0
...
```

…they all have 1 group and zero permissions.

```python
In [7]: from django.db import connection

In [8]: connection.queries[-2:]
Out[8]:
[{'sql': 'SELECT "auth_user"."id", ... FROM "auth_user"', 'time': '0.001'},
 {'sql': 'SELECT ("auth_user_groups"."user_id") AS "_prefetch_related_val_user_id", "auth_group"."id", "auth_group"."name" FROM "auth_group" INNER JOIN "auth_user_groups" ON ("auth_group"."id" = "auth_user_groups"."group_id") WHERE "auth_user_groups"."user_id" IN (1, 2, 3, [...] 99, 100, 101)', 'time': '0.000'}]
```
Before applying any of the changes:
```python
In [2]: %timeit tuple(User.objects.all())
1.26 ms ± 9.06 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# (0 calls to ._clone)

In [3]: %timeit tuple(User.objects.prefetch_related('groups'))
6.52 ms ± 62.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# (105 calls to ._clone)

In [4]: %timeit tuple(User.objects.prefetch_related('groups', 'user_permissions'))
12.1 ms ± 226 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# (213 calls to ._clone)
```
After enabling the clone-avoidance technique for prefetch_related only:
```python
In [2]: %timeit tuple(User.objects.all())
1.28 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# (0 calls to ._clone)

In [3]: %timeit tuple(User.objects.prefetch_related('groups'))
5.93 ms ± 53.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# (4 calls to ._clone)

In [4]: %timeit tuple(User.objects.prefetch_related('groups', 'user_permissions'))
10.2 ms ± 59.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# (7 calls to ._clone)
```
So for the single prefetch it's a decrease of roughly 9%, and for the two prefetches a decrease of roughly 15%. I make no pretence of being able to express the improvement as a factor rather than a percentage without risk of getting it wrong.
The number of calls to clone was counted by re-running the queries in fresh ipython sessions with `_clone()` patched to increment a global integer each time it's called; it isn't part of the actual timeit numbers.
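That counting technique can be sketched outside Django with a toy class (`ToyQuerySet` and the counter names below are illustrative, not Django's): wrap the `_clone` method so every call bumps a module-level counter.

```python
import functools

class ToyQuerySet:
    """Stand-in for QuerySet; only the _clone hook matters here."""
    def _clone(self):
        return ToyQuerySet()

clone_calls = 0
_original_clone = ToyQuerySet._clone

@functools.wraps(_original_clone)
def _counting_clone(self):
    # Count the call, then defer to the real implementation.
    global clone_calls
    clone_calls += 1
    return _original_clone(self)

ToyQuerySet._clone = _counting_clone

qs = ToyQuerySet()
qs._clone()._clone()
assert clone_calls == 2  # two chained clones, two increments
```

Because the counting happens in a separate run, it adds no overhead to the timeit measurements themselves.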
These improved prefetch timings only show up when hitting the `manager.get_queryset()` call in `prefetch_one_level`, because we can assume that returns a fresh QuerySet instance (or subclass) that holds OK data and doesn't need cloning. The same cannot be said of the `if leaf and lookup.queryset [...]` branch, I don't think, because that queryset is shared and in an unknown state, so at least for the initial MVP it remains cloned. It may do one fewer clone operation if the database would be changed by `using()`.
Calling `_chain()` directly, without disabling cloning:
```python
In [1]: from django.contrib.auth.models import User, Group, Permission

In [2]: x = User.objects.all()

In [4]: %timeit y = x._chain()
7.05 µs ± 62.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
With cloning disabled:
```python
In [6]: %timeit y = x._chain()
275 ns ± 3.15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
As I write this, I have not implemented the other updates Anssi did, like updating `get_contenttypes_and_models` or `create_permissions` et al., so that the proposed API changes can be decided on and looked over before moving forward with other potential usages.
Additionally, as I write this, I've added no new tests to properly demonstrate correctness, for the same reasons. To kick the discussion off, it should be enough to see whether the test suite passes or fails and iterate from there.
The number of `QuerySet._clone` operations done by the test suite (Ran 14876 tests, skipped=1192, expected failures=4) drops from 144851 to 142305, which doesn't inspire a huge amount of confidence, but it's a start.
comment:5 by , 3 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
Updated slightly, and now I've sat down to get cProfile information using:

```python
%prun for _ in range(100): tuple(User.objects.prefetch_related('groups__permissions', 'user_permissions'))
```
First, the baseline, showing only operations related to the change for brevity (so no `Model.__init__` etc.):
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    31200    0.165    0.000    0.303    0.000 query.py:290(clone)
      300    0.098    0.000    3.134    0.010 query.py:1860(prefetch_one_level)
    31200    0.066    0.000    0.444    0.000 query.py:1337(_clone)
    30200    0.055    0.000    0.655    0.000 related_descriptors.py:883(_apply_rel_filters)
    40600    0.052    0.000    1.231    0.000 query.py:45(__iter__)
    30200    0.051    0.000    0.897    0.000 related_descriptors.py:899(get_queryset)
    31200    0.046    0.000    0.500    0.000 query.py:1325(_chain)
    40200    0.041    0.000    0.327    0.000 base.py:511(from_db)
    30500    0.039    0.000    0.931    0.000 query.py:982(_filter_or_exclude)
    30600    0.030    0.000    0.194    0.000 manager.py:142(get_queryset)
    31200    0.029    0.000    0.337    0.000 query.py:341(chain)
    31200    0.028    0.000    0.070    0.000 where.py:142(clone)
```
Using the `@contextmanager` decorator. Lines are shown in the same order as above, so they're technically ordered by the baseline's internal time:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.007    0.000    0.013    0.000 query.py:290(clone)
      300    0.082    0.000    2.932    0.010 query.py:1881(prefetch_one_level)
     1000    0.003    0.000    0.019    0.000 query.py:1358(_clone)
    30200    0.097    0.000    0.444    0.000 related_descriptors.py:884(_apply_rel_filters)
    40600    0.052    0.000    0.947    0.000 query.py:46(__iter__)
    30200    0.054    0.000    1.039    0.000 related_descriptors.py:901(get_queryset)
    31200    0.029    0.000    0.056    0.000 query.py:1343(_chain)
    40200    0.041    0.000    0.376    0.000 base.py:511(from_db)
    30500    0.041    0.000    0.496    0.000 query.py:984(_filter_or_exclude)
    30600    0.032    0.000    0.545    0.000 manager.py:142(get_queryset)
     1000    0.001    0.000    0.014    0.000 query.py:341(chain)
     1000    0.002    0.000    0.003    0.000 where.py:142(clone)
      ...
    30600    0.081    0.000    0.086    0.000 contextlib.py:86(__init__)
    60400    0.024    0.000    0.034    0.000 query.py:1335(_avoid_cloning)
    30600    0.017    0.000    0.067    0.000 contextlib.py:121(__exit__)
    30600    0.016    0.000    0.102    0.000 contextlib.py:242(helper)
    30600    0.016    0.000    0.042    0.000 contextlib.py:112(__enter__)
```
And then finally with a custom context manager class, again using the same ordering as the baseline:
```
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.010    0.000    0.016    0.000 query.py:290(clone)
      300    0.095    0.000    3.293    0.011 query.py:1888(prefetch_one_level)
     1000    0.004    0.000    0.023    0.000 query.py:1365(_clone)
    30200    0.102    0.000    0.412    0.000 related_descriptors.py:884(_apply_rel_filters)
    40600    0.062    0.000    1.133    0.000 query.py:46(__iter__)
    30200    0.063    0.000    1.032    0.000 related_descriptors.py:901(get_queryset)
    31200    0.077    0.000    0.111    0.000 query.py:1350(_chain)
    40200    0.049    0.000    0.391    0.000 base.py:511(from_db)
    30500    0.070    0.000    0.650    0.000 query.py:996(_filter_or_exclude)
    30600    0.038    0.000    0.563    0.000 manager.py:142(get_queryset)
     1000    0.002    0.000    0.018    0.000 query.py:341(chain)
     1000    0.002    0.000    0.004    0.000 where.py:142(clone)
      ...
    30200    0.006    0.000    0.006    0.000 query.py:178(__init__)
    30200    0.016    0.000    0.022    0.000 query.py:1347(_avoid_cloning)
    30200    0.014    0.000    0.020    0.000 query.py:184(__exit__)
    30200    0.015    0.000    0.021    0.000 query.py:181(__enter__)
```
Using the custom context manager class is substantially faster than the `contextlib` decorator, at least in the simple form I currently have it. Here's the decorator version:
```python
In [1]: from django.contrib.auth.models import User

In [2]: x = User.objects.all()

In [3]: %timeit x._avoid_cloning()
1.01 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit with x._avoid_cloning(): pass
1.98 µs ± 63.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
```
and then using the custom class:
```python
In [3]: %timeit x._avoid_cloning()
276 ns ± 2.14 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit with x._avoid_cloning(): pass
819 ns ± 39.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```
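The decorator-vs-class gap can be reproduced in isolation, outside Django. The sketch below is illustrative only (`Obj`, `avoid_cloning_generator`, and `AvoidCloning` are invented names, not the patch's): a generator-based `@contextmanager` allocates a helper object and drives a generator on every entry/exit, while a hand-written class just flips an attribute.

```python
import contextlib
import timeit

class Obj:
    cloning = True

@contextlib.contextmanager
def avoid_cloning_generator(obj):
    # Generator-based version: contextlib wraps this in a helper object.
    obj.cloning = False
    try:
        yield obj
    finally:
        obj.cloning = True

class AvoidCloning:
    # Custom class version: __enter__/__exit__ do the bare minimum.
    __slots__ = ('obj',)
    def __init__(self, obj):
        self.obj = obj
    def __enter__(self):
        self.obj.cloning = False
        return self.obj
    def __exit__(self, *exc_info):
        self.obj.cloning = True
        return False

obj = Obj()

def use_generator():
    with avoid_cloning_generator(obj):
        pass

def use_class():
    with AvoidCloning(obj):
        pass

t_gen = timeit.timeit(use_generator, number=100_000)
t_cls = timeit.timeit(use_class, number=100_000)
print(f'generator-based: {t_gen:.3f}s  custom class: {t_cls:.3f}s')
assert obj.cloning          # both variants restore the flag on exit
assert t_cls < t_gen        # the class version reliably wins
```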
py-spy (a sampling profiler) also suggests `sql.Query.clone` accounts for 3% of the time spent; once clone avoidance is introduced, it doesn't even get sampled enough to be relevant.
Meanwhile, linters are the bane of my life, and so the patch remains 'needs improvement'.
I rebased work from Anssi's branch. I guess the plan was to keep "inplace" as a private API initially (hence the underscore prefix), but I'm not sure. Someone else is welcome to continue this work; I'm not planning to continue it soon.