Opened 3 years ago
Closed 3 years ago
#33003 closed Cleanup/optimization (fixed)
Micro-optimisation for QuerySet._chain
Reported by: | Keryn Knight | Owned by: | Keryn Knight |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Whilst working on #28455 I noticed that _chain
accepts kwargs
, but that appears to only be to facilitate pickling, with Prefetch.__getstate__
making use of the functionality. Thus most of the time, kwargs is empty and there's no reason to touch/update the __dict__
. And it's faster not to do so. Note that in comparison with the benefits of #28455 this is absolutely peanuts, but still, it's unrelated to that ticket so here we are:
In [1]: class A: ...: def __init__(self, **kwargs): ...: self.__dict__.update(**kwargs) In [2]: a = A(a=1, b=2, c=3, d=4) In [3]: empty = {} In [4]: full = {'d': 3, 'c': 1} In [5]: %timeit a.__dict__.update(empty) 91.2 ns ± 1.25 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) In [6]: %timeit a.__dict__.update(full) 133 ns ± 0.92 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
If we just check the dictionary isn't empty (in _chain
that's kwargs
):
In [7]: %timeit if empty: a.__dict__.update(empty) 19 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) In [8]: %timeit if full: a.__dict__.update(full) 156 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
You can see the paying the basic cost of an if (for me, that appears to be 20ns
for an empty dict, and more like 6ns
for an interned/singleton value like 1
or True
or ''
as a baseline) is worthwhile, and having pickling a Prefetch be 20ns
slower isn't too bad, given it's not exactly the most common use case...
The difference is small but can be made more obvious by checking the cProfile timings over many (many) iterations; here's the before across 100 users each with a group and permission:
In [2]: %prun -stime -l_chain for _ in range(1000): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions')) 48095003 function calls (46202003 primitive calls) in 34.811 seconds Ordered by: internal time List reduced from 384 to 1 due to restriction <'_chain'> ncalls tottime percall cumtime percall filename:lineno(function) 312000 4.070 0.000 8.791 0.000 query.py:1325(_chain) In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions')) 18.3 ms ± 391 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
And after applying the change as follows:
def _chain(...): ... if kwargs: obj.__dict__.update(kwargs) ...
we end up with:
In [2]: %prun -stime -l_chain for _ in range(1000): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions')) 47779003 function calls (45886003 primitive calls) in 33.220 seconds Ordered by: internal time List reduced from 384 to 1 due to restriction <'_chain'> ncalls tottime percall cumtime percall filename:lineno(function) 312000 0.267 0.000 5.644 0.000 query.py:1325(_chain) In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions')) 18.1 ms ± 289 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
I have to assume that the overall timing differences measured by cProfile and timeit are both just standard fluctuations, despite cProfile's tottime
reporting there.
Branch/PR to follow. It won't be a shock if the diff is 3 lines long, given the change is shown above.
Change History (4)
comment:1 by , 3 years ago
Has patch: | set |
---|
comment:2 by , 3 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 3 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
PR is https://github.com/django/django/pull/14755. CI is still running atm. Probably makes sense for someone to check the timings are independently repeatable given the timescales in question (
ns
). Python version used was3.9.5
.