﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
33003	Micro-optimisation for QuerySet._chain	Keryn Knight	Keryn Knight	"Whilst working on #28455 I noticed that `_chain` accepts `kwargs`, but that appears to only be to facilitate pickling, with `Prefetch.__getstate__` making use of the functionality. Thus most of the time, kwargs is empty and there's no reason to touch/update the `__dict__`. And it's faster not to do so. Note that in comparison with the benefits of #28455 this is absolutely peanuts, but still, it's unrelated to that ticket so here we are:

{{{
In [1]: class A:
   ...:     def __init__(self, **kwargs):
   ...:         self.__dict__.update(**kwargs)
In [2]: a = A(a=1, b=2, c=3, d=4)
In [3]: empty = {}
In [4]: full = {'d': 3, 'c': 1}
In [5]: %timeit a.__dict__.update(empty)
91.2 ns ± 1.25 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %timeit a.__dict__.update(full)
133 ns ± 0.92 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
}}}
If we just ''check'' that the dictionary isn't empty first (in `_chain`, that dictionary is `kwargs`):
{{{
In [7]: %timeit if empty: a.__dict__.update(empty)
19 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [8]: %timeit if full: a.__dict__.update(full)
156 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
}}}

You can see that paying the basic cost of an `if` (for me, that appears to be about `20ns` when testing an empty dict, and more like `6ns` for an interned/singleton value like `1` or `True` or `''` as a baseline) is worthwhile, and making pickling a `Prefetch` `20ns` slower isn't too bad, given that it's not exactly the most common use case...
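(If you want to reproduce that baseline `if` cost yourself, something along these lines should do it, continuing the session above; outputs omitted here since they'll vary per machine and Python build:)
{{{
In [9]: %timeit if empty: pass
In [10]: %timeit if True: pass
In [11]: %timeit if 1: pass
}}}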

The difference is small, but can be made more obvious by checking the cProfile timings over many (''many'') iterations; here's the ''before'', across 100 users each with a group and a permission:
{{{
In [2]: %prun -stime -l_chain for _ in range(1000): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
48095003 function calls (46202003 primitive calls) in 34.811 seconds

Ordered by: internal time
List reduced from 384 to 1 due to restriction <'_chain'>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
312000    4.070    0.000    8.791    0.000 query.py:1325(_chain)
In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
18.3 ms ± 391 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
}}}
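(For context, the test data was roughly of this shape — a sketch rather than the exact setup script, assuming the stock `django.contrib.auth` models:)
{{{
from django.contrib.auth.models import Group, Permission, User

# Roughly the dataset described above: 100 users, each in a group and
# holding a direct permission, so prefetch_related() has work to do.
permission = Permission.objects.first()
group = Group.objects.create(name="bench-group")
group.permissions.add(permission)
for i in range(100):
    user = User.objects.create_user(username=f"user{i}")
    user.groups.add(group)
    user.user_permissions.add(permission)
}}}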
And after applying the change as follows:
{{{
def _chain(...):
    ...
    if kwargs:
        obj.__dict__.update(kwargs)
    ...
}}}
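(In context, the whole method would then look something like this — sketched from `QuerySet._chain` in `django/db/models/query.py` at the time of writing, so the surrounding lines may differ slightly; only the `if kwargs:` guard is new:)
{{{
def _chain(self, **kwargs):
    """
    Return a copy of the current QuerySet that's ready for another
    operation.
    """
    obj = self._clone()
    if obj._sticky_filter:
        obj.query.filter_is_sticky = True
        obj._sticky_filter = False
    # Skip the (comparatively expensive) __dict__.update() in the common
    # case where no kwargs were passed, i.e. ordinary queryset chaining.
    if kwargs:
        obj.__dict__.update(kwargs)
    return obj
}}}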
we end up with:
{{{
In [2]: %prun -stime -l_chain for _ in range(1000): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
47779003 function calls (45886003 primitive calls) in 33.220 seconds

Ordered by: internal time
List reduced from 384 to 1 due to restriction <'_chain'>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
312000    0.267    0.000    5.644    0.000 query.py:1325(_chain)
In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
18.1 ms ± 289 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)
}}}

I have to assume that the overall timing differences measured by cProfile and timeit are both just normal fluctuations, despite the clear drop in `_chain`'s `tottime` in the cProfile output.

Branch/PR to follow. It won't be a shock if the diff is 3 lines long, given the change is shown above.
"	Cleanup/optimization	closed	Database layer (models, ORM)	dev	Normal	fixed			Ready for checkin	1	0	0	0	0	0
