Opened 4 months ago

Closed 4 months ago

#33003 closed Cleanup/optimization (fixed)

Micro-optimisation for QuerySet._chain

Reported by: Keryn Knight Owned by: Keryn Knight
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Whilst working on #28455 I noticed that _chain accepts kwargs, but that appears to only be to facilitate pickling, with Prefetch.__getstate__ making use of the functionality. Thus most of the time, kwargs is empty and there's no reason to touch/update the __dict__. And it's faster not to do so. Note that in comparison with the benefits of #28455 this is absolutely peanuts, but still, it's unrelated to that ticket so here we are:

In [1]: class A:
   ...:     def __init__(self, **kwargs):
   ...:         self.__dict__.update(**kwargs)
In [2]: a = A(a=1, b=2, c=3, d=4)
In [3]: empty = {}
In [4]: full = {'d': 3, 'c': 1}
In [5]: %timeit a.__dict__.update(empty)
91.2 ns ± 1.25 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: %timeit a.__dict__.update(full)
133 ns ± 0.92 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

If we just check the dictionary isn't empty (in _chain that's kwargs):

In [7]: %timeit if empty: a.__dict__.update(empty)
19 ns ± 0.377 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [8]: %timeit if full: a.__dict__.update(full)
156 ns ± 1.33 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

You can see the paying the basic cost of an if (for me, that appears to be 20ns for an empty dict, and more like 6ns for an interned/singleton value like 1 or True or '' as a baseline) is worthwhile, and having pickling a Prefetch be 20ns slower isn't too bad, given it's not exactly the most common use case...

The difference is small but can be made more obvious by checking the cProfile timings over many (many) iterations; here's the before across 100 users each with a group and permission:

In [2]: %prun -stime -l_chain for _ in range(1000): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
48095003 function calls (46202003 primitive calls) in 34.811 seconds

Ordered by: internal time
List reduced from 384 to 1 due to restriction <'_chain'>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
312000    4.070    0.000    8.791    0.000 query.py:1325(_chain)
In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
18.3 ms ± 391 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)

And after applying the change as follows:

def _chain(...):
    ...
    if kwargs:
        obj.__dict__.update(kwargs)
    ...

we end up with:

In [2]: %prun -stime -l_chain for _ in range(1000): tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
47779003 function calls (45886003 primitive calls) in 33.220 seconds

Ordered by: internal time
List reduced from 384 to 1 due to restriction <'_chain'>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
312000    0.267    0.000    5.644    0.000 query.py:1325(_chain)
In [3]: %timeit -n1000 -r10 tuple(User.objects.prefetch_related('groups', 'user_permissions', 'groups__permissions'))
18.1 ms ± 289 µs per loop (mean ± std. dev. of 10 runs, 100 loops each)

I have to assume that the overall timing differences measured by cProfile and timeit are both just standard fluctuations, despite cProfile's tottime reporting there.

Branch/PR to follow. It won't be a shock if the diff is 3 lines long, given the change is shown above.

Change History (4)

comment:1 Changed 4 months ago by Keryn Knight

Has patch: set

PR is https://github.com/django/django/pull/14755. CI is still running atm. Probably makes sense for someone to check the timings are independently repeatable given the timescales in question (ns). Python version used was 3.9.5.

comment:2 Changed 4 months ago by Mariusz Felisiak

Triage Stage: UnreviewedAccepted

comment:3 Changed 4 months ago by Mariusz Felisiak

Triage Stage: AcceptedReady for checkin

comment:4 Changed 4 months ago by GitHub <noreply@…>

Resolution: fixed
Status: assignedclosed

In 921e4cc:

Fixed #33003 -- Removed kwargs from QuerySet._chain().

The functionality of updating the dict was only present to allow
for pickling a Prefetch object, which is a comparably rare operation.
Forcing the getstate() implementation to handle it allows the
chaining operation to be slightly faster, which is much more common.

Note: See TracTickets for help on using tickets.
Back to Top