Opened 3 years ago
Closed 2 years ago
#32948 closed Cleanup/optimization (fixed)
Optimise Q combination and inversion
Reported by: | Keryn Knight | Owned by: | Nick Pope |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This is a corollary to #32946 and #32940.
Q is currently inconsistent with its friends WhereNode and Node in that it doesn't use the _new_instance trick. Even using the _new_instance trick leaves some performance on the table versus just inlining the __class__ switch, because the extra method call affects both _combine() and __invert__().
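For context, a minimal standalone sketch of the _new_instance trick as it exists on Node in django/utils/tree.py (the Node and Q classes below are simplified stand-ins, not Django's real ones): a plain base instance is constructed and its __class__ is swapped, so the subclass's __init__, whose signature may differ, is never invoked.

```python
class Node:
    """Simplified stand-in for django.utils.tree.Node."""
    default = "AND"

    def __init__(self, children=None, connector=None, negated=False):
        self.children = children[:] if children else []
        self.connector = connector or self.default
        self.negated = negated

    @classmethod
    def _new_instance(cls, children=None, connector=None, negated=False):
        # Construct a plain Node, then swap __class__, so the subclass's
        # __init__ (whose signature may differ) is never invoked.
        obj = Node(children, connector, negated)
        obj.__class__ = cls
        return obj


class Q(Node):
    # Q's __init__ has a different signature to Node's -- exactly why
    # the _new_instance trick exists.
    def __init__(self, *args, **kwargs):
        super().__init__(children=[*args, *sorted(kwargs.items())])
```

Here Q._new_instance() yields a Q without ever touching Q.__init__, which is the behaviour the ticket proposes to inline.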
The _combine method also has conditionals for what to do when an empty node is combined on either side (lhs or rhs). One side uses deconstruct, the other uses the shallow copy protocol (only since c8b659430556dca0b2fe27cf2ea0f8290dbafecd), which is unimplemented. Because __copy__ is not implemented, it ultimately falls back (after some branching checks) to the builtin __reduce_ex__(4) + copy._reconstruct, which gives:
In [3]: x = Q()
In [4]: %timeit copy.copy(x)
2.2 µs ± 70.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [5]: %timeit copy.copy(Q())
3.52 µs ± 264 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If we implement the necessary method like so:
def __copy__(self):
    obj = self._new_instance()
    obj.__dict__ = self.__dict__.copy()
    return obj
we can reduce those numbers to:
In [3]: x = Q()
In [4]: %timeit copy.copy(x)
1.27 µs ± 6.19 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit copy.copy(Q())
2.37 µs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
We can then reduce the work further by not invoking copy.copy() at all, by setting the copy = __copy__ attribute on the Q class.
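A standalone sketch of that alias, assuming a __copy__ like the one above is in place (the _new_instance here is a hypothetical stand-in for Node's machinery, not Django's actual code): binding the method a second time under the name copy lets callers invoke q.copy() directly, skipping copy.copy()'s type dispatch entirely.

```python
import copy


class Q:
    """Toy stand-in for django.db.models.Q."""

    def __init__(self, **kwargs):
        self.children = sorted(kwargs.items())
        self.connector = "AND"
        self.negated = False

    @classmethod
    def _new_instance(cls):
        # Hypothetical stand-in for Node._new_instance: allocate an
        # instance without running __init__ at all.
        return cls.__new__(cls)

    def __copy__(self):
        obj = self._new_instance()
        obj.__dict__ = self.__dict__.copy()
        return obj

    # The alias: q.copy() is the same bound method, with none of
    # copy.copy()'s type-dispatch overhead on top.
    copy = __copy__
```

Both copy.copy(q) (which finds __copy__ via the protocol) and q.copy() (the direct call) return a shallow clone; the alias simply removes the stdlib dispatch from the hot path.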
From there, we can avoid calling self.deconstruct() at all, instead calling self.copy(), knowing that self has values but other does not. Both are basically on par with each other speed-wise, with deconstruct being faster on empty nodes (which self isn't) and copy being marginally faster when there's a different connector (e.g. OR).
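Putting that together, the reworked _combine might look roughly like this. This is a standalone sketch built on a toy Q, not the exact patch; the empty-node branches are the point, replacing the deconstruct()/rebuild round trip with a shallow copy of whichever side has content.

```python
class Q:
    """Toy stand-in for django.db.models.Q."""
    AND, OR = "AND", "OR"

    def __init__(self, **kwargs):
        self.children = sorted(kwargs.items())
        self.connector = self.AND
        self.negated = False

    def __bool__(self):
        # An empty Q has no children.
        return bool(self.children)

    def copy(self):
        # Shallow clone, as described above.
        obj = self.__class__.__new__(self.__class__)
        obj.__dict__ = self.__dict__.copy()
        return obj

    def _combine(self, other, conn):
        if not isinstance(other, Q):
            raise TypeError(other)
        # Empty-node fast paths: no deconstruct()/rebuild round trip,
        # just a shallow copy of whichever side has content.
        if not other:
            return self.copy()
        if not self:
            return other.copy()
        obj = self.__class__.__new__(self.__class__)
        obj.__dict__ = {"children": [self, other], "connector": conn, "negated": False}
        return obj

    def __and__(self, other):
        return self._combine(other, self.AND)

    def __or__(self, other):
        return self._combine(other, self.OR)
```

For example, Q(a=1) & Q() takes the fast path and returns a copy of the non-empty side, while Q(a=1) | Q(b=2) builds a fresh node holding both children.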
For inverting, we can similarly change it to avoid the Node.add() call:
def __invert__(self):
    obj = self.copy()
    obj.negate()
    return obj
which would allow it to go from:
In [2]: %timeit ~Q()
2.89 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [3]: %timeit ~Q(a=1, b=2, c=3, d=4)
3.77 µs ± 57.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
to:
In [2]: %timeit ~Q()
2.34 µs ± 9.49 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [3]: %timeit ~Q(a=1, b=2, c=3, d=4)
3.14 µs ± 72.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In totality, then, baselines:
In [2]: full = Q(a=1, b=2, c=3)
In [3]: full2 = Q(d=4, e=5, f=6)
In [4]: empty = Q()
In [5]: %timeit full & full2
2.65 µs ± 17.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit full | full2
3 µs ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: %timeit ~(full | full2)
5.09 µs ± 122 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit ~(full & full2)
4.67 µs ± 58.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: %timeit empty & full
2.81 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [10]: %timeit full & empty
3.16 µs ± 43.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [11]: %timeit empty | full
2.8 µs ± 22.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [12]: %timeit full | empty
3.13 µs ± 20.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: values = (Q(a=1), Q(b=2), Q(c=3), Q(d=4), Q(e__in=[1,2,3,4]))
In [14]: %timeit reduce(or_, values)
12 µs ± 212 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
and after the changes:
In [5]: %timeit full & full2
2.11 µs ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: %timeit full | full2
2.39 µs ± 37.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [7]: %timeit ~(full | full2)
3.62 µs ± 47.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit ~(full & full2)
3.34 µs ± 28.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: %timeit empty & full
1.57 µs ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [10]: %timeit full & empty
1.68 µs ± 18.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [11]: %timeit empty | full
1.66 µs ± 24.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [12]: %timeit full | empty
1.8 µs ± 23 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [13]: values = (Q(a=1), Q(b=2), Q(c=3), Q(d=4), Q(e__in=[1,2,3,4]))
In [14]: %timeit reduce(or_, values)
9.59 µs ± 56.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
A final note, then: if the _new_instance code were inlined into the copy method, it looks like that shaves off a further 100-120 ns per copy, from my quick tests. So there is still performance on the table to put it fully ahead of deconstruct (and another possibility exists: that the deconstruction for migrations in some way gets more complex, negatively affecting runtime performance for self copies).
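Such micro-measurements are easy to reproduce outside IPython. A standalone harness comparing the two copy variants on a toy Q (both class and method names here are illustrative, not Django's):

```python
import timeit


class Q:
    """Toy stand-in for django.db.models.Q."""

    def __init__(self, **kwargs):
        self.children = sorted(kwargs.items())

    @classmethod
    def _new_instance(cls):
        # Stand-in for Node._new_instance: allocate without __init__.
        return cls.__new__(cls)

    def copy_via_new_instance(self):
        obj = self._new_instance()
        obj.__dict__ = self.__dict__.copy()
        return obj

    def copy_inlined(self):
        # _new_instance's body inlined: one fewer method call per copy.
        obj = self.__class__.__new__(self.__class__)
        obj.__dict__ = self.__dict__.copy()
        return obj


q = Q(a=1, b=2, c=3)
t_method = timeit.timeit(q.copy_via_new_instance, number=100_000)
t_inline = timeit.timeit(q.copy_inlined, number=100_000)
print(f"via _new_instance: {t_method:.4f}s  inlined: {t_inline:.4f}s")
```

Absolute numbers will vary by machine and interpreter, so only the relative difference between the two variants is meaningful.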
I have a patch series which will need a little tidying (don't they always?) but should pass CI when I open the PR for discussion.
Change History (15)
comment:1 by , 3 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 3 years ago
+1 to privilege readability over micro-optimization, if such a choice has to be made.
comment:3 by , 3 years ago
Has patch: | set |
---|
I don't think readability has been hurt by the change; I could even argue it has been improved. Do judge for yourselves, naturally.
At this point there are two PRs to look at:
my original one: https://github.com/django/django/pull/14673
Nick P's one based on mine above: https://github.com/django/django/pull/14677
Please note that the discussion about Nick's variant happened inline in my PR, as regrettably I didn't see the other PR until after we'd started discussing the changes.
comment:4 by , 3 years ago
Patch needs improvement: | set |
---|
Marking as "needs improvement" because we need to reach a consensus before the next review.
comment:5 by , 3 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
comment:6 by , 3 years ago
Needs tests: | set |
---|
From the discussion it seems that extra test coverage is needed.
comment:7 by , 2 years ago
Needs tests: | unset |
---|
Unflagging "needs tests" as some have been added to cover the various approaches to copying/cloning, which should provide confidence that the behaviour hasn't changed.
comment:9 by , 2 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:15 by , 2 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Tentatively accepted; readability with the proposed patch is crucial to me.