Context Navigation

← Previous Ticket
Next Ticket →

#32940 closed Cleanup/optimization (fixed)

Consider removing unused/unnecessary functionality in tree.Node.add

Reported by:	Keryn Knight	Owned by:	Keryn Knight
Component:	Utilities	Version:	dev
Severity:	Normal	Keywords:
Cc:		Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description (last modified by Nick Pope)

I'm proposing for discussion the removal of 2 things within the Node.add method, or at least further investigation on why it's not possible to do so; those 2 bits are outlined immediately below.

If we instrument the Node.add method, used by query_utils.Q and where.WhereNode extensively, like so:

...
if data in self.children:
    print('v- data existed')
    traceback.print_stack()
if self.connector == conn_type and data in self.children:
    print('^- data existed and connector matched')
    return data
if not squash:
    print('<- squash is not Truthy')
    self.children.append(data)
    return data
...

and run the test suite (or as much as I'm able easily):

...
Ran 14834 tests in 453.388s
OK (skipped=1188, expected failures=4)

We will find that the data in self.children barely registers as tested/used, and squash is entirely unused (unless those 1188 tests hide the truth).

Having separately instrumented the add method previously to check the number of calls by class type, it appears as below:

1493 calls to Q.add of which 0 have a data in children match.
2 calls to Node.add of which 1 is a data in children match.
210862 calls to WhereNode.add of which 2 are a data in children match.

But of all of those matches, the connector is never the same, so it never does that return early expected optimisation.

The tests which exercise those 3 matches are:

tests.queries.tests.DisjunctiveFilterTests.test_ticket8283
tests.queries.tests.Queries4Tests.test_combine_or_filter_reuse
tests.utils_tests.test_tree.NodeTests.test_add_eq_child_mixed_connector

The last of those is intentionally making sure that it does get added due to the connector difference.
The first two, as they could affect where clause pushdown & merging, or cause different query plans due to repeats, we should check to make sure that the queries (additionally to the results) remain the same before and after by doing the following back of the napkin job for each of them:

self.assertEqual(
    str(x.query),
    "???"
)

and confirm that they remain the same (they do), which are:

'SELECT "queries_extrainfo"."id", "queries_extrainfo"."info", "queries_extrainfo"."note_id", "queries_extrainfo"."value", "queries_extrainfo"."date_id", "queries_extrainfo"."filterable" FROM "queries_extrainfo" WHERE (("queries_extrainfo"."note_id" = 1 OR "queries_extrainfo"."info" = e2) AND "queries_extrainfo"."note_id" = 1) ORDER BY "queries_extrainfo"."info" ASC';

'SELECT "queries_extrainfo"."id", "queries_extrainfo"."info", "queries_extrainfo"."note_id", "queries_extrainfo"."value", "queries_extrainfo"."date_id", "queries_extrainfo"."filterable" FROM "queries_extrainfo" WHERE (("queries_extrainfo"."info" = e2 OR "queries_extrainfo"."note_id" = 1) AND "queries_extrainfo"."note_id" = 1) ORDER BY "queries_extrainfo"."info" ASC';

'SELECT "queries_author"."id", "queries_author"."name", "queries_author"."num", "queries_author"."extra_id" FROM "queries_author" WHERE (("queries_author"."name" = a1 OR "queries_author"."name" = a3) AND "queries_author"."name" = a1)';

Removing the data in self.children check should be a small optimisation win because it's an O(n) scan across the entire list of direct children, and uses the Node.__eq__ method which does an equality scan on it's own children (so grandchildren).

In [2]: %%timeit
   ...: x = Q(('a', 1), ('b', 2), c=3, d=4, b=1)
   ...: y = Q(aa=1, bb=2, cc=3)
   ...: x.add(y, Q.OR)
4.81 µs ± 16.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

after removing:

4.66 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

It's worth noting that the concept of checking data in self.children is semi-flawed anyway, because Q(('a', 1), a=2) or Q(('a', 1), ('a', 2)) very simply allow duplicate entries to occur immediately and up-front.

I have a patch which I'll push up shortly once I've tidied it of all my instrumentation and pdb usage. The hope is that the entire CI passes, if it doesn't this is all moot :)

Change History (5)

comment:1 by Keryn Knight, 5 years ago

Owner:	changed from nobody to Keryn Knight
Status:	new → assigned

Draft PR is https://github.com/django/django/pull/14658
Let's see what it says about whether or not this could even move to accepted.

comment:2 by Nick Pope, 5 years ago

Description:	modified (diff)
Triage Stage:	Unreviewed → Accepted

The squash argument was added in d3f00bd5706b35961390d3814dd7e322ead3a9a3. It seems to have been unused since it was introduced. (I didn't find squash=False anywhere.)

The if self.connector == conn_type and data in self.children: line had the first half of the expression added recently in b81c7562fc33f50166d5120138d6398dc42b13c3. Interestingly this was actually fixing a regression in d3f00bd5706b35961390d3814dd7e322ead3a9a3 as prior to that commit the condition was the same!

It'd probably be a good idea to see if Simon can cast an eye over this.

Version 0, edited 5 years ago by Nick Pope (next)

comment:3 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

In fb35e0a2:

Refs #32940 -- Removed Node.add()'s unused squash parameter.

Unused since its introduction in d3f00bd5706b35961390d3814dd7e322ead3a9a3.

Co-authored-by: Nick Pope <nick@…>

comment:4 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

In ff661db:

Refs #32940 -- Removed unnecessary branch in Node.add().

The "data in self.children" branch was causing data.eq to be
called for each entries in "self.children" which resulted in a huge
slowdown during queryset construction.

It's purpose was to prevent queries of the form

Model.objects.filter(foo='bar').filter(foo='bar')

from resulting in

WHERE foo='bar' AND foo='bar'

but it's not covered by the suite and has arguable performance benefits
since it's not very common and SQL engines are usually very good at
folding/optimizing these.

See also #32632 for prior discussion around comparing data to the
Node's children.

Co-authored-by: Nick Pope <nick@…>

comment:5 by Mariusz Felisiak, 5 years ago

Resolution:	→ fixed
Status:	assigned → closed

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#32940 closed Cleanup/optimization (fixed)

Consider removing unused/unnecessary functionality in tree.Node.add

Description (last modified by Nick Pope)

Change History (5)

comment:1 by Keryn Knight, 5 years ago

comment:2 by Nick Pope, 5 years ago

comment:3 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

comment:4 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

comment:5 by Mariusz Felisiak, 5 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us