id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
17000	sql/query.py add_q refactoring	Anssi Kääriäinen	nobody	"I have worked on refactoring sql/query.py add_q. There are three reasons for doing this:
  1. It was hard to see what was happening in add_q previously. I hope I have improved this situation.
  2. The responsibilities in the code were not well divided. add_filter for example did some work that add_q normally does, but only in case when the filter was to the having clause.
  3. The add_q resulted in inefficient where node structure.

It turned out that the django/utils/tree.py tree structure was hard to use. So that is also refactored. Finally where.as_sql got some refactoring.

The refactoring already resulted in some other cleanups around the code, for example sql/query.py combine() is now cleaner in its hadling of the where clause.

The first two changes are best explained by reading the patch. However I can give some examples of why no. 3 is important. First consider query M.objects.exclude(pk=1). Previously this would have generated the following where tree:
{{{
AND:
  AND:
    NOT AND:
      AND:
        pk=1
}}}
And now:
{{{
AND:
   NOT AND:
     pk=1
}}}

The query M.objects.filter(pk=1).filter(pk=2).filter(pk=3) would generate:
{{{
AND: 
   AND:
     pk=1 
   AND:
     pk=2
   AND:
     pk=3
}}}
And now:
{{{
AND:
  pk=1
  pk=2
  pk=3
}}}
Finally, the query User.objects.filter(~Q(pk=1)&~Q(Q(pk=3)|Q(pk=2))) did create:
{{{
AND: 
  AND: 
    AND: 
      NOT AND: 
        AND:
          pk=1 
  AND:
    NOT
      AND:
        OR:
          AND:
            pk=3
        OR:
          pk=2
}}}
And now:
{{{
AND:
  NOT AND:
    pk=1
  NOT AND:
    OR:
      pk=3
      pk=2
}}}

Some performance numbers, two tests:
{{{
#Test 1
for i in range(0, 1000):
    qs = M.objects.filter(pk=1)
# Test2
qs = M.objects.all()
for i in range(0, 200):
    qs = qs.filter(pk=1)
}}}

Results before:
0:00:00.326911
0:00:02.265825
Results after:
0:00:00.193199
0:00:00.086636

So, about 35% faster and about 25x faster. The patch includes deepcopy removal for qs.clone(), and large part of the performance gain is from there.

All tests passed, though I had to fix one test.

There would be still further work, but I wanted to keep this at least somewhat restricted. Even more restricted is hard to do, as the tree and where node restructuring would mean refactoring the add_q anyways.

This ticket is related to #16759, but as this goes deeper in scope, I have created another ticket for this."	Cleanup/optimization	closed	Database layer (models, ORM)	1.3	Normal	fixed		suor.web@… anssi.kaariainen@…	Accepted	1	0	0	1	0	0