Code

Opened 3 years ago

Closed 16 months ago

#16759 closed Bug (fixed)

Expensive sql.Query cloning

Reported by: Suor Owned by: Suor
Component: Database layer (models, ORM) Version: 1.3
Severity: Normal Keywords: orm performance cloning
Cc: anssi.kaariainen@…, akaariai, hv@…, niwi@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I dug into Query.clone() and found that deepcopying of it's where tree may sometimes lead to deepcopying of model fields and _meta. And with a help of related fields it sometimes becomes a disaster.

I dug further and found that actual culprits are db.models.sql.where.Constraint
and db.models.sql.expressions.SQLEvaluator which "leak" deepcopying to ORM objects.

We fixed this problem for ourselves and here is the patch.

Attachments (10)

query_carefull_clone.patch (1.5 KB) - added by Suor 3 years ago.
16759.diff (3.3 KB) - added by akaariai 3 years ago.
16759_where_clone.diff (3.5 KB) - added by akaariai 3 years ago.
A different approach
16759_cleaned_up_where_clone.diff (3.4 KB) - added by Suor 3 years ago.
cleaned up diffrent approach
16759_cleaned_up_where_clone_for_1.3.1.diff (3.8 KB) - added by snyderra@… 2 years ago.
patch that can be applied to the 1.3.1 stable release
16759_where_clone_2.diff (5.9 KB) - added by akaariai 2 years ago.
Latest cleanup of wherenode cloning
16759_where_clone_2.2.diff (5.7 KB) - added by akaariai 2 years ago.
.clone() from utils/tree.py to sql/where.py
16759_test.diff (1.4 KB) - added by akaariai 2 years ago.
16759_test.2.diff (2.7 KB) - added by lrekucki 2 years ago.
Regression tests without Query internals introspection.
#16759-remove_deepcopy_in_qs.diff (9.4 KB) - added by Kronuz 14 months ago.
Fixes #19964

Download all attachments as: .zip

Change History (46)

Changed 3 years ago by Suor

comment:1 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Ready for checkin

Confirmed this by running:

qs = MyModel.objects.filter(pk=F('pk'))
memo = {}
query = qs.query.clone(memo=memo)
print len(memo)

Before patch the length was 175 in my setup, after patch 34. The memo contains all sorts of garbage from the MyModel._meta. The speed of running 100 qs.filter(pk=1) in loop dropped to half.

The only question I have if this should have tests. It is easy to test that the memo does not contain anything from the Options, but that test seems to go very much into the internals of sql/query.py.

In my opinion the approach taken in the patch is valid. All current tests passed on SQLite3. Marking as ready for checkin.

comment:2 Changed 3 years ago by Alex

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

The patch as written is not correct. It isn't correct for deepcopy to return a shallow copy. Either the object itself is immutable, in which case it should simply return self, or all of its components are immutable, in which case they should have __deepcopy__ methods which simply return self. After discussing with Jacob, our conclusion is that the right fix is to make django.db.model.options.Options and django.db.model.fields.Field immutable (via changing their __deepcopy__).

comment:3 Changed 3 years ago by akaariai

  • Patch needs improvement unset

Sounds like a much cleaner approach.

There is one problem however. django.db.model.fields.Field does already define a __deepcopy__ method. It is used in two places in django/db/models/base.py:__new__() when inheriting fields:

# line 168
for field in parent_fields:
    new_class.add_to_class(field.name, copy.deepcopy(field))
# The other one is similar, at line 189 when inheriting virtual fields.

That deepcopy does a half-shallow copy of the field. There seems to be no other intentional uses of Field.__deepcopy__. It is possible to redefine Field.__deepcopy__ to return self, and rename the current deepcopy to something else (clone?) and use that in the above snippet. A patch implementing this approach is attached. There is some weirdness, because GenericForeignKey isn't a subclass of Field, and RelatedField isn't a subclass either, except it seems to somehow be a subclass (if I define a clone which would return a deepcopy, the Field.__deepcopy__ is called). See lines 87-88 of django/db/models/fields/related.py. A better comment there would be useful :)

There are also some other places that use shallow-copying deepcopy. For example django.forms.Field, django.forms.Widget and django.db.models.sql.Query. I guess these should be fixed too, but that is another ticket's problem.

A completely different approach is to define .clone() for WhereNode. The new method can do the minimal amount of copying needed.

How about the testing aspect? Does this need tests or is it enough to not break the current ones? If this does need tests, then what to test? The attached patch does not have tests. It passes all current tests. I did some quick benchmarking, and this seems to be 2-3x faster for the simple case of qs.filter(pk=1) 1000 times in a loop.

I hope I am helping more than distracting :)

Changed 3 years ago by akaariai

comment:4 Changed 3 years ago by akaariai

Just a quick note about the django/db/models/fields/related.py subclassing: the reason the Field.__deepcopy__ is called is because of multi-inheritance, for example ForeignKey(RelatedField, Field)... So nothing too strange about that. The comment on line 87 is still a bit confusing, though :)

comment:5 Changed 3 years ago by Suor

Should it be made ready for checking?

comment:6 Changed 3 years ago by anonymous

  • Triage Stage changed from Accepted to Ready for checkin

comment:7 Changed 3 years ago by akaariai

I have a completely different approach to this. The idea is simply to define .clone() method for the where class (actually for utils/tree.py - Node).

My test cases are:

# Case 1
qs = User.objects.all()
for i in range(0, 1000):
    qs = qs._clone()
# Case 2
qs = User.objects.filter(pk=1).filter(pk=2).filter(pk=3)
for i in range(0, 1000):
    qs = qs._clone()

The results are quite encouraging. For test case 1 the timing went from 0.163 -> 0.127s (about 25% speedup), and for test case 2 the timing went from 0.501 -> 0.162, or more than 3x faster.

Patch attached, passes all tests. There is some attempt at making this backwards-compatible in case the qs.query.where doesn't support clone().

Changed 3 years ago by akaariai

A different approach

comment:8 Changed 3 years ago by anonymous

Could be a right approach but for slightly different problem. We are trying to fix bug here, you are making cleanup. Though your cleanup fixes a problem too.

Regarding patch, as I understand django backward compatibility policy, internal things like sql.Query or tree.Node are not guaranteed to be backward compatible, so you should not complicate your code with catching AttributeErrors and fallbacking to deepcopy. Just take your idea of replacing deepcopy with .clone() to its logical end.

comment:9 Changed 3 years ago by Suor

We also can make Constraint immutable that will simplify Node.clone() to:

def clone(self):
    clone = self.__class__._new_instance(
        children=[], connector=self.connector, negated=self.negated)
    for child in self.children:
        if isinstance(child, tuple):
            clone.children.append(child)
        else:
            clone.children.append(child.clone())
    for parent in self.subtree_parents:
        clone.subtree_parents.append(parent.clone())
    return clone

comment:10 Changed 3 years ago by akaariai

Constraint has .relabel_aliases, and thus is not immutable. It could of course be made immutable by extensive changes to the code (relabel only at as_sql time), but I don't think that is what is wanted in this ticket.

It would be good if somebody with real queryset cloning performance problems could try the alternate patch. I have a feeling it could be surprisingly big win in some cases.

comment:11 Changed 3 years ago by Suor

I had real queryset cloning performance problems. I've solved that with mutating querysets instead of cloning (there is no faster code than no code).
Still I'll try your patch compared to mine in a few days. And comparing to vanilla django is already a big win.

comment:12 Changed 3 years ago by furagu

I work on the same project with Suor. I have compared that alternate patch to the original "carefull cloning" on our extremely deepcopied code parts that caused original research and figured out a speedup of about 10 milliseconds. We decided to use the alternate version.

comment:13 Changed 3 years ago by akaariai

Sorry, the 10 milliseconds alone doesn't tell that much. What was the original time without any patches, and what is the time with alternate/carefull clone. Is there anything else in the situation you tested than queryset construction? I mean is the time what is taken for constructing a response in total or just one queryset?

comment:14 Changed 3 years ago by furagu

Actually there are two places of deepcopy hazard in our code.
I have profiled the whole pages with response construction, template rendering, etc.

The first place is the most visited page of the site. Deepcopy does little slowdown here, but we are concerned because this view takes about a half of the total CPU usage.
For this view the original average time without any patches was 0.181 sec. With carefull clone patch it takes about 0.160 sec, and the alternate version takes about 0.149.

The second place has not so many visitors, but the deepcopy does really much destruction there. It takes about 0.300 sec without any patches, 0.069 with carefull clone and 0.062 sec with alternate version.
There are some tricky filters which lead to a "deepcopy leak" that causes about 20000 calls of copy.deepcopy. With alternate patch it has only 210 calls of copy.deepcopy.

comment:15 Changed 3 years ago by akaariai

Clarification: the alternate patch refers to 16759_where_clone.diff, not 16759.diff, right?

Ok, so it seems that the performance improvement from the alternate patch isn't that great compared to the carefull_clone version. But there are some massive improvements using either the deepcopy modification, or the alternate (clone) approach. As the deepcopy patch is much less controversial I would think that is the way to go. Even though the clone approach is a bit cleaner IMHO.

I would like to know two things still if possible:

  1. Can you fiqure out how much wherenode deepcopying takes of the total time?
  2. Can you check if the queryset's where looks somewhat sane, the reason I ask is this
    print User.objects.exclude(Q(email=1)).query.where
    (AND: (AND: (NOT (AND: (AND: (AND: (<django.db.models.sql.where.Constraint object at 0x9e775ec>, 'exact', True, u'1')))))))
    
    That seems to be a little verbose for a query which is WHERE NOT email = 1. Every parenthesis represents an object that needs to be cloned.

The perfect fix for this would be to use reverse-polish notation internally for the where/having conditions. Cloning would only be needed when doing relabel_aliases, which is cheap. Otherwise qs.clone could use clone.where = self.where[:] which would be darned cheap. This is however a freaking large project to do correctly. I don't know if this is even needed after the deepcopy fix, though.

The ready for checking applies to the 16759.diff, not to the where_clone patch.

comment:16 Changed 3 years ago by Suor

furagu was not clear enough about patch we used. It was cleaned up alternate version, which I attach.

Also that 10 millisecinds was quite considerable for us (and probably for someone else will be too). Combined with a cleaner approach of explicit cloning over using deepcopy I think alternate version should be preffered over original quite quick and dirty patch.

Changed 3 years ago by Suor

cleaned up diffrent approach

comment:17 follow-up: Changed 3 years ago by akaariai

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

A more thorough approach in #17000. If Suor or furagu are listening, please try out the patch in #17000. I am 99.5% sure you will get better results. Although I must say that guessing performance improvements without measurements go wrong 99.5% of the time.

The latest patch in this ticket has two failings:

  1. I don't believe the where node's .clone manages to keep the parent/child links in sync.
  2. Aggregate has relabel_aliases that does actually do something, and thus returning self from .clone() is wrong.

Marking as patch needs improvement.

comment:18 in reply to: ↑ 17 Changed 3 years ago by furagu

I'll try the patch from #17000 in a few days. It seems to be promising.

Changed 2 years ago by snyderra@…

patch that can be applied to the 1.3.1 stable release

comment:19 Changed 2 years ago by snyderra@…

I applied the patch from ticket 16759 to the appropriate revision. did a diff to the 1.3.1 download and merged the changes into the 1.3.1. resulted in a 30% speed increase during a bulk data insert that is also intensive in lookups for processing of the data. the speed increase is purely due to the reduction of time spent in deepcopy. that method alone was using ~40% of the compute time according to cProfile. after the patch it was reduced to 12%, so almost 30%. Now the bottle neck is database access as I would expect and not deepcopy.

comment:20 Changed 2 years ago by Alex

The clone() on Aggregate doesn't seem correct to me, you can see in the relable_alias method that it can be mutated.

comment:21 Changed 2 years ago by akaariai

  • Cc akaariai added

Yeah, that clone is broken. One should create a new instance in the clone, and return that.

In addition, there might be some problems in the tree cloning. If I am not mistaken, the tree is double linked (there is a link to children, and children has link to parent). And the clone doesn't guarantee that traveling a link downwards, then upwards ends up in the same node. However, if the above were true, there should be an endless loop (clone parent, clone its children -> they again clone the parent and so on). Too long time since I wrote that patch, so I don't remember how it works. What I remember is that there was something weird with the tree implementation, but that doesn't help much :)

Anyways, if reviewing this ticket, it would be good to check that traveling from a tree root to some leaf node and then back again ends up in the same (as in id()) root node. Unfortunately I don't have time to do anything more right now...

comment:22 Changed 2 years ago by akaariai

And now I remember what the strangeness was. Trees are created by starting subtrees. When you start a subtree for tree node A, you modify the tree inplace, such that references to A now point to the new subtree root, and that subtree root has A in subtree_parents.

That API is really strange. You basically have some sort of global state for the tree. You can not store a reference to a tree's root node. If somebody, somewhere calls start_subtree() on the root node, you suddenly have a reference to a new empty subtree of that root. The root node was mutated inplace. The tree API is changed to something simpler and more conventional in #17025. But changing the tree representation should not be done for 1.4.

In short, the subtree_parents cloning in tree.clone() should be changed to assert not self.subtree_parents. When cloning you must start from the real tree root, and in that case no node in the tree should have subtree_parents.

comment:23 Changed 2 years ago by akaariai

  • Patch needs improvement unset

I cleaned up the patch, I think it is good to go. Aggregates are handled by copy.copy now, and there is an assert that subtree_parents is empty when cloning. I removed all deepcopy references from sql/query.py

There is one bad abstraction leak: tree.clone() must know the internal structure of queryset's lookup condition tuples. The fix would be to create a class for these tuples also. This would be a quite big change, but I think it would make the code much cleaner in sql/query.py and sql/where.py, too.

The speed is from 30% to 60% faster for three simple test-cases, query construction only (not compiling it to sql, or executing the sql):

Test 1, 1000x TestModel.objects.all().filter(pk=1)
Test 2, 1000x TestModel.objects.all().filter(pk=1).filter(pk=1)
Test 3, 1000x TestModel.objects.all().filter(pk=1).filter(pk=1).filter(pk=1)

I would expect 70%+ speedup is possible for complex realworld queries.

All tests passed on sqlite3.

I think there is still much room for improvement, but this is the low-hanging fruit. I do think this would be a good candidate for 1.4. I have seen reports of 200+ms time used for queryset construction for single page load. So the problem is real.

Changed 2 years ago by akaariai

Latest cleanup of wherenode cloning

Changed 2 years ago by akaariai

.clone() from utils/tree.py to sql/where.py

comment:24 Changed 2 years ago by akaariai

I removed the above mentioned ugliness by moving .clone() from utils/tree.py() to db/models/sql/where.py. There is no abstraction leak any more.

comment:25 Changed 2 years ago by guettli

  • Cc hv@… added

comment:26 Changed 2 years ago by lrekucki

  • Needs tests set

I love speed improvements, but lets focus on the topic of this ticket: "deepcopying of it's where tree may sometimes lead to deepcopying of model fields and _meta". This should just never happen, so we need regression tests for cases where this currently happens.

comment:27 Changed 2 years ago by akaariai

I think the basic problem here is using deepcopy. Deepcopy is actually doing exactly what it's supposed to do, and testing that deepcopy doesn't do a deepcopy isn't correct.

If the .clone() based approach would be taken, then testing that it doesn't clone model._meta could be done. But there isn't much point to that. It won't suddenly begin cloning things it is not supposed to clone. In my opinion it is enough that we have Django's test suite checking that the code is doing what it is supposed to do, and then benchmarks checking it is done fast enough.

In addition checking nothing more than the necessary objects were cloned in qs.clone() is really hard. Here the problem is that the set of objects it should not clone is a hopelessly large set.

comment:28 Changed 2 years ago by akaariai

  • Needs tests unset

I have attached a test-only patch for this. I think it really shows the problem of testing this, as it goes very much into internals of the queryset implementation, and it tests a very specific problem - in this case a problem with SQLEvaluator's deepcopy.

Another way to test this is using the memo kwarg of qs.query.clone(), but that makes the assumption the memo dict is really passed around, which is not done in .clone() based approach.

The more I think about this the more I am of the opinion that using restricted deepcopy instead of .clone() is just plain wrong. The idea of deepcopy is to do a deep copy. Currently it does not do that, and fixing this ticket by restricting it more is not the correct approach.

The problem with .clone() based approach is that it is easy to miss things you should clone, while with the deepcopy based approach you have to artificially restrict the deepcopy so that it fits the queryset.clone() use case. Pick your poison :)

In the current .clone() based approach the somevalue of qs.filter(pk__in=someval) is not cloned, while previously it was deepcopied. The effect of this is that if the someval is changed (for example if it is a list and you append into it) the change will now be visible to cloned objects. However, the correct fix for this is cloning the value when first given to the queryset, the reason is this:

lst = [1]
qs = qs.filter(pk__in=lst)
lst.append(2) # this will be visible in the above qs
qs = qs.all()
lst.append(3) # this will not be visible - the queryset cloning above made a copy of the list, too.

so the current implementation is a bit inconsistent. The list should be copied already when given to the queryset in the .filter() call.

I also did a speed test of this, with a really problematic case (that is, using F-objects...). The test case is simple:

10000x
SomeModel.objects.filter(pk=F('pk')+1).all()

The idea is just to do a clone after using an F object. Results with the .clone() based approach is 2s vs 11s using trunk. So, fixing this one way or another is in my opinion really important.

Changed 2 years ago by akaariai

Changed 2 years ago by lrekucki

Regression tests without Query internals introspection.

comment:29 Changed 2 years ago by lrekucki

Here's my approach for testing this. I just mocked __deepcopy__ in the models we want to make sure that are not copied. I didn't debug this deep enough to come up with more tests, so if you have any, please tell me :)

I'm +1 on doing clone() instead of deep copying and really like the patch, just want to make sure this bug doesn't reappear again. Not sure if we should copy the filter arguments (maybe just advice using immutable types in the docs). If we do copy, I think we should try copying to a immutable type. Then it's safe to share that value between QuerySet copies.

comment:30 Changed 2 years ago by anonymous

I am -1 on copying arguments on usage. It's useless work mostly always.
+1 for documenting it and advising against mutating values after using them in queryset.

comment:31 Changed 2 years ago by akaariai

I agree on immutable-by-convention approach to the values in the queryset. The only other approach is deepcopy, and we know that can cause some problems. A common problematic case would be list of models, where the deepcopy would do exactly what it does now - deepcopy the model meta.

As for testing this, the approach given by lrekucki seems very good. Lets use that. One more test case could be the arguments deepcopying. If I am not mistaken, if you give the qs a list of models, the deepcopy will go "too deep".

comment:32 Changed 23 months ago by akaariai

A quick overview why I think we must get rid of the deepcopy and use clone based approach instead:

  1. It is hard to restrict deepcopy to absolute minimal amount of changes while maintaining the property that the deepcopy actually does a proper copy of the copied object, especially if we want to either do a full deepcopy or return self.
  2. deepcopy is expensive, even if the __deepcopy__ was defined exactly like .clone() for each object.
  3. The current usage of deepcopy in for example sql/query.py doesn't actually do a full clone of the object. Instead, the query.py deepcopy does a clone, and the clone isn't actually a full clone of the original object.

I must point out once again that the speed gains of clone based approach are significant. It is possible that using .clone() instead of deepcopy will introduce new bugs, but in the long run the clone based approach is the right one to use.

comment:33 Changed 23 months ago by akaariai

An up-to-date patch can be found from https://github.com/akaariai/django/compare/ticket_16759.

Some speed measurements using django-bench: nearly all of the query_* tests are around 20% faster (all except the ones testing large bulk operation speeds). The qs_filter_chaining is 3x faster. query_complex_filter is 1.5x faster.

A test case used previously in this ticket:

10000x
SomeModel.objects.filter(pk=F('pk')+1).all()

5x faster.

Django's tests suite on sqlite: 10% faster (on an extensive one-iteration benchmark).

The cloning speed isn't just a testing artefact. Reports for 200ms+ queryset creation times for a single page load have been reported (can't find these just now, if references are needed, I can dig them from Trac). The Django ORM uses roughly 4x the time for generating SQL for simple "SELECT ... WHERE pk = x" query compared to execution of that query. One large part of this time is queryset cloning speed. In short, the speed problem is real.

Note that fixing the __deepcopy__ would make the F() case above faster, but most of the other speedups are not about the F() case - they are about the general slower speed of __deepcopy__ compared to special tailored .clone() methods.

Add in that the __deepcopy__ methods Django uses aren't using good coding practices (they are not doing real deepcopies) and I think we have pretty convincing evidence that we really do want this approach.

comment:34 Changed 22 months ago by niwi

  • Cc niwi@… added

comment:35 Changed 16 months ago by akaariai

  • Triage Stage changed from Accepted to Ready for checkin

I've taken another look at this ticket and I am convinced this is the right way to go. There are two reasons, in order of importance:

  1. We are abusing the __deepcopy__, that is, we are making deep copies which aren't actually equivalent to the original. (Example: sql.query.Query.__deepcopy__()).
  2. Performance. For the F() case (the original reason for this ticket) there is a major improvement (order of magnitude improvements likely in real-world usage). For the non-F() case we still have things like 50% faster model.save() (on in-mem SQLite) and 50s away from 450s django-core test suite speed.

This is somewhat scary issue, but on the other hand the benefits are really nice... So, if nobody objects I will commit this one soon. The branch at https://github.com/akaariai/django/compare/ticket_16759 contains the commit candidate.

comment:36 Changed 16 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 23ca3a01940c63942885df4709712cebf4df79ec:

Fixed #16759 -- Remove use of deepcopy in qs.clone()

The original problem was that queryset cloning was really expensive
when filtering with F() clauses. The deepcopy went too deep copying
_meta attributes of the models used. To fix this the use of
deepcopy in qs cloning was removed.

This commit results in some speed improvements across the djangobench
benchmark suite. Most query_* tests are 20-30% faster, save() is 50%
faster and finally complex filtering situations can see 2x to order
of magnitude improvments.

Thanks to Suor, Alex and lrekucki for valuable feedback.

Changed 14 months ago by Kronuz

Fixes #19964

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.