Opened 8 years ago
Last modified 3 years ago
#27624 assigned Cleanup/optimization
Optimize ORM by using more immutable data structures
Reported by: | Adam Johnson | Owned by: | Adam Johnson |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.10 |
Severity: | Normal | Keywords: | |
Cc: | me@…, Keryn Knight | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Django's QuerySet
and Query
classes spend most of their lifetime being copied, and each step in the building of a query, a very limited selection of the attributes get changed. As such they incur time and memory overheads from using mutable datastructures that require a copy operation for each step. Mutable data structures also introduce the possibility for bugs if they get shared or mutated in the wrong places. Finally QuerySet
and Query
are inconsistent and mix mutable and immutable datastructures for similar attributes.
I'm going to look at converting them to using just immutable ones (tuple instead of set, frozenset instead of set, etc.).
Change History (15)
comment:1 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 8 years ago
comment:4 by , 8 years ago
This was a PR comment I made that I'm duplicating here:
Given that performance optimizations often unexpectedly fail to improve performance, and especially given the unintuitive timings on tuple([comprehension])
v tuple(generator comprehension)
, I think this PR _must_ have some before and after overall performance comparison before being accepted and merged.
comment:5 by , 8 years ago
Note: if all of Query
's attributes were immutable, we could probably greatly speed up its instantiation and copying by not having every attribute defined on every instance, and relying on fallback to class level defaults, then clone()
doing a simple new.__dict__.update(self.__dict__)
to copy all the non-default attributes. For example, take these two classes:
In [24]: class Foo(object): ...: def __init__(self): ...: self.x = None ...: self.y = None ...: self.z = None ...: def clone(self): ...: obj = self.__class__() ...: obj.x = self.x ...: obj.y = self.y ...: obj.z = self.z ...: return obj ...: In [25]: class Foo2(object): ...: x = None ...: y = None ...: z = None ...: def __init__(self): ...: pass ...: def clone(self): ...: obj = self.__class__() ...: obj.__dict__.update(self.__dict__) ...: return obj ...:
The first example corresponds to the current situation with Query
, and the bottom is the imagined future. Timing the example __init__
+ clone()
methods gives:
In [26]: %timeit -n 1000000 -r 3 Foo().clone() 1000000 loops, best of 3: 1.82 µs per loop In [27]: %timeit -n 1000000 -r 3 Foo2().clone() 1000000 loops, best of 3: 1.3 µs per loop
This is about a 30% speed up. Inspecting just the clone()
methods because we care a bit more about the time that takes over __init__
, gives us:
In [31]: foo = Foo() In [32]: %timeit -n 1000000 -r 3 foo.clone() 1000000 loops, best of 3: 1.15 µs per loop In [33]: foo2 = Foo2() In [34]: %timeit -n 1000000 -r 3 foo2.clone() 1000000 loops, best of 3: 874 ns per loop
About a 25% speed boost, very similar.
The difference will hopefully be even greater on Query
as currently its clone()
method is 69 lines long with many attributes getting copied.
comment:6 by , 8 years ago
https://github.com/django/djangobench can be used for performance testing, though it may need some extra modules or some cleanup before it becomes too usable for this patch.
comment:9 by , 3 years ago
Adam, I saw your comment about more work related with this ticket. However, I don't see Query
attributes to change. It seems fair to close this issue as "fixed" after 5 years of inactivity.
comment:10 by , 3 years ago
It would be fair to close this. But you did make me take a look and I noticed there are three explain attributes, including a dict, when there could be one, plus some unnecessary branches: https://github.com/django/django/pull/14857
Some of the lesser used dict attributes could also be migrated to be optional to avoid their copy()
calls. But it may not be worth the effort to change and profile.
comment:11 by , 3 years ago
Cc: | added |
---|
Can I ask that we consider not closing it? I've only just been made aware of this ticket (... Trac search being what it is), and it turns out it discusses hoisting immutables to class level attributes for an improvement, which is a branch I have recently been working on, with the intent to fully bench (well, y'know, at least timeit/cprofile) each part and subsequently submit a ticket/PR for discussion. If I can get around to rebasing and re-testing, that change could potentially still land as part of this ticket, some time after 4.0.
Tim, I will be looking at changing some more of the attributes after the current two PR's are merged, please keep the ticket open for a bit ;)