Opened 17 months ago

Last modified 15 months ago

#27624 assigned Cleanup/optimization

Optimize ORM by using more immutable data structures

Reported by: Adam (Chainz) Johnson Owned by: Adam (Chainz) Johnson
Component: Database layer (models, ORM) Version: 1.10
Severity: Normal Keywords:
Cc: me@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django's QuerySet and Query classes spend most of their lifetime being copied, and each step in the building of a query, a very limited selection of the attributes get changed. As such they incur time and memory overheads from using mutable datastructures that require a copy operation for each step. Mutable data structures also introduce the possibility for bugs if they get shared or mutated in the wrong places. Finally QuerySet and Query are inconsistent and mix mutable and immutable datastructures for similar attributes.

I'm going to look at converting them to using just immutable ones (tuple instead of set, frozenset instead of set, etc.).

Change History (8)

comment:1 Changed 17 months ago by Tim Graham

Triage Stage: UnreviewedAccepted

comment:2 Changed 17 months ago by Adam (Chainz) Johnson

Tim, I will be looking at changing some more of the attributes after the current two PR's are merged, please keep the ticket open for a bit ;)

comment:3 Changed 17 months ago by Tim Graham <timograham@…>

In 6ebf8f90:

Refs #27624 -- Made QuerySet._prefetch_related_lookups immutable.

comment:4 Changed 17 months ago by Pamela McANulty

This was a PR comment I made that I'm duplicating here:

Given that performance optimizations often unexpectedly fail to improve performance, and especially given the unintuitive timings on tuple([comprehension]) v tuple(generator comprehension), I think this PR _must_ have some before and after overall performance comparison before being accepted and merged.

comment:5 Changed 17 months ago by Adam (Chainz) Johnson

Note: if all of Query's attributes were immutable, we could probably greatly speed up its instantiation and copying by not having every attribute defined on every instance, and relying on fallback to class level defaults, then clone() doing a simple new.__dict__.update(self.__dict__) to copy all the non-default attributes. For example, take these two classes:

In [24]: class Foo(object):
    ...:     def __init__(self):
    ...:         self.x = None
    ...:         self.y = None
    ...:         self.z = None
    ...:     def clone(self):
    ...:         obj = self.__class__()
    ...:         obj.x = self.x
    ...:         obj.y = self.y
    ...:         obj.z = self.z
    ...:         return obj
    ...:

In [25]: class Foo2(object):
    ...:     x = None
    ...:     y = None
    ...:     z = None
    ...:     def __init__(self):
    ...:         pass
    ...:     def clone(self):
    ...:         obj = self.__class__()
    ...:         obj.__dict__.update(self.__dict__)
    ...:         return obj
    ...:

The first example corresponds to the current situation with Query, and the bottom is the imagined future. Timing the example __init__ + clone() methods gives:

In [26]: %timeit -n 1000000 -r 3 Foo().clone()
1000000 loops, best of 3: 1.82 µs per loop

In [27]: %timeit -n 1000000 -r 3 Foo2().clone()
1000000 loops, best of 3: 1.3 µs per loop

This is about a 30% speed up. Inspecting just the clone() methods because we care a bit more about the time that takes over __init__, gives us:

In [31]: foo = Foo()

In [32]: %timeit -n 1000000 -r 3 foo.clone()
1000000 loops, best of 3: 1.15 µs per loop

In [33]: foo2 = Foo2()

In [34]: %timeit -n 1000000 -r 3 foo2.clone()
1000000 loops, best of 3: 874 ns per loop

About a 25% speed boost, very similar.

The difference will hopefully be even greater on Query as currently its clone() method is 69 lines long with many attributes getting copied.

comment:6 Changed 17 months ago by Josh Smeaton

https://github.com/django/djangobench can be used for performance testing, though it may need some extra modules or some cleanup before it becomes too usable for this patch.

comment:7 Changed 17 months ago by Adam (Chainz) Johnson

On the PR I have already run some basic djangobench tests

comment:8 Changed 15 months ago by Tim Graham <timograham@…>

In af121b0:

Refs #27624 -- Made many attributes of Query immutable.

Note: See TracTickets for help on using tickets.
Back to Top