Opened 7 years ago

Last modified 2 years ago

#27624 assigned Cleanup/optimization

Optimize ORM by using more immutable data structures

Reported by: Adam Johnson Owned by: Adam Johnson
Component: Database layer (models, ORM) Version: 1.10
Severity: Normal Keywords:
Cc: me@…, Keryn Knight Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django's QuerySet and Query classes spend most of their lifetime being copied, and each step in the building of a query, a very limited selection of the attributes get changed. As such they incur time and memory overheads from using mutable datastructures that require a copy operation for each step. Mutable data structures also introduce the possibility for bugs if they get shared or mutated in the wrong places. Finally QuerySet and Query are inconsistent and mix mutable and immutable datastructures for similar attributes.

I'm going to look at converting them to using just immutable ones (tuple instead of set, frozenset instead of set, etc.).

Change History (15)

comment:1 by Tim Graham, 7 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Adam Johnson, 7 years ago

Tim, I will be looking at changing some more of the attributes after the current two PR's are merged, please keep the ticket open for a bit ;)

comment:3 by Tim Graham <timograham@…>, 7 years ago

In 6ebf8f90:

Refs #27624 -- Made QuerySet._prefetch_related_lookups immutable.

comment:4 by Pamela McANulty, 7 years ago

This was a PR comment I made that I'm duplicating here:

Given that performance optimizations often unexpectedly fail to improve performance, and especially given the unintuitive timings on tuple([comprehension]) v tuple(generator comprehension), I think this PR _must_ have some before and after overall performance comparison before being accepted and merged.

comment:5 by Adam Johnson, 7 years ago

Note: if all of Query's attributes were immutable, we could probably greatly speed up its instantiation and copying by not having every attribute defined on every instance, and relying on fallback to class level defaults, then clone() doing a simple new.__dict__.update(self.__dict__) to copy all the non-default attributes. For example, take these two classes:

In [24]: class Foo(object):
    ...:     def __init__(self):
    ...:         self.x = None
    ...:         self.y = None
    ...:         self.z = None
    ...:     def clone(self):
    ...:         obj = self.__class__()
    ...:         obj.x = self.x
    ...:         obj.y = self.y
    ...:         obj.z = self.z
    ...:         return obj
    ...:

In [25]: class Foo2(object):
    ...:     x = None
    ...:     y = None
    ...:     z = None
    ...:     def __init__(self):
    ...:         pass
    ...:     def clone(self):
    ...:         obj = self.__class__()
    ...:         obj.__dict__.update(self.__dict__)
    ...:         return obj
    ...:

The first example corresponds to the current situation with Query, and the bottom is the imagined future. Timing the example __init__ + clone() methods gives:

In [26]: %timeit -n 1000000 -r 3 Foo().clone()
1000000 loops, best of 3: 1.82 µs per loop

In [27]: %timeit -n 1000000 -r 3 Foo2().clone()
1000000 loops, best of 3: 1.3 µs per loop

This is about a 30% speed up. Inspecting just the clone() methods because we care a bit more about the time that takes over __init__, gives us:

In [31]: foo = Foo()

In [32]: %timeit -n 1000000 -r 3 foo.clone()
1000000 loops, best of 3: 1.15 µs per loop

In [33]: foo2 = Foo2()

In [34]: %timeit -n 1000000 -r 3 foo2.clone()
1000000 loops, best of 3: 874 ns per loop

About a 25% speed boost, very similar.

The difference will hopefully be even greater on Query as currently its clone() method is 69 lines long with many attributes getting copied.

comment:6 by Josh Smeaton, 7 years ago

https://github.com/django/djangobench can be used for performance testing, though it may need some extra modules or some cleanup before it becomes too usable for this patch.

comment:7 by Adam Johnson, 7 years ago

On the PR I have already run some basic djangobench tests

comment:8 by Tim Graham <timograham@…>, 7 years ago

In af121b0:

Refs #27624 -- Made many attributes of Query immutable.

comment:9 by Mariusz Felisiak, 3 years ago

Adam, I saw your comment about more work related with this ticket. However, I don't see Query attributes to change. It seems fair to close this issue as "fixed" after 5 years of inactivity.

comment:10 by Adam Johnson, 3 years ago

It would be fair to close this. But you did make me take a look and I noticed there are three explain attributes, including a dict, when there could be one, plus some unnecessary branches: https://github.com/django/django/pull/14857

Some of the lesser used dict attributes could also be migrated to be optional to avoid their copy() calls. But it may not be worth the effort to change and profile.

comment:11 by Keryn Knight, 3 years ago

Cc: Keryn Knight added

Can I ask that we consider not closing it? I've only just been made aware of this ticket (... Trac search being what it is), and it turns out it discusses hoisting immutables to class level attributes for an improvement, which is a branch I have recently been working on, with the intent to fully bench (well, y'know, at least timeit/cprofile) each part and subsequently submit a ticket/PR for discussion. If I can get around to rebasing and re-testing, that change could potentially still land as part of this ticket, some time after 4.0.

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In fc91ea1e:

Refs #27624 -- Changed Query.explain_info to namedtuple.

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 5353e7c:

Refs #27624 -- Optimized Query.clone() for non-combined queries.

This avoids constructing a generator expression and a new tuple if the
Query has no combined queries.

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 6d5709ce:

Refs #27624 -- Optimized sql.Query creation by moving immutable/singleton attributes to class attributes.

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 2 years ago

In 24cc51f8:

Refs #27624 -- Optimized Query.clone() a bit.

Note: See TracTickets for help on using tickets.
Back to Top