Opened 9 months ago

Closed 9 months ago

#35102 closed Cleanup/optimization (fixed)

Performance regression on Model.objects.count() between Django 4.2 and 5.0

Reported by: Anthony Shaw Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I was running the djangobench tool against the 4.2 and 5.0.1 tags (and main to verify) and noticed that some of the benchmarks had significant regressions.

The one I wanted to highlight is query_count. The other two with regressions were model creation and query_annotate.

djangobench --control 4.2 --experiment 5.0.1 query_count -p profiles

The profile for both versions shows that in v5, get_aggregation has an additional call to replace_expressions which spends a lot of time in the slow inspect module. In the profile reports I collected this accounts for the 30-40% performance regression on the count aggregation query.

Performance regressions with new features are expected, but I wanted to raise this because the [ASV environment for Django](https://github.com/django/django-asv) seemed to have missed some of these regressions when all of the benchmarks suddenly ran faster.

I've attached profiles for 4.2 and 5.0.1 which can be opened using something like snakeviz

Attachments (2)

con-query_count (27.3 KB ) - added by Anthony Shaw 9 months ago.
Profile for Django 4.2
exp-query_count (28.8 KB ) - added by Anthony Shaw 9 months ago.
Profile for Django 5.1-dev

Download all attachments as: .zip

Change History (20)

by Anthony Shaw, 9 months ago

Attachment: con-query_count added

Profile for Django 4.2

by Anthony Shaw, 9 months ago

Attachment: exp-query_count added

Profile for Django 5.1-dev

comment:1 by Anthony Shaw, 9 months ago

comment:2 by Mariusz Felisiak, 9 months ago

Thanks for the report, I think we could make it faster by avoiding cloning every time:

  • django/db/models/expressions.py

    diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
    index c20de5995a..0e1c265d9e 100644
    a b class BaseExpression:  
    402402        return clone
    403403
    404404    def replace_expressions(self, replacements):
     405        if not replacements:
     406            return self
    405407        if replacement := replacements.get(self):
    406408            return replacement
     409        if not (source_expressions := self.get_source_expressions()):
     410            return self
    407411        clone = self.copy()
    408         source_expressions = clone.get_source_expressions()
    409412        clone.set_source_expressions(
    410413            [
    411414                expr.replace_expressions(replacements) if expr else None

Does it work for you?

Last edited 9 months ago by Mariusz Felisiak (previous) (diff)

comment:3 by Mariusz Felisiak, 9 months ago

Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

in reply to:  2 comment:4 by Mariusz Felisiak, 9 months ago

Replying to Mariusz Felisiak:

Does it work for you?

According to my experiments, it reduces regression to ~15%.

comment:5 by Mariusz Felisiak, 9 months ago

We could avoid cloning in two more places:

  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index a79d66eb21..911d0c7848 100644
    a b class Query(BaseExpression):  
    422422        return obj
    423423
    424424    def relabeled_clone(self, change_map):
     425        if not change_map:
     426            return self
    425427        clone = self.clone()
    426428        clone.change_aliases(change_map)
    427429        return clone
  • django/db/models/sql/where.py

    diff --git a/django/db/models/sql/where.py b/django/db/models/sql/where.py
    index 2f23a2932c..ae7a2a1224 100644
    a b class WhereNode(tree.Node):  
    225225        return clone
    226226
    227227    def replace_expressions(self, replacements):
     228        if not replacements:
     229            return self
    228230        if replacement := replacements.get(self):
    229231            return replacement
    230232        clone = self.create(connector=self.connector, negated=self.negated)

comment:6 by Aditi, 9 months ago

Owner: changed from nobody to Aditi
Status: newassigned

comment:7 by Mariusz Felisiak, 9 months ago

Owner: Aditi removed
Status: assignednew

Please don't assign ticket to yourself. For now, we're discussing a potential approach, and some improvements have already been proposed.

comment:8 by Simon Charette, 9 months ago

Cc: Simon Charette added

get_aggregation has an additional call to replace_expressions which spends a lot of time in the slow inspect module

One way we could cut the time spent in Expression.identity significantly would be to cache (e.g. at class initialization time) the inspect.signature(cls.__init__) signature as it represents the vast majority of the time spent by replace_expressions

https://cdn.zappy.app/dd47f5e7d4c03baff4726eb106e0d440.png

comment:9 by Anthony Shaw, 9 months ago

If Expression.hash (Expression.identity()) were to implement another mechanism for determining a unique hash that doesn't depend upon using inspect.signature the benchmark would be much faster since this is evaluated twice. I suspect this would impact other benchmarks as well, hash functions should be as fast as possible but I understand why that is challenging for something like expressions. inspect.signature is a very slow function that does a huge amount of processing to build a Signature object for the given callable.

comment:10 by Simon Charette, 9 months ago

I suspect this would impact other benchmarks as well, hash functions should be as fast as possible but I understand why that is challenging for something like expressions. inspect.signature is a very slow function that does a huge amount of processing to build a Signature object for the given callable.

yeah that's the crux of the issue here. This would require having each expression implement their own correct identity functions based on their __init__ signature which would be a significant maintenance burden and also backward incompatible for third party expressions.

I think we can get a lot of benefit by caching inspect.signature calls like the following.

  • django/db/models/expressions.py

    diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
    index c20de5995a..0703a134a9 100644
    a b def select_format(self, compiler, sql, params):  
    485485class Expression(BaseExpression, Combinable):
    486486    """An expression that can be combined with other expressions."""
    487487
     488    def __init_subclass__(cls, /, **kwargs):
     489        super().__init_subclass__(**kwargs)
     490        cls._constructor_signature = inspect.signature(cls.__init__)
     491
     492    _constructor_signature = inspect.signature(BaseExpression.__init__)
     493
    488494    @cached_property
    489495    def identity(self):
    490         constructor_signature = inspect.signature(self.__init__)
    491496        args, kwargs = self._constructor_args
    492         signature = constructor_signature.bind_partial(*args, **kwargs)
     497        signature = self._constructor_signature.bind_partial(self, *args, **kwargs)
    493498        signature.apply_defaults()
    494499        arguments = signature.arguments.items()
    495500        identity = [self.__class__]
    496501        for arg, value in arguments:
     502            if arg == 'self':
     503                continue
    497504            if isinstance(value, fields.Field):
    498505                if value.name and value.model:
    499506                    value = (value.model._meta.label, value.name)

in reply to:  10 comment:11 by Anthony Shaw, 9 months ago

Replying to Simon Charette:

I think we can get a lot of benefit by caching inspect.signature calls like the following.

That patch alone is 1.4x faster

djangobench --control main --experiment exp_1 query_count -t 200
Running benchmarks: query_count
Control: Django 5.1 (in git branch main)
Experiment: Django 5.1 (in git branch exp_1)

Running 'query_count' benchmark ...
Min: 0.000266 -> 0.000186: 1.4310x faster
Avg: 0.000292 -> 0.000202: 1.4424x faster
Significant (t=23.676856)
Stddev: 0.00004 -> 0.00003: 1.4418x smaller (N = 200)

comment:12 by Anthony Shaw, 9 months ago

And then with Mariusz' patches the overall speedup is 1.8x compared with main:

Running benchmarks: query_count
Control: Django 5.1 (in git branch main)
Experiment: Django 5.1 (in git branch exp_1)

Running 'query_count' benchmark ...
Min: 0.000294 -> 0.000163: 1.8003x faster
Avg: 0.000317 -> 0.000173: 1.8301x faster
Significant (t=53.998054)
Stddev: 0.00003 -> 0.00002: 1.4805x smaller (N = 200)

And an overall improvement on 4.2

Running benchmarks: query_count
Control: Django 4.2 (in git branch 4.2)
Experiment: Django 5.1 (in git branch exp_1)

Running 'query_count' benchmark ...
Min: 0.000191 -> 0.000174: 1.0997x faster
Avg: 0.000263 -> 0.000200: 1.3143x faster
Significant (t=3.949122)
Stddev: 0.00022 -> 0.00004: 5.2520x smaller (N = 200)
Version 0, edited 9 months ago by Anthony Shaw (next)

comment:13 by Simon Charette, 9 months ago

Alternate patch that avoids the cost at class definition time as that might have an effect on Django bootstrapping time given we define tens of Expression subclasses

  • django/db/models/expressions.py

    diff --git a/django/db/models/expressions.py b/django/db/models/expressions.py
    index c20de5995a..0ecda38353 100644
    a b  
    1414from django.db.models.constants import LOOKUP_SEP
    1515from django.db.models.query_utils import Q
    1616from django.utils.deconstruct import deconstructible
    17 from django.utils.functional import cached_property
     17from django.utils.functional import cached_property, classproperty
    1818from django.utils.hashable import make_hashable
    1919
    2020
    def select_format(self, compiler, sql, params):  
    485485class Expression(BaseExpression, Combinable):
    486486    """An expression that can be combined with other expressions."""
    487487
     488    @classproperty
     489    @functools.lru_cache(maxsize=128)
     490    def _constructor_signature(cls):
     491        return inspect.signature(cls.__init__)
     492
    488493    @cached_property
    489494    def identity(self):
    490         constructor_signature = inspect.signature(self.__init__)
    491495        args, kwargs = self._constructor_args
    492         signature = constructor_signature.bind_partial(*args, **kwargs)
     496        signature = self._constructor_signature.bind_partial(self, *args, **kwargs)
    493497        signature.apply_defaults()
    494498        arguments = signature.arguments.items()
    495499        identity = [self.__class__]
    496500        for arg, value in arguments:
     501            if arg == 'self':
     502                continue
    497503            if isinstance(value, fields.Field):
    498504                if value.name and value.model:
    499505                    value = (value.model._meta.label, value.name)
Last edited 9 months ago by Simon Charette (previous) (diff)

comment:14 by Simon Charette, 9 months ago

Has patch: set

Submitted a PR that confirms the suite is happy with both changes.

comment:15 by Mariusz Felisiak, 9 months ago

Owner: set to Simon Charette
Status: newassigned
Triage Stage: AcceptedReady for checkin

comment:16 by Mariusz Felisiak <felisiak.mariusz@…>, 9 months ago

In d074c753:

Refs #35102 -- Optimized Expression.identity used for equality and hashing.

inspect.signature() is quite slow and produces the same object for each
instance of the same class as they share their init method which
makes it a prime candidate for caching.

Thanks Anthony Shaw for the report.

comment:17 by Mariusz Felisiak <felisiak.mariusz@…>, 9 months ago

In f3d10546:

Refs #35102 -- Optimized replace_expressions()/relabelling aliases by adding early return.

This avoids costly hashing.

Thanks Anthony Shaw for the report.

Co-Authored-By: Simon Charette <charette.s@…>

comment:18 by Mariusz Felisiak, 9 months ago

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.
Back to Top