Opened 3 weeks ago

Closed 3 days ago

#36815 closed Cleanup/optimization (fixed)

Avoid unnecessary prepare_value calls when inserting db_defaults

Reported by: Adam Sołtysik Owned by: YashRaj1506
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Nilesh Pahari Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In SQLInsertCompiler.as_sql, it looks like prepare_value is unnecessarily called for each DatabaseDefault value on every inserted object. These calls are quite expensive (as they enter the if hasattr(value, "resolve_expression") branch) and waste a lot of time when using bulk_create.

I've written a potential patch and a simple benchmark: https://github.com/adamsol/django/commit/d6f275c8960988c92a9da5eb323b6bd6fc7717cb. The benchmark runs 2x faster with the fix, even though there's only one field with db_default in that model. The difference may become larger with more fields.

Change History (16)

comment:1 by Rish, 3 weeks ago

Can you please share the benchmark results?

comment:2 by Adam Sołtysik, 3 weeks ago

python runtests.py bulk_create -k perf

Lowest times from several runs:

  • 0.117 with the fix
  • 0.225 after reverting the fix

comment:3 by Rish, 3 weeks ago

Moreover it seems your potential fix is breaking this testcase, I will like to look more into this.

Edit: IDK why had happened, I wasn't able to reproduce the same failure upon re-runs of the same changes. May possibly be just because of my machine being slow :(

FAIL: test_crafted_xml_performance (serializers.test_deserialization.TestDeserializer.test_crafted_xml_performance) [constant depth, varying length]
The time to process invalid inputs is not quadratic.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib64/python3.14/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/usr/lib64/python3.14/unittest/case.py", line 565, in subTest
    yield
  File "/home/rishu/Developer/OSS/django/tests/serializers/test_deserialization.py", line 182, in assertFactor
    self.assertLessEqual(sum(factors) / len(factors), factor)
    ^^^^^^^
  File "/usr/lib64/python3.14/unittest/case.py", line 1303, in assertLessEqual
    self.fail(self._formatMessage(msg, standardMsg))
    ^^^^^^^^^^^
  File "/usr/lib64/python3.14/unittest/case.py", line 750, in fail
    raise self.failureException(msg)
    ^^^^^^^^^^^^^^^
AssertionError: 2.2543277454075485 not less than or equal to 2

----------------------------------------------------------------------
Ran 18875 tests in 103.044s

FAILED (failures=1, skipped=1458, expected failures=5)
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database

Otherwise it works fine:

...s..................s....................................................................................
----------------------------------------------------------------------
Ran 18875 tests in 106.334s

OK (skipped=1458, expected failures=5)
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Last edited 3 weeks ago by Rish (previous) (diff)

comment:4 by Adam Sołtysik, 3 weeks ago

That test can apparently sometimes fail randomly, see the comment:

# Assert based on the average factor to reduce test flakiness.
self.assertLessEqual(sum(factors) / len(factors), factor)

comment:5 by Rish, 3 weeks ago

Thanks. Also, can you please help me understand why this fix would work here specifically? I thought about patching this in the prepare_value method, somewhat like:

django/db/models/sql/compiler.py:1722

    def prepare_value(self, field, value):
        from django.db.models.expressions import DatabaseDefault
        """
        Prepare a value to be used in a query by resolving it if it is an
        expression and otherwise calling the field's get_db_prep_save().
        """


        if hasattr(value, "resolve_expression"):
            
            if not isinstance(value, DatabaseDefault):
                return value
            
            value = value.resolve_expression(
                self.query, allow_joins=False, for_save=True
            )

And an absurd number of testcases failed due to that. Maybe this might be a problem with my approach of generalization or this optimization is really specific to the case here. Either way, adding that context would he helpful.

Last edited 3 weeks ago by Rish (previous) (diff)

comment:6 by Adam Sołtysik, 3 weeks ago

First of all, the condition has the wrong logic in this variant, it should be without negation. But apart from that, the field_prepare function (and so prepare_value) is called on DatabaseDefault values also later in the code, for databases that don't satisfy supports_default_keyword_in_bulk_insert (Oracle), so this approach probably wouldn't work there.

comment:7 by Nilesh Pahari, 3 weeks ago

Cc: Nilesh Pahari added

comment:8 by Jacob Walls, 3 weeks ago

Triage Stage: UnreviewedAccepted
Version: 6.0dev

Thanks for the benchmark, this seems reasonable. Would you like to submit a PR? We can discuss there if we should at least call field_prepare() once instead of zero times per db-default-field, but certainly not N times.

(Btw, regarding the flaky test, it's a known issue and is being removed in an unrelated PR.)

comment:9 by YashRaj1506, 3 weeks ago

Owner: set to YashRaj1506
Status: newassigned

in reply to:  2 comment:10 by YashRaj1506, 9 days ago

Okay so i went through this, and i actually didnt notice a pr was attatched but my solution is also exactly what Adam came up with. " Ignoring the field_prepare(field_pre_save(obj)) from happening if it is a DatabaseDefaultobject and just passing the value we get from field_pre_save(obj) so that we dont break any operation which is happening after this point.

I borrowed the test from adams pr and everything passed nicely, also improving the benchmark.

without new code

(djangodev) yash@pop-os:~/Desktop/36815/django/tests$ python runtests.py bulk_create 
Testing against Django installed in '/home/yash/Desktop/36815/django/django' with up to 16 processes
Found 60 test(s).
Creating test database for alias 'default'...
Cloning test database for alias 'default'...
Cloning test database for alias 'default'...
System check identified no issues (0 silenced).
...0.217
.............s.........................s...s...s......sss
----------------------------------------------------------------------
Ran 60 tests in 0.368s

OK (skipped=7)
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...

with imrpovements:

(djangodev) yash@pop-os:~/Desktop/36815/django/tests$ python runtests.py bulk_create 
Testing against Django installed in '/home/yash/Desktop/36815/django/django' with up to 16 processes
Found 60 test(s).
Creating test database for alias 'default'...
Cloning test database for alias 'default'...
Cloning test database for alias 'default'...
System check identified no issues (0 silenced).
...0.117
.............s.........................s...s...s......sss
----------------------------------------------------------------------
Ran 60 tests in 0.275s

OK (skipped=7)
Destroying test database for alias 'default'...
Destroying test database for alias 'default'...
Destroying test database for alias 'default'..
Version 0, edited 9 days ago by YashRaj1506 (next)

in reply to:  8 comment:11 by YashRaj1506, 9 days ago

Replying to Jacob Walls:

Thanks for the benchmark, this seems reasonable. Would you like to submit a PR? We can discuss there if we should at least call field_prepare() once instead of zero times per db-default-field, but certainly not N times.

(Btw, regarding the flaky test, it's a known issue and is being removed in an unrelated PR.)

Also i want to ask, why are we tyring to run field prepare here? i didnt understand that part. Shoudn't we ignore it if its a DatabaseDefault object?

Last edited 9 days ago by YashRaj1506 (previous) (diff)

comment:12 by Adam Sołtysik, 9 days ago

As far as I'm concerned, feel free to submit a PR. I didn't do it myself mainly because with things like CLA it's a little bit too much bureaucracy for me. Also, my benchmark was obviously not designed to end up in the actual test suite. I guess that a proper regression test should be written, although I'm not sure what's the policy for testing against executing unnecessary code.

comment:13 by YashRaj1506, 8 days ago

Yeah sure its more of a benchmark, i will write the proper regression test here and make the pr.

comment:14 by YashRaj1506, 8 days ago

Has patch: set

comment:15 by Jacob Walls, 3 days ago

Triage Stage: AcceptedReady for checkin

comment:16 by Jacob Walls <jacobtylerwalls@…>, 3 days ago

Resolution: fixed
Status: assignedclosed

In 9247410b:

Fixed #36815 -- Optimized insertion of db_default fields in bulk_create().

Thanks Adam Sołtysik for the implementation idea.

Note: See TracTickets for help on using tickets.
Back to Top