Opened 7 months ago

Closed 4 months ago

Last modified 3 weeks ago

#36143 closed Cleanup/optimization (fixed)

DatabaseOperations.bulk_batch_size() is overly protective on SQLite in most cases

Reported by: Sarah Boyce Owned by: Xavier Frankline Odhiambo
Component: Database layer (models, ORM) Version: 5.1
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Sarah Boyce)

DatabaseOperations.bulk_batch_size() for SQLite is dependent on DatabaseFeatures.max_query_params which is currently set as 999.

However, for SQLite versions >= 3.32, this limit is increased to 32,766. See: https://www.sqlite.org/limits.html#max_variable_number

SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.

This means that is some cases (assuming there isn't another database limit being hit), we can increase the maximum batch size for SQLite.

For illustration purposes, on SQLite >= 3.32, if the value of DatabaseFeatures.max_query_params is increased, the following test would pass:

  • tests/bulk_create/tests.py

    diff --git a/tests/bulk_create/tests.py b/tests/bulk_create/tests.py
    index 7b86a2def5..ce9123ce97 100644
    a b class BulkCreateTests(TestCase):  
    286286        objs = [Country(name=f"Country {i}") for i in range(1000)]
    287287        fields = ["name", "iso_two_letter", "description"]
    288288        max_batch_size = max(connection.ops.bulk_batch_size(fields, objs), 1)
    289         with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
     289        with self.assertNumQueries(1):
    290290            Country.objects.bulk_create(objs, batch_size=max_batch_size + 1)
    291291
    292292    @skipUnlessDBFeature("has_bulk_insert")

PR discussion references:

Ticket which sparked the discussion/discovery: #36118

Change History (15)

comment:1 by Sarah Boyce, 7 months ago

Description: modified (diff)

comment:2 by Jacob Walls, 7 months ago

Triage Stage: UnreviewedAccepted

Sage showed a query to get the most accurate limit based on compilation flags. Some discussion needed about whether and how to do that (e.g. @cached_property? What about people asserting a query count in tests?)

comment:3 by Simon Charette, 7 months ago

I think we should use class level cached properties for that to avoid per-connection overhead (as Python can only be built against one version of SQLite at a time) and that we should expect query counts tests that don't take into account bulk_batch_size return value to require to be adjusted.

comment:4 by Sage Abdullah, 7 months ago

Owner: set to Sage Abdullah
Status: newassigned

Note that while there is a compile-time option, the limit can be made lower for each connection according to the SQLite docs:

The maximum host parameter number can be lowered at run-time using the sqlite3_limit(db,SQLITE_LIMIT_VARIABLE_NUMBER,size) interface.

I just found that Python >= 3.11 introduced the functions getlimit() and setlimit() that can be used to get/set these per-connection limits at run-time. This means we'd probably want it to vary on each connection, rather than cached on the class. That said, if it's cached anyway, developers who want to set a different limit on the connection will have to clear the cache in order for Django to pick it up.

Django 6.0 will require Python 3.12+, so I think we're good to use these functions. I haven't checked whether this is specific to CPython, though. There's also an example usage in the CPython tests.

I'm assigning this to myself so I can hand it over for Djangonaut Space in the coming weeks, I hope it's okay. That said, please do leave a comment if you have any opinions on how we should approach this ticket. Thanks!

comment:5 by Simon Charette, 7 months ago

This means we'd probably want it to vary on each connection, rather than cached on the class

That makes sense. Targeting Django 6.0 also seems like a good choice given it simplifies Python dependency handling and will also allow us to target SQLite 3.32+ as well.

Happy to see you and Djangonaut Space tackle this one Sage. Certainly a lot of good learning along the way and a nice way to validate your SQLite version Docker changes.

comment:6 by Xavier Frankline Odhiambo, 6 months ago

Owner: changed from Sage Abdullah to Xavier Frankline Odhiambo

comment:7 by Sage Abdullah, 4 months ago

PR

Xavier and I worked on this during the Djangonaut Space program, and I added tests during the DjangoCon Europe 2025 sprints.

I also made a repo to experiment with the impact and I can confirm that this reduces the number of queries in bulk operations quite significantly. Interestingly, using the CaptureQueriesContext (either directly or via self.assertNumQueries) made the test crawl after increasing the query parameter limit, so maybe there's room for optimization in there. Without capturing the queries, the operation is fast even with a large number of parameters.

comment:8 by Sage Abdullah, 4 months ago

Has patch: set

comment:9 by Adam Johnson, 4 months ago

Patch needs improvement: set

comment:10 by Sage Abdullah, 4 months ago

Patch needs improvement: unset

comment:11 by Adam Johnson, 4 months ago

Triage Stage: AcceptedReady for checkin

comment:12 by Sarah Boyce <42296566+sarahboyce@…>, 4 months ago

In 38660a61:

Refs #36143 -- Tested bulk_batch_size limit for bulk_update and bulk_create.

comment:13 by Sarah Boyce <42296566+sarahboyce@…>, 4 months ago

Resolution: fixed
Status: assignedclosed

In 358fd21:

Fixed #36143 -- Made max_query_params respect SQLITE_LIMIT_VARIABLE_NUMBER.

Co-authored-by: Xavier Frankline <xf.xavierfrank@…>

comment:14 by nessita <124304+nessita@…>, 3 months ago

In a68e8565:

Refs #34378, #36143, #36416 -- Fixed isolation of LookupTests.test_in_bulk_preserve_ordering_with_batch_size().

max_query_params is a property, so it must be patched on the class.

comment:15 by Natalia <124304+nessita@…>, 3 weeks ago

In a4e27c0:

[5.2.x] Refs #34378, #36143, #36416 -- Fixed isolation of LookupTests.test_in_bulk_preserve_ordering_with_batch_size().

max_query_params is a property, so it must be patched on the class.

Backport of a68e8565cdd4fc3f8b738fc516095dab142b9d65 from main.

Note: See TracTickets for help on using tickets.
Back to Top