Opened 2 weeks ago
Last modified 8 days ago
#36143 assigned Cleanup/optimization
DatabaseOperations.bulk_batch_size() is overly protective on SQLite in most cases
| Reported by: | Sarah Boyce | Owned by: | Sage Abdullah |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 5.1 |
| Severity: | Normal | Keywords: | |
| Cc: | Simon Charette | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified)
`DatabaseOperations.bulk_batch_size()` for SQLite depends on `DatabaseFeatures.max_query_params`, which is currently set to 999.
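For context, the SQLite backend derives the batch size from that flag roughly as follows (a simplified sketch of the method in `django/db/backends/sqlite3/operations.py`, not the exact source):

```python
# Simplified sketch: each inserted row consumes one query parameter per
# field, so the batch size is bounded by max_query_params // len(fields).
def bulk_batch_size(self, fields, objs):
    if len(fields) > 1:
        return self.connection.features.max_query_params // len(fields)
    return len(objs)
```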
However, for SQLite versions >= 3.32.0, this limit is increased to 32,766. See https://www.sqlite.org/limits.html#max_variable_number:

> SQLITE_MAX_VARIABLE_NUMBER, which defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.
This means that in some cases (assuming there isn't another database limit being hit), we can increase the maximum batch size for SQLite.
For illustration purposes, on SQLite >= 3.32, if the value of `DatabaseFeatures.max_query_params` is increased, the following test would pass:
```diff
diff --git a/tests/bulk_create/tests.py b/tests/bulk_create/tests.py
index 7b86a2def5..ce9123ce97 100644
--- a/tests/bulk_create/tests.py
+++ b/tests/bulk_create/tests.py
@@ -286,7 +286,7 @@ class BulkCreateTests(TestCase):
         objs = [Country(name=f"Country {i}") for i in range(1000)]
         fields = ["name", "iso_two_letter", "description"]
         max_batch_size = max(connection.ops.bulk_batch_size(fields, objs), 1)
-        with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
+        with self.assertNumQueries(1):
             Country.objects.bulk_create(objs, batch_size=max_batch_size + 1)

     @skipUnlessDBFeature("has_bulk_insert")
```
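To make the arithmetic concrete (assuming no other database limit applies): with `max_query_params = 999` and three fields, `bulk_batch_size()` yields 999 // 3 = 333, so inserting 1,000 objects takes ceil(1000 / 333) = 4 queries; with 32,766 parameters it yields 10,922, and the same insert fits in a single query, hence `assertNumQueries(1)`.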
PR discussion references:
- https://github.com/django/django/pull/19088#discussion_r1925652660
- https://github.com/django/django/pull/19088#discussion_r1929940327
Ticket which sparked the discussion/discovery: #36118
Change History (5)
comment:1 by , 2 weeks ago
Description: modified (diff)
comment:2 by , 2 weeks ago
Triage Stage: Unreviewed → Accepted
comment:3 by , 2 weeks ago
I think we should use class-level cached properties for that to avoid per-connection overhead (as Python can only be built against one version of SQLite at a time), and we should expect query count tests that don't take the `bulk_batch_size()` return value into account to require adjustment.
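A rough sketch of that idea (illustrative names; a module-level memoized function standing in for a class-level cached property):

```python
import sqlite3
from functools import lru_cache


@lru_cache(maxsize=1)
def default_max_query_params():
    # Computed once per process: Python links against a single SQLite
    # build, and SQLITE_MAX_VARIABLE_NUMBER's default changed from 999
    # to 32766 in SQLite 3.32.0.
    return 32766 if sqlite3.sqlite_version_info >= (3, 32, 0) else 999
```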
comment:4 by , 8 days ago
Owner: set to Sage Abdullah
Status: new → assigned
Note that while there is a compile-time option, the limit can be made lower for each connection according to the SQLite docs:

> The maximum host parameter number can be lowered at run-time using the sqlite3_limit(db, SQLITE_LIMIT_VARIABLE_NUMBER, size) interface.
I just found that Python >= 3.11 introduced the functions getlimit() and setlimit() that can be used to get/set these per-connection limits at run-time. This means we'd probably want it to vary on each connection, rather than cached on the class. That said, if it's cached anyway, developers who want to set a different limit on the connection will have to clear the cache in order for Django to pick it up.
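A minimal sketch of those APIs (standard library only; the in-memory connection and the 999 value are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Read this connection's cap on bound parameters (Python >= 3.11).
current = con.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER)

# Lower the cap for this connection only; setlimit() returns the prior
# value. The cap cannot be raised above the compile-time maximum.
prior = con.setlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER, 999)
assert prior == current
```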
Django 6.0 will require Python 3.12+, so I think we're good to use these functions. I haven't checked whether this is specific to CPython, though. There's also an example usage in the CPython tests.
I'm assigning this to myself so I can hand it over for Djangonaut Space in the coming weeks, I hope it's okay. That said, please do leave a comment if you have any opinions on how we should approach this ticket. Thanks!
comment:5 by , 8 days ago
> This means we'd probably want it to vary on each connection, rather than cached on the class
That makes sense. Targeting Django 6.0 also seems like a good choice, given it simplifies Python dependency handling and allows us to target SQLite 3.32+ as well.
Happy to see you and Djangonaut Space tackle this one, Sage. Certainly a lot of good learning along the way, and a nice way to validate your SQLite version Docker changes.
Sage showed a query to get the most accurate limit based on compilation flags. Some discussion is needed about whether and how to do that (e.g. via `@cached_property`? What about people asserting a query count in tests?).
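The query itself isn't reproduced in the ticket; one possible approach (an assumption, for illustration) is SQLite's PRAGMA compile_options:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# PRAGMA compile_options lists compile-time settings that differ from the
# defaults; a build with a raised cap reports e.g. "MAX_VARIABLE_NUMBER=250000".
limit = None
for (option,) in con.execute("PRAGMA compile_options"):
    if option.startswith("MAX_VARIABLE_NUMBER="):
        limit = int(option.partition("=")[2])

# When the pragma reports nothing, fall back to the version-based default.
if limit is None:
    limit = 32766 if sqlite3.sqlite_version_info >= (3, 32, 0) else 999
```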