Opened 3 months ago

Closed 2 weeks ago

Last modified 2 weeks ago

#36430 closed Cleanup/optimization (fixed)

bulk_batch_size() special-cases single field on SQLite according to outdated limit

Reported by: Jacob Walls
Owned by: Jacob Walls
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords: SQLITE_MAX_COMPOUND_SELECT, bulk_create
Cc: Sage Abdullah
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Jacob Walls)

On SQLite, bulk_batch_size() special-cases a field list of length 1 and applies the SQLITE_MAX_COMPOUND_SELECT limit to arrive at a value of 500.

I think this must date from before bulk inserts used the VALUES syntax, which became available in SQLite in 2012; see the discussion in Laravel.
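For context, here is a minimal sqlite3 sketch (the `country` table is hypothetical) contrasting the modern multi-row VALUES insert with the older compound-SELECT workaround, which is the construct SQLITE_MAX_COMPOUND_SELECT actually limits:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT)")

# Multi-row VALUES insert, supported since SQLite 3.7.11 (2012).
conn.execute("INSERT INTO country (name) VALUES (?), (?), (?)", ("A", "B", "C"))

# The pre-2012 workaround: a compound SELECT chained with UNION ALL.
# SQLITE_MAX_COMPOUND_SELECT (default 500) caps the number of these terms.
conn.execute(
    "INSERT INTO country (name) SELECT ? UNION ALL SELECT ? UNION ALL SELECT ?",
    ("D", "E", "F"),
)

print(conn.execute("SELECT COUNT(*) FROM country").fetchone()[0])  # 6
```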

When the list of fields exceeds 1, we go through the elif branch and arrive at much higher limits. I'm pretty sure this shows that the limit of 500 for fields=["pk"] is overly protective and can just be removed.
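A simplified sketch of the asymmetry being described (this is not the exact backend source; `MAX_QUERY_PARAMS` here assumes SQLite's post-3.32.0 default SQLITE_MAX_VARIABLE_NUMBER of 32766):

```python
SQLITE_MAX_COMPOUND_SELECT = 500  # the outdated special-case limit
MAX_QUERY_PARAMS = 32766  # default SQLITE_MAX_VARIABLE_NUMBER since SQLite 3.32.0

def bulk_batch_size(fields, objs):
    # Simplified sketch of the SQLite backend logic described above.
    if len(fields) == 1:
        return SQLITE_MAX_COMPOUND_SELECT  # single field: capped at 500
    elif len(fields) > 1:
        return MAX_QUERY_PARAMS // len(fields)  # elif branch: much higher
    return len(objs)

objs = range(1000)
print(bulk_batch_size(["pk"], objs))  # 500
print(bulk_batch_size(["name", "iso_two_letter", "description"], objs))  # 10922
```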

I don't have a unit test to provide, but you can play with changing the limit from 500 to 501 and see that test_large_delete still passes (no trouble selecting 501 objects).
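The same experiment can be run outside Django with plain sqlite3, assuming a reasonably recent SQLite where the compound-select limit no longer applies to VALUES lists (the `country` table is again hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT)")

# 501 rows in one VALUES insert: one more than the 500-row cap that
# bulk_batch_size() applies to a single-field list on SQLite.
names = [f"Country {i}" for i in range(501)]
placeholders = ", ".join("(?)" for _ in names)
conn.execute(f"INSERT INTO country (name) VALUES {placeholders}", names)

print(conn.execute("SELECT COUNT(*) FROM country").fetchone()[0])  # 501
```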

(I found this while trying to refactor a different call site away from max_query_params in the hopes of just calling bulk_batch_size() until I saw how overly protective it was.)

I think this would be a good idea to resolve before anyone invests effort in reading the dynamic limit for this potentially irrelevant parameter.

Change History (12)

comment:1 by Jacob Walls, 3 months ago

Description: modified (diff)

Here's a rough test after all to demo it; just adjust test_max_batch_size like this:

  • tests/bulk_create/tests.py

    diff --git a/tests/bulk_create/tests.py b/tests/bulk_create/tests.py
    index d590a292de..fd30ca48f7 100644
    --- a/tests/bulk_create/tests.py
    +++ b/tests/bulk_create/tests.py
    @@ class BulkCreateTests(TestCase):
         @skipUnlessDBFeature("has_bulk_insert")
         def test_max_batch_size(self):
             objs = [Country(name=f"Country {i}") for i in range(1000)]
    -        fields = ["name", "iso_two_letter", "description"]
    +        fields = ["pk"]
             max_batch_size = connection.ops.bulk_batch_size(fields, objs)
             with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
                 Country.objects.bulk_create(objs)

The test fails as it was *more* efficient than expected:

======================================================================
FAIL: test_max_batch_size (bulk_create.tests.BulkCreateTests.test_max_batch_size)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/.../django/tests/bulk_create/tests.py", line 301, in test_max_batch_size
    with self.assertNumQueries(ceil(len(objs) / max_batch_size)):
         ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 1 != 2 : 1 queries executed, 2 expected
Captured queries were:
1. INSERT INTO "bulk_create_country" ("name", "iso_two_letter", "description") VALUES ('Country 0', '', ''), ('Country 1', '', ''), ('Country 2', '', ''), ('Country 3', '', ...

----------------------------------------------------------------------
Ran 2 tests in 0.025s

FAILED (failures=1)

comment:2 by Jacob Walls, 3 months ago

Description: modified (diff)

comment:3 by Sarah Boyce, 3 months ago

Cc: Sage Abdullah added
Triage Stage: Unreviewed → Accepted

comment:4 by Jacob Walls, 3 months ago

Owner: set to Jacob Walls
Status: new → assigned

I can take a swing at this, given that we'll want to handle the cleanup mentioned above at the same time.

comment:5 by Sage Abdullah, 3 months ago

Thanks Jacob for reporting this, and thanks Sarah for cc'ing me on the ticket.

As mentioned, I did identify this in my previous PR. However, I wasn't sure why Django even uses SQLITE_MAX_COMPOUND_SELECT for a bulk query. I tried to look into where it might've come from but I didn't find any good leads.

I think this must date from before bulk inserts used VALUES syntax, which became available in SQLite in 2012, see discussion in Laravel.

Aaand, I think you're correct! I didn't know VALUES wasn't supported back then. My findings:

If it's not needed anymore, I'm in favour of removing this. I'm not 100% sure, but it seems bulk_batch_size is only used for bulk_create, bulk_update, and deletion, so I don't think any of them needs to consider the "max items in a SELECT statement" limit on SQLite.

comment:6 by Jacob Walls, 3 months ago

Has patch: set

comment:7 by Sarah Boyce, 4 weeks ago

Triage Stage: Accepted → Ready for checkin

comment:8 by Jacob Walls, 4 weeks ago

Patch needs improvement: set
Triage Stage: Ready for checkin → Accepted

comment:9 by Jacob Walls, 3 weeks ago

Patch needs improvement: unset

comment:10 by Sarah Boyce, 2 weeks ago

Triage Stage: Accepted → Ready for checkin

comment:11 by Sarah Boyce <42296566+sarahboyce@…>, 2 weeks ago

Resolution: fixed
Status: assigned → closed

In a2ce4900:

Fixed #36430 -- Removed artificially low limit on single field bulk operations on SQLite.

comment:12 by Sarah Boyce <42296566+sarahboyce@…>, 2 weeks ago

In d3cf24e9:

Refs #36430, #36416, #34378 -- Simplified batch size calculation in QuerySet.in_bulk().
