Opened 3 years ago

Last modified 2 years ago

#21160 new Bug

in_bulk() fails on SQLite when passing more than 999 ids

Reported by: Andrei Picus Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: in_bulk sqlite 999 1000
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Daniele Procida)

As we all know, SQLite doesn't allow more than 999 variables in a query, failing with the 'too many SQL variables' error if you pass in more. This is a problem when using the in_bulk() method.

I've looked through the code and found out that all backends define a bulk_batch_size ops feature and it is set to 999 in the SQLite backend.
https://github.com/django/django/blob/master/django/db/backends/sqlite3/base.py#L140

This ops feature is only used in _batched_insert. I've written a patch that makes use of it in the in_bulk method as well. You can check it here: https://github.com/django/django/pull/1679

I've also written two tests that test passing in a large number of ids and also the efficiency in this case. The efficiency test runs only on databases that don't support 1000 variables in a query.

Any comments are welcome.

Change History (6)

comment:1 Changed 3 years ago by Andrei Picus

Pull request: https://github.com/django/django/pull/1679

For the love of me I can't find the way to edit the original description.

comment:2 Changed 3 years ago by Tim Graham

Severity: Release blockerNormal

comment:3 Changed 3 years ago by Daniele Procida

Description: modified (diff)

comment:4 Changed 3 years ago by Aymeric Augustin

Triage Stage: UnreviewedAccepted

At first sight, the patch looks reasonable.

comment:5 Changed 3 years ago by Andrei Picus

Any other thoughts on this?

comment:6 Changed 2 years ago by Anssi Kääriäinen

Patch needs improvement: set

Updated patch can be found from https://github.com/django/django/pull/3046.

The usage of bulk_batch_size() is incorrect in the patch. The method is documented to receive a list of objects, but for the use case in this ticket it receives an iterable of field values. We can either change the method definition so that it can receive either an iterable of field values, or a list of objects. Or we can just add a new feature "max_query_params" and use that instead.

I slightly prefer the approach of adding max_query_params in to database features. Then we can make supports_1000_query_params to be a property which is calculated based on max_query_params, and also make the base bulk_batch_size() to use the max_query_params feature.

Note: See TracTickets for help on using tickets.
Back to Top