#26063 closed Bug (fixed)
Regression in Django 1.9: SQLite can no longer handle more than 2000 values in a "foo__in" filter
Description ¶
Commit 4f6a7663bcddffb114f2647f9928cbf1fdd8e4b5 introduced a regression in Django 1.9 because you can no longer have a queryset of the form Model.objects.filter(foo__in=array)
with more than 2000 items in array
.
The above changes generates a “SELECT QUOTE(?), ..., QUOTE(?)” query with as many values as items in the array. This will hit the SQLite limit SQLITE_MAX_COLUMN which defaults to 2000 and can only be increased up to 32767.
Before this change we were only limited by SQLITE_MAX_VARIABLE_NUMBER which is lower by default (999) but which can be increased to a much higher value. In Debian for example we use a value of 250000.
While the latter limit was unreasonably low (it just costs a bit of memory and you can expect to have many parameters), the former limit make much more sense IMO and I don't expect distributions to override the upstream value.
I'm not sure what's the best way forward. Possibly process parameters by batch of 2000.
(This bug has been initially reported in https://bugs.debian.org/809211 )
Change History (33)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 9 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Yes, we should batch the escaping function.
follow-up: 7 comment:4 by , 9 years ago
I'm stuck writing tests for this.
The problem only occurs when SQLITE_MAX_VARIABLE_NUMBER has been changed to be greated than SQLITE_MAX_COLUMN, which isn't the default, and there's no way to introspect these values -- the Python bindings don't expose the get_limit
API of sqlite.
comment:5 by , 9 years ago
I ended up exercising directly a private API, which isn't great, but should suffice to prevent regressions.
comment:6 by , 9 years ago
Has patch: | set |
---|
comment:7 by , 9 years ago
Replying to aaugustin:
I'm stuck writing tests for this.
The problem only occurs when SQLITE_MAX_VARIABLE_NUMBER has been changed to be greated than SQLITE_MAX_COLUMN, which isn't the default, and there's no way to introspect these values -- the Python bindings don't expose the
get_limit
API of sqlite.
It is enough, I think, to try to generate a query with 2001 variables and see if it works. That may be less accurate for the reasons, but it checks the symptoms, and doesn't need private APIs. cursor.execute('select %s' + sum(2000*[' + %s']), 2001*[1])
or something like that.
comment:8 by , 9 years ago
Cc: | added |
---|
comment:9 by , 9 years ago
Aymeric Augustin wrote:
If the reporter could test it and confirm that it fixes the issue, that would be great.
I get an error futher:
File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 258, in __iter__ self._fetch_all() File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 1074, in _fetch_all self._result_cache = list(self.iterator()) File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 52, in __iter__ results = compiler.execute_sql() File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 852, in execute_sql cursor.execute(sql, params) File "/usr/lib/python2.7/dist-packages/django/db/backends/utils.py", line 83, in execute sql = self.db.ops.last_executed_query(self.cursor, sql, params) File "/usr/lib/python2.7/dist-packages/django/db/backends/sqlite3/operations.py", line 141, in last_executed_query return sql % params TypeError: not enough arguments for format string
Here params is a list of 25000 integers.
Thanks,
Christophe
follow-up: 13 comment:10 by , 9 years ago
Shai: the problem in that case is that there's no way to tell apart these two cases:
cursor.execute()
raisedOperationalError
becauseSQLITE_MAX_VARIABLE_NUMBER = 999
(the default value) and the query contains more than 999 parameterscursor.execute()
raisedOperationalError
while attempting to generate the representation of the last query because of the bug reported in this ticket
follow-up: 19 comment:11 by , 9 years ago
Well we could look at the exception message. The former gives "too many SQL variables"; I assume the latter gives something else.
Unfortunately I don't have access to a Debian system affected by this non-standard configuration right now, so I cannot reproduce the issue easily.
follow-up: 21 comment:12 by , 9 years ago
Patch needs improvement: | set |
---|
Marking the patch as needing improvement based on tobald's comment.
Unfortunately I cannot determine what happens based on code inspection.
comment:13 by , 9 years ago
Replying to aaugustin:
Shai: the problem in that case is that there's no way to tell apart these two cases:
cursor.execute()
raisedOperationalError
becauseSQLITE_MAX_VARIABLE_NUMBER = 999
(the default value) and the query contains more than 999 parameterscursor.execute()
raisedOperationalError
while attempting to generate the representation of the last query because of the bug reported in this ticket
...but the statement I gave only has one column. It will never fail over the bug, only on number of variables.
comment:14 by , 9 years ago
Let's assume a statement with C columns and P parameters. With the current implementation, on SQLite, when DEBUG = True
:
(1) The statement itself is executed: C columns, P parameters. This requires C ≤ SQLITE_MAX_COLUMN
and P ≤ SQLITE_MAX_VARIABLE_NUMBER
.
(2) Another statement is executed to escape the parameters: P columns, P parameters. This requires P ≤ MIN(SQLITE_MAX_COLUMN, SQLITE_MAX_VARIABLE_NUMBER)
The default values of SQLite are SQLITE_MAX_COLUMN = 2000 > SQLITE_MAX_VARIABLE_NUMBER = 999
. With these values, MIN(SQLITE_MAX_COLUMN, SQLITE_MAX_VARIABLE_NUMBER) = SQLITE_MAX_VARIABLE_NUMBER
. As a consequence, any statement that will go through (1) without raising an exception will also go through (2) without raising an exception.
In your example, C = 1 and P = 2000. This will fail at step (1) with SQLite's default values. It will reproduce the original reporter's problem, but only on platforms where SQLITE_MAX_VARIABLE_NUMBER
has been increased to over 2000.
We need the test to pass or be skipped on all platforms, though.
I see two ways to achieve this:
- detecting the values of
SQLITE_MAX_COLUMN
andSQLITE_MAX_VARIABLE_NUMBER
and ajusting accordingly — which doesn't seem possible in Python - writing a test just for (2), without going through (1) — which I did.
comment:15 by , 9 years ago
My statement, as far as I understand, does not reproduce the OP's problem, because their problem is caused by C > 2000 and my statement has C = 1. It can be used to verify if statements with P>2000 are accepted (it has P=2001); so,
- if the statement failed declare the test skipped or an expected failure.
- If it succeeded, go on to test with the OP's ORM query
You can skip or xfail a test from within its code by raising the appropriate exception, AFAIK.
follow-up: 18 comment:16 by , 9 years ago
The OP mentions Model.objects.filter(foo__in=array)
and says params is a list of 25000 integers.
I understand that as C = <number of fields in Model> and P = 25000.
comment:17 by , 9 years ago
You are both correct AFAIK.
But if you do what shaib suggest, then the unit test is only useful on systems where SQLITE_MAX_VARIABLE_NUMBER has been increased. I would suggest that maybe two tests are warranted, one to verify that the the quoting function does it by batches (whatever configuration is in place) and one to test we can have many values in a real query run through the ORM if sqlite does accept many parameters.
comment:18 by , 9 years ago
Replying to aaugustin:
The OP mentions
Model.objects.filter(foo__in=array)
and saysparams is a list of 25000 integers
.
I understand that as C = <number of fields in Model> and P = 25000.
Right. But the quoting query, with the current code, turns that into C=P=25000.
comment:19 by , 9 years ago
Replying to aaugustin:
Well we could look at the exception message. The former gives "too many SQL variables"; I assume the latter gives something else.
Unfortunately I don't have access to a Debian system affected by this non-standard configuration right now, so I cannot reproduce the issue easily.
The Debian bug report linked has the stacktrace, it ends with this:
File "/usr/lib/python2.7/dist-packages/django/db/backends/sqlite3/operations.py", line 128, in last_executed_query params = self._quote_params_for_last_executed_query(params) File "/usr/lib/python2.7/dist-packages/django/db/backends/sqlite3/operations.py", line 117, in _quote_params_for_last_executed_query return cursor.execute(sql, params).fetchone() OperationalError: too many columns in result set
follow-up: 27 comment:20 by , 9 years ago
Shai: exactly — that's why we can't tell an OperationalError
when running the query because P > SQLITE_MAX_VARIABLE_NUMBER from an OperationalError
while quoting the params because P > SQLITE_MAX_COLUMN, which in turn is why I ended up testing the private API that does just the quoting.
comment:21 by , 9 years ago
Replying to aaugustin:
Marking the patch as needing improvement based on tobald's comment.
Unfortunately I cannot determine what happens based on code inspection.
I can reproduce the problem too. I don't know what happens but when I try to run a query like described here (with 2000 items in the array), and when I add some print statement, I see in last_executed_query that the "sql" variable only contains 1024 "%s" whereas the "params" variable correctly contains the 2000 quoted parameters.
comment:22 by , 9 years ago
Thanks Raphael. There's a very stupid mistake in my code :-( I'll fix it tonight.
comment:23 by , 9 years ago
Found it:
diff --git a/django/db/backends/sqlite3/operations.py b/django/db/backends/sqlite3/operations.py index 0d6f084..f56f72e 100644 --- a/django/db/backends/sqlite3/operations.py +++ b/django/db/backends/sqlite3/operations.py @@ -114,7 +114,7 @@ class DatabaseOperations(BaseDatabaseOperations): # limits are in effect and split the work in batches if needed. BATCH_SIZE = 999 if len(params) > BATCH_SIZE: - results = [] + results = tuple() for index in range(0, len(params), BATCH_SIZE): chunk = params[index:index + BATCH_SIZE] results += self._quote_params_for_last_executed_query(chunk)
But I still don't understand why I see only 1024 "%s"... and why it doesn't fail even with the above fix.
comment:24 by , 9 years ago
Forget the point about having only 1024 %s. It looks like my command line to count them did not work as I expected...
tr -dc %|wc -c
and pasting, press enter, type CTRL D gives back 1024. But the same with tr -dc % <<END | wc -c
gives the correct value... possibly some line buffering issue from the shell. Bah.
comment:25 by , 9 years ago
Is there a limit on the length of a query's SQL text in SQLite or its Python driver?
comment:27 by , 9 years ago
Replying to aaugustin:
Shai: exactly — that's why we can't tell an
OperationalError
when running the query because P > SQLITE_MAX_VARIABLE_NUMBER from anOperationalError
while quoting the params because P > SQLITE_MAX_COLUMN, which in turn is why I ended up testing the private API that does just the quoting.
We can if we run the "big P, small C" query on a raw cursor rather than a Django cursor.
comment:28 by , 9 years ago
Hi, it looks like you haven't updated your pull request yet. What's the status?
comment:29 by , 9 years ago
The status is "still intending to work on it when I find time"... I'd like to fix it in 1.9.2, which should be released in February.
(Also my theory of the "very stupid mistake" turned out to be wrong, so I still don't know why my patch doesn't work.)
If you're interested in improving the patch, just assign this ticket to yourself so we don't duplicate effort. Thanks!
comment:30 by , 9 years ago
Patch needs improvement: | unset |
---|
I amended my pull request but I don't really have time to install a Debian to test it.
If you don't mind testing that version, that would be really helpful.
comment:31 by , 9 years ago
I tested the updated patch on Debian and it works fine for me now. Thanks for your work!
I forgot to put the link documenting the above limits: https://sqlite.org/limits.html