Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#26063 closed Bug (fixed)

Regression in Django 1.9: SQLite can no longer handle more than 2000 values in a "foo__in" filter

Reported by: Raphaël Hertzog
Owned by: Aymeric Augustin
Component: Database layer (models, ORM)
Version: 1.9
Severity: Release blocker
Keywords:
Cc: d@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Commit 4f6a7663bcddffb114f2647f9928cbf1fdd8e4b5 introduced a regression in Django 1.9 because you can no longer have a queryset of the form Model.objects.filter(foo__in=array) with more than 2000 items in array.

The above change generates a “SELECT QUOTE(?), ..., QUOTE(?)” query with as many values as there are items in the array. This hits the SQLite limit SQLITE_MAX_COLUMN, which defaults to 2000 and can only be increased up to 32767.

Before this change we were only limited by SQLITE_MAX_VARIABLE_NUMBER which is lower by default (999) but which can be increased to a much higher value. In Debian for example we use a value of 250000.

While the latter limit was unreasonably low (raising it just costs a bit of memory, and queries with many parameters are to be expected), the former limit makes much more sense IMO, and I don't expect distributions to override the upstream value.

I'm not sure what the best way forward is. Possibly process the parameters in batches of 2000.
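
(For illustration only, not Django's eventual patch: a minimal sketch of the batching idea on a raw sqlite3 cursor, using a chunk size that fits under both default limits. The helper name is made up.)

    import sqlite3

    BATCH_SIZE = 999  # fits under both default limits (999 variables, 2000 columns)

    def quote_params_in_batches(cursor, params):
        # Quote params with SQLite's QUOTE(), one chunk at a time, so no single
        # statement needs more than BATCH_SIZE columns or bound variables.
        results = ()
        for start in range(0, len(params), BATCH_SIZE):
            chunk = params[start:start + BATCH_SIZE]
            sql = 'SELECT ' + ', '.join('QUOTE(?)' for _ in chunk)
            results += cursor.execute(sql, chunk).fetchone()
        return results

    cursor = sqlite3.connect(':memory:').cursor()
    print(len(quote_params_in_batches(cursor, list(range(25000)))))  # 25000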

(This bug was initially reported at https://bugs.debian.org/809211)

Change History (33)

comment:1 by Raphaël Hertzog, 8 years ago

I forgot to put the link documenting the above limits: https://sqlite.org/limits.html

comment:2 by Tim Graham, 8 years ago

Severity: Normal → Release blocker
Triage Stage: Unreviewed → Accepted

comment:3 by Aymeric Augustin, 8 years ago

Owner: changed from nobody to Aymeric Augustin
Status: new → assigned

Yes, we should batch the escaping function.

comment:4 by Aymeric Augustin, 8 years ago

I'm stuck writing tests for this.

The problem only occurs when SQLITE_MAX_VARIABLE_NUMBER has been changed to be greater than SQLITE_MAX_COLUMN, which isn't the default, and there's no way to introspect these values -- the Python bindings don't expose SQLite's get_limit API.

comment:5 by Aymeric Augustin, 8 years ago

I ended up exercising a private API directly, which isn't great, but should suffice to prevent regressions.

comment:6 by Aymeric Augustin, 8 years ago

Has patch: set

in reply to:  4 comment:7 by Shai Berger, 8 years ago

Replying to aaugustin:

I'm stuck writing tests for this.

The problem only occurs when SQLITE_MAX_VARIABLE_NUMBER has been changed to be greater than SQLITE_MAX_COLUMN, which isn't the default, and there's no way to introspect these values -- the Python bindings don't expose SQLite's get_limit API.

It is enough, I think, to try to generate a query with 2001 variables and see if it works. That may be less accurate about the cause, but it checks the symptom, and doesn't need private APIs. cursor.execute('select %s' + ' + %s' * 2000, 2001 * [1]) or something like that.
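
(A self-contained version of that probe, sketched with the raw sqlite3 module and its "?" placeholders instead of a Django cursor; illustrative only.)

    import sqlite3

    cursor = sqlite3.connect(':memory:').cursor()
    sql = 'SELECT ?' + ' + ?' * 2000        # one column, 2001 bound variables
    try:
        cursor.execute(sql, 2001 * [1])
    except sqlite3.OperationalError as exc:
        print(exc)                          # "too many SQL variables" when the limit is 999
    else:
        print(cursor.fetchone())            # (2001,)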

comment:8 by tobald, 8 years ago

Cc: d@… added

comment:9 by tobald, 8 years ago

Aymeric Augustin wrote:

If the reporter could test it and confirm that it fixes the issue, that would be great.

I get an error further on:

      File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 258, in __iter__
        self._fetch_all()
      File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 1074, in _fetch_all
        self._result_cache = list(self.iterator())
      File "/usr/lib/python2.7/dist-packages/django/db/models/query.py", line 52, in __iter__
        results = compiler.execute_sql()
      File "/usr/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 852, in execute_sql
        cursor.execute(sql, params)
      File "/usr/lib/python2.7/dist-packages/django/db/backends/utils.py", line 83, in execute
        sql = self.db.ops.last_executed_query(self.cursor, sql, params)
      File "/usr/lib/python2.7/dist-packages/django/db/backends/sqlite3/operations.py", line 141, in last_executed_query
        return sql % params
    TypeError: not enough arguments for format string

Here params is a list of 25000 integers.

Thanks,
Christophe

comment:10 by Aymeric Augustin, 8 years ago

Shai: the problem in that case is that there's no way to tell apart these two cases:

  • cursor.execute() raised OperationalError because SQLITE_MAX_VARIABLE_NUMBER = 999 (the default value) and the query contains more than 999 parameters
  • cursor.execute() raised OperationalError while attempting to generate the representation of the last query because of the bug reported in this ticket

comment:11 by Aymeric Augustin, 8 years ago

Well we could look at the exception message. The former gives "too many SQL variables"; I assume the latter gives something else.

Unfortunately I don't have access to a Debian system affected by this non-standard configuration right now, so I cannot reproduce the issue easily.
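
(A sketch of what such a message-based check could look like; the two strings come from this thread and from the Debian traceback quoted below, and they are not a stable API.)

    def classify_operational_error(exc):
        # Illustrative only: match on the error text to tell the two failures apart.
        msg = str(exc)
        if 'too many SQL variables' in msg:
            return 'the query itself exceeded SQLITE_MAX_VARIABLE_NUMBER'
        if 'too many columns' in msg:
            return 'quoting params for last_executed_query exceeded SQLITE_MAX_COLUMN'
        return 'some other OperationalError'

    print(classify_operational_error(Exception('too many SQL variables')))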

comment:12 by Aymeric Augustin, 8 years ago

Patch needs improvement: set

Marking the patch as needing improvement based on tobald's comment.

Unfortunately I cannot determine what happens based on code inspection.

in reply to:  10 comment:13 by Shai Berger, 8 years ago

Replying to aaugustin:

Shai: the problem in that case is that there's no way to tell apart these two cases:

  • cursor.execute() raised OperationalError because SQLITE_MAX_VARIABLE_NUMBER = 999 (the default value) and the query contains more than 999 parameters
  • cursor.execute() raised OperationalError while attempting to generate the representation of the last query because of the bug reported in this ticket

...but the statement I gave has only one column. It will never fail because of this bug, only because of the number of variables.

comment:14 by Aymeric Augustin, 8 years ago

Let's assume a statement with C columns and P parameters. With the current implementation, on SQLite, when DEBUG = True:

(1) The statement itself is executed: C columns, P parameters. This requires C ≤ SQLITE_MAX_COLUMN and P ≤ SQLITE_MAX_VARIABLE_NUMBER.

(2) Another statement is executed to escape the parameters: P columns, P parameters. This requires P ≤ MIN(SQLITE_MAX_COLUMN, SQLITE_MAX_VARIABLE_NUMBER).

The default values of SQLite are SQLITE_MAX_COLUMN = 2000 > SQLITE_MAX_VARIABLE_NUMBER = 999. With these values, MIN(SQLITE_MAX_COLUMN, SQLITE_MAX_VARIABLE_NUMBER) = SQLITE_MAX_VARIABLE_NUMBER. As a consequence, any statement that will go through (1) without raising an exception will also go through (2) without raising an exception.

In your example, C = 1 and P = 2000. This will fail at step (1) with SQLite's default values. It will reproduce the original reporter's problem, but only on platforms where SQLITE_MAX_VARIABLE_NUMBER has been increased to over 2000.

We need the test to pass or be skipped on all platforms, though.

I see two ways to achieve this:

  • detecting the values of SQLITE_MAX_COLUMN and SQLITE_MAX_VARIABLE_NUMBER and adjusting accordingly — which doesn't seem possible in Python
  • writing a test just for (2), without going through (1) — which I did; a sketch of this approach follows.
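
(A sketch of that second approach, assuming a configured Django SQLite database; _quote_params_for_last_executed_query is a private API and may change.)

    from django.db import connection
    from django.test import TestCase


    class LastExecutedQueryTests(TestCase):
        def test_quoting_many_params(self):
            # Exercise only step (2): quoting, without executing a real query.
            params = list(range(2001))  # more than SQLITE_MAX_COLUMN's default of 2000
            quoted = connection.ops._quote_params_for_last_executed_query(params)
            self.assertEqual(len(quoted), len(params))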

comment:15 by Shai Berger, 8 years ago

My statement, as far as I understand, does not reproduce the OP's problem, because their problem is caused by C > 2000 and my statement has C = 1. It can be used to verify whether statements with P > 2000 are accepted (it has P = 2001); so,

  • if the statement fails, declare the test skipped or an expected failure;
  • if it succeeds, go on to test with the OP's ORM query.

You can skip or xfail a test from within its code by raising the appropriate exception, AFAIK.
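
(An illustrative sketch combining comments 7 and 15: run the probe, then skip from inside the test body if it shows the default limit of 999. Not the ticket's actual test.)

    import sqlite3
    import unittest


    class ManyParamsTests(unittest.TestCase):
        def test_more_than_2000_parameters(self):
            cursor = sqlite3.connect(':memory:').cursor()
            try:
                cursor.execute('SELECT ?' + ' + ?' * 2000, 2001 * [1])
            except sqlite3.OperationalError:
                raise unittest.SkipTest('SQLITE_MAX_VARIABLE_NUMBER too low on this build')
            # ...otherwise go on to run the OP's ORM query (Model.objects.filter(foo__in=...))...


    if __name__ == '__main__':
        unittest.main()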

comment:16 by Aymeric Augustin, 8 years ago

The OP mentions Model.objects.filter(foo__in=array) and says params is a list of 25000 integers.

I understand that as C = <number of fields in Model> and P = 25000.

Last edited 8 years ago by Aymeric Augustin

comment:17 by Raphaël Hertzog, 8 years ago

You are both correct AFAIK.

But if you do what shaib suggests, then the unit test is only useful on systems where SQLITE_MAX_VARIABLE_NUMBER has been increased. I would suggest that maybe two tests are warranted: one to verify that the quoting function works in batches (whatever configuration is in place), and one to check that a real query run through the ORM can take many values when SQLite accepts many parameters.

in reply to:  16 comment:18 by Shai Berger, 8 years ago

Replying to aaugustin:

The OP mentions Model.objects.filter(foo__in=array) and says params is a list of 25000 integers.

I understand that as C = <number of fields in Model> and P = 25000.

Right. But the quoting query, with the current code, turns that into C=P=25000.

in reply to:  11 comment:19 by Raphaël Hertzog, 8 years ago

Replying to aaugustin:

Well we could look at the exception message. The former gives "too many SQL variables"; I assume the latter gives something else.

Unfortunately I don't have access to a Debian system affected by this non-standard configuration right now, so I cannot reproduce the issue easily.

The Debian bug report linked has the stacktrace, it ends with this:

  File "/usr/lib/python2.7/dist-packages/django/db/backends/sqlite3/operations.py", line 128, in last_executed_query
    params = self._quote_params_for_last_executed_query(params)
  File "/usr/lib/python2.7/dist-packages/django/db/backends/sqlite3/operations.py", line 117, in _quote_params_for_last_executed_query
    return cursor.execute(sql, params).fetchone()
OperationalError: too many columns in result set

comment:20 by Aymeric Augustin, 8 years ago

Shai: exactly — that's why we can't tell an OperationalError when running the query because P > SQLITE_MAX_VARIABLE_NUMBER from an OperationalError while quoting the params because P > SQLITE_MAX_COLUMN, which in turn is why I ended up testing the private API that does just the quoting.

in reply to:  12 comment:21 by Raphaël Hertzog, 8 years ago

Replying to aaugustin:

Marking the patch as needing improvement based on tobald's comment.

Unfortunately I cannot determine what happens based on code inspection.

I can reproduce the problem too. I don't know what happens, but when I run a query like the one described here (with 2000 items in the array) and add some print statements, I see in last_executed_query that the "sql" variable only contains 1024 "%s" placeholders whereas the "params" variable correctly contains the 2000 quoted parameters.

comment:22 by Aymeric Augustin, 8 years ago

Thanks Raphael. There's a very stupid mistake in my code :-( I'll fix it tonight.

comment:23 by Raphaël Hertzog, 8 years ago

Found it:

diff --git a/django/db/backends/sqlite3/operations.py b/django/db/backends/sqlite3/operations.py
index 0d6f084..f56f72e 100644
--- a/django/db/backends/sqlite3/operations.py
+++ b/django/db/backends/sqlite3/operations.py
@@ -114,7 +114,7 @@ class DatabaseOperations(BaseDatabaseOperations):
         # limits are in effect and split the work in batches if needed.
         BATCH_SIZE = 999
         if len(params) > BATCH_SIZE:
-            results = []
+            results = tuple()
             for index in range(0, len(params), BATCH_SIZE):
                 chunk = params[index:index + BATCH_SIZE]
                 results += self._quote_params_for_last_executed_query(chunk)

But I still don't understand why I see only 1024 "%s"... and why it doesn't fail even with the above fix.
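
(For context, a quick plain-Python illustration of why the list/tuple distinction can matter for the sql % params line seen in the earlier traceback.)

    sql = 'SELECT %s, %s'
    print(sql % (1, 2))    # SELECT 1, 2 -- the % operator auto-unpacks tuples
    try:
        sql % [1, 2]       # a list is treated as a single value for the first %s
    except TypeError as exc:
        print(exc)         # not enough arguments for format string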

comment:24 by Raphaël Hertzog, 8 years ago

Forget the point about having only 1024 %s. It looks like my command line to count them did not work as I expected...

Running tr -dc % | wc -c, pasting the query, pressing Enter and then typing Ctrl-D gives back 1024. But the same with tr -dc % <<END | wc -c gives the correct value... possibly some line buffering issue in the shell. Bah.

comment:25 by Shai Berger, 8 years ago

Is there a limit on the length of a query's SQL text in SQLite or its Python driver?

comment:26 by Raphaël Hertzog, 8 years ago

Yes, SQLITE_MAX_SQL_LENGTH which defaults to 1000000.

See https://sqlite.org/limits.html

in reply to:  20 comment:27 by Shai Berger, 8 years ago

Replying to aaugustin:

Shai: exactly — that's why we can't tell an OperationalError when running the query because P > SQLITE_MAX_VARIABLE_NUMBER from an OperationalError while quoting the params because P > SQLITE_MAX_COLUMN, which in turn is why I ended up testing the private API that does just the quoting.

We can if we run the "big P, small C" query on a raw cursor rather than a Django cursor.
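
(A sketch of that idea, reaching the underlying DB-API connection through connection.connection and assuming a configured Django SQLite database; illustrative, not the ticket's final test.)

    import sqlite3

    from django.db import connection


    def many_variables_supported(n=2001):
        # Probe the variable limit on a raw cursor, bypassing the Django cursor
        # wrapper and therefore the debug-mode quoting step.
        connection.ensure_connection()
        raw_cursor = connection.connection.cursor()   # raw sqlite3 cursor, "?" placeholders
        try:
            raw_cursor.execute('SELECT ?' + ' + ?' * (n - 1), n * [1])
        except sqlite3.OperationalError:              # e.g. "too many SQL variables"
            return False
        return True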

comment:28 by Raphaël Hertzog, 8 years ago

Hi, it looks like you haven't updated your pull request yet. What's the status?

comment:29 by Aymeric Augustin, 8 years ago

The status is "still intending to work on it when I find time"... I'd like to fix it in 1.9.2, which should be released in February.

(Also my theory of the "very stupid mistake" turned out to be wrong, so I still don't know why my patch doesn't work.)

If you're interested in improving the patch, just assign this ticket to yourself so we don't duplicate effort. Thanks!

comment:30 by Aymeric Augustin, 8 years ago

Patch needs improvement: unset

I amended my pull request, but I don't really have time to install a Debian system to test it.

If you don't mind testing that version, that would be really helpful.

comment:31 by Raphaël Hertzog, 8 years ago

I tested the updated patch on Debian and it works fine for me now. Thanks for your work!

comment:32 by Aymeric Augustin <aymeric.augustin@…>, 8 years ago

Resolution: fixed
Status: assigned → closed

In f91b5a7e:

Fixed #26063 -- Crash when passing > 2000 params.

If SQLITE_MAX_VARIABLE_NUMBER (default = 999) is changed at compile time
to be greater than SQLITE_MAX_COLUMN (default = 2000), which Debian does
by setting the former to 250000, Django raised an exception on queries
containing more than 2000 parameters when DEBUG = True.

comment:33 by Aymeric Augustin <aymeric.augustin@…>, 8 years ago

In cfe4ba6:

[1.9.x] Fixed #26063 -- Crash when passing > 2000 params.

If SQLITE_MAX_VARIABLE_NUMBER (default = 999) is changed at compile time
to be greater than SQLITE_MAX_COLUMN (default = 2000), which Debian does
by setting the former to 250000, Django raised an exception on queries
containing more than 2000 parameters when DEBUG = True.

Backport of f91b5a7e4b from master
