Opened 4 months ago

Last modified 4 weeks ago

#36248 assigned Bug

Bulk deletion of model referred to by a SET_NULL key can exceed parameter limit

Reported by: bobince
Owned by: bobince
Component: Database layer (models, ORM)
Version: 5.1
Severity: Normal
Keywords:
Cc: bobince, Simon Charette, Sage Abdullah
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

To avoid hitting DBMS-specific limits on the number of parameters that can be used in a parameterised query, Django tries to do bulk operations in limited-size batches. One such place is db.models.deletion.Collector.collect, which calls get_del_batches to split the list of objects into manageable chunks for deletion. This was done in #16426.

If there are keys on other models referring to the model being deleted, and those keys have an on_delete setting that would cause an update to that field (such as SET_NULL), the Collector's field_updates list will gather per-chunk updates to each such field. However, Collector.delete then combines the QuerySets generated for each update into a single combined filter. If there are more rows in total than the parameter limit then this will fail.
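For concreteness, here is a minimal sketch of the kind of schema that hits this (the models are hypothetical, not taken from Django's test suite):

    from django.db import models


    class Author(models.Model):
        class Meta:
            app_label = "example"


    class Book(models.Model):
        # SET_NULL means deleting Author rows triggers an UPDATE of book.author_id.
        author = models.ForeignKey(Author, null=True, on_delete=models.SET_NULL)

        class Meta:
            app_label = "example"


    # With more referenced Author rows than the backend's parameter limit,
    # Author.objects.all().delete() currently builds one combined UPDATE along
    # the lines of
    #   UPDATE example_book SET author_id = NULL
    #   WHERE author_id IN (...chunk 1...) OR author_id IN (...chunk 2...) OR ...
    # so that single statement can exceed the parameter limit even though
    # collect() chunked the objects.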

It seems straightforward to stop doing that:

@@ -481,9 +481,8 @@ class Collector:
                         updates.append(instances)
                     else:
                         objs.extend(instances)
-                if updates:
-                    combined_updates = reduce(or_, updates)
-                    combined_updates.update(**{field.name: value})
+                for update in updates:
+                    update.update(**{field.name: value})
                 if objs:
                     model = objs[0].__class__
                     query = sql.UpdateQuery(model)

However:

  • I'm not certain what the original intent was behind combining the updates here. Can there be multiple updates for some other reason than batch chunking, that we want to minimise queries for? That combining was introduced in #33928.
  • Testing it is a bit tricky since the SQLite parameter limit is now often higher than the traditional 999, and Python's sqlite3 doesn't let us read or change it. On my Ubuntu box here it's 250000, which is a bit more than is comfortable to be doing in a test. We can count queries like DeletionTests.test_large_delete does but that's not _exactly_ checking the thing we actually care about, that the number of parameters is within bounds. (It's not possible at present to read the number of parameters used from the queries_log.)

(Diversion: actually in practice I'm personally affected more by this issue in mssql-django, but there are a bunch of confounding factors around that backend's prior wonky attempts to work around the parameter limit on its own. https://github.com/microsoft/mssql-django/issues/156 has some background.)

Change History (8)

comment:1 by bobince, 4 months ago

We can count queries like DeletionTests.test_large_delete does but [...]

Done this here FWIW: https://github.com/django/django/compare/main...bobince:django:batch_delete_with_set_null?expand=1

comment:2 by Simon Charette, 4 months ago

Owner: set to bobince
Status: new → assigned
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug

The analysis and the patch here are on point (thanks bobince!) so I'll accept and mark with has_patch.

The only thing I'd change is the test, so that bulk_batch_size is mocked to return a small number (e.g. 10), and so that it not only tests the number of executed queries but also counts the number of parameters for each query (you can use self.assertNumQueries as a context manager for that).

That would avoid having to create hundreds of objects unnecessarily and, more importantly, test the chunking logic in a backend-agnostic way.
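For the record, a rough sketch of what such a test could look like, assuming the hypothetical Author/Book models sketched in the description; the mocked batch size and expected chunk count are illustrative, not from the actual patch:

    from unittest import mock

    from django.db import connection
    from django.test import TestCase
    from django.test.utils import CaptureQueriesContext

    from .models import Author, Book  # hypothetical models from the description


    class SetNullBatchDeletionTests(TestCase):
        def test_set_null_updates_are_chunked(self):
            Author.objects.bulk_create([Author() for _ in range(25)])
            authors = list(Author.objects.all())
            Book.objects.bulk_create([Book(author=author) for author in authors])
            # Shrink the batch size so the chunking logic is exercised on any backend.
            with mock.patch.object(connection.ops, "bulk_batch_size", return_value=10):
                with CaptureQueriesContext(connection) as ctx:
                    Author.objects.all().delete()
            updates = [q["sql"] for q in ctx.captured_queries if q["sql"].startswith("UPDATE")]
            # 25 authors at a batch size of 10 should give three separate UPDATEs
            # (10 + 10 + 5) rather than one combined statement.
            self.assertEqual(len(updates), 3)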


comment:3 by bobince, 4 months ago

The only thing I'd change is the test so bulk_batch_size is mocked to return a small number

will do!

count the number of parameters for each query

Yeah that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because DebugCursorWrapper uses last_executed_query to replace them with parameter values.

I guess I could scrape the SQL string for the number of integers, but this feels pretty fragile. Or I could add a "params_count" entry to the dicts in queries_log, but that's more intrusive than I'd like just for the benefit of one test.

comment:4 by Simon Charette, 4 months ago

Yeah that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because DebugCursorWrapper uses last_executed_query to replace them with parameter values.

Crap, I forgot that I ran into the exact same thing so many times and wished it would capture uninterpolated SQL and params instead. Never mind in this case.

comment:5 by Simon Charette, 4 months ago

Cc: Simon Charette added

comment:6 by bobince, 4 months ago

I guess I could scrape the SQL string for the number of integers but this feels pretty fragile.

I've tried something similar to this (counting commas in ‘IN’ clauses) and actually maybe it's not that bad. I've written worse.
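For what it's worth, the scraping amounts to something like this (a heuristic over the already-interpolated SQL in captured_queries, not anything Django exposes; it breaks on nested parentheses or literals containing commas):

    import re

    # Rough count of the parameters a captured query would have used:
    # count the comma-separated values inside each IN (...) clause.
    IN_CLAUSE = re.compile(r"\bIN\s*\(([^)]*)\)", re.IGNORECASE)


    def approx_param_count(sql):
        return sum(
            clause.count(",") + 1
            for clause in IN_CLAUSE.findall(sql)
            if clause.strip()
        )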

Now that I have a test that's only counting parameters, not queries, I tried using the existing test model ‘R’ instead of making a new one. This turned up a delightful wrinkle: the fast-delete path can also ‘OR’ together multiple querysets, this time when there are multiple references from one model to the rows being deleted. For example, if Article.author and Article.editor are ForeignKeys to Person with on_delete=CASCADE, and 500 Person rows are deleted, deletion will put together a query like DELETE FROM article WHERE author IN (...500 params...) OR editor IN (...another 500 params...), and if you have a DBMS with a param limit of 999 you're now sad.

So, er, back to simpler relations for now. I don't think there's a straightforward fix for this; it would take a more wide-ranging rethink of how param limits affect batch sizes, and I don't really have a good idea what that would look like. Ugh, I wish param limits would go away.

comment:7 by Simon Charette, 3 months ago

Thanks for reporting your findings, bobince. We are aware that our current parameter limiting is far from bulletproof.

The problem recently resurfaced when we added support for composite primary keys, as they require that the chunk size be divided by the number of fields included in the relationship.
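The arithmetic there, roughly (illustrative numbers, not the actual implementation):

    # A tuple comparison like WHERE (a, b, c) IN ((%s, %s, %s), ...) spends one
    # placeholder per primary key field per row, so the usable chunk size shrinks.
    max_params = 999   # hypothetical backend limit
    pk_fields = 3      # fields in the composite primary key
    chunk_size = max_params // pk_fields  # 333 rows per batch instead of 999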

Some related tickets are #36143, #36144.

Ugh, I wish param limits would go away

FYI you might be interested to know that SQLite 3.32+ seems to default to a much larger number.

comment:8 by Sage Abdullah, 4 weeks ago

Cc: Sage Abdullah added

I also noticed this issue when I was testing https://github.com/django/django-docker-box/pull/50#discussion_r1933066031 with a low enough value for SQLITE_MAX_VARIABLE_NUMBER. Some tests that do ContentType.objects.all().delete() would fail with a similar error. I think I tried hacking together a solution similar to yours, but I wasn't sure it was the right approach as I'm not familiar with the deletion code.
