Opened 14 hours ago
Last modified 77 minutes ago
#36248 assigned Bug
Bulk deletion of model referred to by a SET_NULL key can exceed parameter limit
Reported by: | bobince | Owned by: | bobince |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | bobince, Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
To avoid hitting DBMS-specific limits on the number of parameters that can be used in a parameterised query, Django tries to do bulk operations in limited-size batches. This includes db.models.deletion.Collector.collect, which calls get_del_batches to split a list of objects into manageable chunks for deletion. This was done in #16426.
If there are keys on other models referring to the model being deleted, and those keys have an on_delete setting that causes an update to that field (such as SET_NULL), the Collector's field_updates list gathers a per-chunk update for each such field. However, Collector.delete then merges the QuerySets generated for those updates into a single combined filter, so if the total number of parameters across all chunks exceeds the limit, the query fails.
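A framework-free sketch of the failure mode (the numbers are made up; in the real code the chunk size comes from connection.ops.bulk_batch_size()):

```python
# Illustrative only: mimics how per-chunk parameter lists get merged back into
# a single filter, defeating the batching done during collection.
MAX_PARAMS = 999   # e.g. SQLite's traditional variable limit
BATCH_SIZE = 500   # the kind of chunk size get_del_batches keeps each batch under

deleted_pks = list(range(2500))
batches = [
    deleted_pks[i : i + BATCH_SIZE] for i in range(0, len(deleted_pks), BATCH_SIZE)
]

# Each per-chunk UPDATE would be fine on its own...
assert all(len(batch) <= MAX_PARAMS for batch in batches)

# ...but OR-ing the per-chunk QuerySets back together yields one UPDATE whose
# WHERE clause carries the union of every chunk's parameters.
combined_params = [pk for batch in batches for pk in batch]
assert len(combined_params) > MAX_PARAMS  # 2500 > 999 -> "too many SQL variables"
```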
It seems straightforward to stop doing that:
```diff
@@ -481,9 +481,8 @@ class Collector:
                         updates.append(instances)
                     else:
                         objs.extend(instances)
-                if updates:
-                    combined_updates = reduce(or_, updates)
-                    combined_updates.update(**{field.name: value})
+                for update in updates:
+                    update.update(**{field.name: value})
                 if objs:
                     model = objs[0].__class__
                     query = sql.UpdateQuery(model)
```
However:
- I'm not certain what the original intent was behind combining the updates here. Can there be multiple updates for some reason other than batch chunking, where we'd want to minimise the number of queries? This happened in #33928.
- Testing it is a bit tricky since the SQLite parameter limit is now often higher than the traditional 999, and Python's sqlite3 doesn't let us read or change it. On my Ubuntu box here it's 250000, which is a bit more than is comfortable to be doing in a test. We can count queries like DeletionTests.test_large_delete does but that's not _exactly_ checking the thing we actually care about, that the number of parameters is within bounds. (It's not possible at present to read the number of parameters used from the queries_log.)
(Diversion: actually in practice I'm personally affected more by this issue in mssql-django, but there are a bunch of confounding factors around that backend's prior wonky attempts to work around the parameter limit on its own. https://github.com/microsoft/mssql-django/issues/156 has some background.)
Change History (5)
comment:1 by , 14 hours ago
comment:2 by , 8 hours ago
Owner: | set to |
---|---|
Status: | new → assigned |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
The analysis and the patch here are on point (thanks bobince!), so I'll accept and mark with has_patch.
The only thing I'd change is the test, so that bulk_batch_size is mocked to return a small number (e.g. 10), and so that it not only tests the number of executed queries but also counts the number of parameters for each query (you can use self.assertNumQueries as a context manager for that).
That would avoid having to create 200s of objects unnecessarily, but more importantly it would test the chunking logic in a backend-agnostic way.
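For illustration, a rough sketch of that test shape (Target and Referrer are hypothetical stand-ins for a SET_NULL relationship living in an installed test app; this is not the actual patch's test, and it inspects the captured queries rather than asserting an exact total count):

```python
# Hedged sketch only: the models and app_label are hypothetical.
from unittest import mock

from django.db import connection, models
from django.test import TestCase
from django.test.utils import CaptureQueriesContext


class Target(models.Model):
    class Meta:
        app_label = "delete_tests"  # hypothetical installed test app


class Referrer(models.Model):
    target = models.ForeignKey(Target, null=True, on_delete=models.SET_NULL)

    class Meta:
        app_label = "delete_tests"


class SetNullBatchingTests(TestCase):
    def test_set_null_updates_respect_bulk_batch_size(self):
        targets = [Target.objects.create() for _ in range(25)]
        Referrer.objects.bulk_create(Referrer(target=t) for t in targets)

        # Force tiny deletion batches so the chunking path is exercised on any
        # backend, regardless of its real parameter limit.
        with mock.patch.object(connection.ops, "bulk_batch_size", return_value=10):
            with CaptureQueriesContext(connection) as ctx:
                Target.objects.all().delete()

        set_null_updates = [
            q for q in ctx.captured_queries if q["sql"].startswith("UPDATE")
        ]
        # 25 deleted targets in batches of 10 should produce 3 separate
        # UPDATE ... SET target_id = NULL statements rather than one combined one.
        self.assertEqual(len(set_null_updates), 3)
```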
comment:3 by , 110 minutes ago
> The only thing I'd change is the test so bulk_batch_size is mocked to return a small number

will do!

> count the number of parameters for each query

Yeah, that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because CursorDebugWrapper uses last_executed_query to replace them with parameter values.
I guess I could scrape the SQL string for the number of integers, but that feels pretty fragile. Or I could add a "params_count" entry to the dicts in queries_log, but that's more intrusive than I'd like just for the benefit of one test.
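(For reference, a minimal sketch of another possible hook, assuming connection.execute_wrapper() would be acceptable in the test: the wrapper sees the uninterpolated sql and params before execution, so the count per statement can be recorded without touching queries_log. The delete call is a placeholder.)

```python
# Sketch only: record the parameter count of each statement via the documented
# connection.execute_wrapper() hook (executemany is ignored for brevity).
from django.db import connection

param_counts = []


def record_param_count(execute, sql, params, many, context):
    param_counts.append(len(params or ()))
    return execute(sql, params, many, context)


with connection.execute_wrapper(record_param_count):
    pass  # placeholder: run the bulk delete under test here

# Each recorded count could then be asserted against the mocked bulk_batch_size.
```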
comment:4 by , 77 minutes ago
> Yeah, that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because CursorDebugWrapper uses last_executed_query to replace them with parameter values.

Crap, I forgot that I ran into the exact same thing so many times and wished it would capture the uninterpolated SQL and params instead. Never mind in this case.
comment:5 by , 77 minutes ago
Cc: | added |
---|
Done this here FWIW: https://github.com/django/django/compare/main...bobince:django:batch_delete_with_set_null?expand=1