Opened 4 months ago
Last modified 4 weeks ago
#36248 assigned Bug
Bulk deletion of model referred to by a SET_NULL key can exceed parameter limit
Reported by: | bobince | Owned by: | bobince |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.1 |
Severity: | Normal | Keywords: | |
Cc: | bobince, Simon Charette, Sage Abdullah | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
To avoid hitting DBMS-specific limits on the number of parameters that can be used in a parameterised query, Django tries to do bulk operations in limited-size batches. This includes db.models.deletion.Collector.collect, which calls get_del_batches to split the list of objects into manageable chunks for deletion; this batching was added in #16426.
If there are keys on other models referring to the model being deleted, and those keys have an on_delete setting that causes an update to that field (such as SET_NULL), the Collector's field_updates will gather per-chunk updates for each such field. However, Collector.delete then ORs the QuerySets generated for each chunk back together into a single combined filter, so if the total number of rows exceeds the parameter limit, that query will fail.
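As a concrete illustration, here is a minimal pair of models that hits this failure mode; the names are made up for the example, not taken from the test suite.

```python
# Hypothetical models illustrating the problem described above.
from django.db import models


class Author(models.Model):
    name = models.CharField(max_length=100)


class Book(models.Model):
    author = models.ForeignKey(
        Author,
        null=True,
        on_delete=models.SET_NULL,  # forces a field update when Authors are deleted
    )
```

Deleting enough Author rows that have Books (e.g. Author.objects.all().delete()) chunks the delete itself, but the SET_NULL updates for every chunk are OR-ed back together into one UPDATE ... WHERE id IN (...) OR id IN (...) query, and that combined query is where the parameter limit is exceeded.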
It seems straightforward to stop doing that:
@@ -481,9 +481,8 @@ class Collector:
                         updates.append(instances)
                     else:
                         objs.extend(instances)
-                if updates:
-                    combined_updates = reduce(or_, updates)
-                    combined_updates.update(**{field.name: value})
+                for update in updates:
+                    update.update(**{field.name: value})
                 if objs:
                     model = objs[0].__class__
                     query = sql.UpdateQuery(model)
However:
- I'm not certain what the original intent was behind combining the updates here. Can there be multiple updates for some reason other than batch chunking, for which we want to minimise queries? The combining was done in #33928.
- Testing it is a bit tricky, since the SQLite parameter limit is now often higher than the traditional 999 and Python's sqlite3 doesn't let us read or change it. On my Ubuntu box here it's 250000, which is rather more than is comfortable to create in a test. We can count queries like DeletionTests.test_large_delete does, but that's not _exactly_ checking the thing we actually care about: that the number of parameters is within bounds. (It's not possible at present to read the number of parameters used from the queries_log.)
(Diversion: actually in practice I'm personally affected more by this issue in mssql-django, but there are a bunch of confounding factors around that backend's prior wonky attempts to work around the parameter limit on its own. https://github.com/microsoft/mssql-django/issues/156 has some background.)
Change History (8)
comment:1 by , 4 months ago
comment:2 by , 4 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
The analysis and the patch here are on point (thanks bobince!) so I'll accept and mark with has_patch.
The only thing I'd change is the test, so that bulk_batch_size is mocked to return a small number (e.g. 10), and so that it not only tests the number of executed queries but also counts the number of parameters for each query (you can use self.assertNumQueries as a context manager for that).
That would avoid having to create 200s of objects unnecessarily but, more importantly, would test the chunking logic in a backend-agnostic way.
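A rough sketch of the test shape being suggested, assuming the hypothetical Author/Book models from the example in the description; it uses CaptureQueriesContext rather than a fixed assertNumQueries count just to keep the sketch count-agnostic, and the assertions are illustrative rather than what the final patch would do.

```python
from unittest import mock

from django.db import connection
from django.test import TestCase
from django.test.utils import CaptureQueriesContext

from .models import Author, Book  # hypothetical models, not real test fixtures


class SetNullBatchDeletionTests(TestCase):
    def test_set_null_updates_are_chunked(self):
        authors = [Author.objects.create(name=f"a{i}") for i in range(25)]
        Book.objects.bulk_create(Book(author=a) for a in authors)

        # Pretend the backend only allows tiny batches, so the chunking logic
        # is exercised without creating hundreds of rows.
        with mock.patch.object(connection.ops, "bulk_batch_size", return_value=10):
            with CaptureQueriesContext(connection) as ctx:
                Author.objects.all().delete()

        # With the proposed patch, the SET_NULL updates should arrive as
        # several small queries rather than one combined one.
        update_queries = [q for q in ctx.captured_queries if "UPDATE" in q["sql"]]
        self.assertGreater(len(update_queries), 1)
```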
comment:3 by , 4 months ago
The only thing I'd change is the test so bulk_batch_size is mocked to return a small number
will do!
count the number of parameters for each query
Yeah, that's what I wanted to do, but I was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because DebugCursorWrapper uses last_executed_query to replace them with parameter values.
I guess I could scrape the SQL string for the number of integers, but this feels pretty fragile. Or I could add a "params_count" entry to the dicts in queries_log, but that's more intrusive than I'd like just for the benefit of one test.
comment:4 by , 4 months ago
Yeah that's what I wanted to do, but was frustrated: connection.queries_log/assertNumQueries doesn't keep hold of the number of parameters, and you can't infer them from %s being in the SQL string because DebugCursorWrapper uses last_executed_query to replace them with parameter values.
Crap, I forgot; I've run into the exact same thing so many times and wished it would capture uninterpolated SQL and params instead. Never mind in this case.
comment:5 by , 4 months ago
Cc: | added |
---|
comment:6 by , 4 months ago
I guess I could scrape the SQL string for number of integers but this feels pretty fragile.
Have tried something similar to this (counting commas in ‘IN’ clauses) and actually maybe it's not thaaat bad. I've written worse.
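For what it's worth, a sketch of roughly what that comma-counting looks like (fragile, as noted above; the regex and helper name are just for illustration):

```python
import re


def estimated_in_clause_params(sql):
    """Estimate a query's parameter count by counting values in IN (...) groups."""
    groups = re.findall(r"IN \(([^)]*)\)", sql, flags=re.IGNORECASE)
    return sum(group.count(",") + 1 for group in groups if group.strip())


# e.g. inside a CaptureQueriesContext / assertNumQueries block:
#     for q in ctx.captured_queries:
#         self.assertLessEqual(estimated_in_clause_params(q["sql"]), batch_size)
```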
Now that I have a test that's only counting parameters rather than queries, I tried using the existing test model ‘R’ instead of making a new one. This turned up a delightful wrinkle: the fast-delete path can also ‘OR’ together multiple querysets, this time when there are multiple references from one model to the rows being deleted. For example, if Article.author and Article.editor are ForeignKeys to Person with on_delete=CASCADE, and 500 Person rows are deleted, deletion will put together a query like DELETE FROM article WHERE author IN (...500 params...) OR editor IN (...another 500 params...), and if you have a DBMS with a param limit of 999 you're now sad.
So, er, back to simpler relations for now. I don't think there's a straightforward fix to this; it would take a more wide-ranging rethink of how param limits affect batch sizes, and I don't really have a good idea what that would look like. Ugh, I wish param limits would go away.
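For reference, a minimal sketch of the kind of models that trigger the fast-delete wrinkle above (the names are illustrative, not the actual test models):

```python
from django.db import models


class Person(models.Model):
    name = models.CharField(max_length=100)


class Article(models.Model):
    # Two references from the same model to the model being deleted.
    author = models.ForeignKey(Person, on_delete=models.CASCADE, related_name="+")
    editor = models.ForeignKey(Person, on_delete=models.CASCADE, related_name="+")
```

Deleting 500 Person rows can then be fast-deleted with a single query along the lines of DELETE FROM article WHERE author_id IN (...500 params...) OR editor_id IN (...another 500 params...), which busts a 999-parameter limit even though each IN clause fits on its own.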
comment:7 by , 3 months ago
Thanks for reporting your findings bobince. We are aware that our current parameter limiting is far from bulletproof.
The problem recently resurfaced when we added support for composite primary keys, as they require that the chunk size be divided by the number of fields included in the relationship.
Some related tickets are #36143, #36144.
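Illustrative arithmetic only (the numbers are assumed, not taken from those tickets): with a composite primary key, every row referenced in an IN clause costs one parameter per key column, so the usable chunk size shrinks accordingly.

```python
max_query_params = 999   # e.g. a SQLite build with the traditional limit
pk_columns = 3           # hypothetical composite primary key width
rows_per_chunk = max_query_params // pk_columns  # 333 rows per chunk, not 999
```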
Ugh, I wish param limits would go away
FYI you might be interested to know that SQLite 3.32+ seems to default to a much larger number.
comment:8 by , 4 weeks ago
Cc: | added |
---|
I also noticed this issue when I was testing https://github.com/django/django-docker-box/pull/50#discussion_r1933066031 with a low enough value for SQLITE_MAX_VARIABLE_NUMBER. Some tests that do ContentType.objects.all().delete() would fail with a similar error, and I think I tried hacking a similar solution to yours, but I wasn't sure if that was the right approach as I'm not familiar with the deletion code.
Done this here FWIW: https://github.com/django/django/compare/main...bobince:django:batch_delete_with_set_null?expand=1