Opened 3 days ago
Last modified 32 hours ago
#36213 new Cleanup/optimization
Document that QuerySet.update can execute two separate SQL queries when using MySQL
Reported by: | Babak | Owned by: | |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | mysql self-select race-condition |
Cc: | Simon Charette, John Speno | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When you have a QuerySet that has filter conditions based on related tables, the QuerySet.update()
function will execute two separate SQL queries (a SELECT
, followed by an UPDATE
).
Examples:
BlogPost.objects.filter(published=True).update(foo="bar")
(Causes a single UPDATE
)
BlogPost.objects.filter(published=True, author__name="Foo Bar").update(foo="bar")
(Causes SELECT
THEN UPDATE
)
As I was told in the forum (thanks charettes), this is an undocumented MySQL-only behaviour because MySQL doesn't allow self-select updates.
This will silently cause nasty race conditions since the update is no longer running as a single SQL statement.
Currently the docs for QuerySet.update
say:
Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save().
But in the case that I described, it causes the exact same type of race condition that the docs suggest that it prevents.
If the users are aware of this behaviour they can take care to avoid such filter conditions or to use alternative transaction handling mechanisms to ensure atomic behaviour.
I'd like to suggest for a warning to be added to the documentation about this.
Change History (8)
comment:1 by , 3 days ago
Description: | modified (diff) |
---|
comment:3 by , 3 days ago
Cc: | added |
---|---|
Summary: | Add warning to documentation: QuerySet.update can execute two separate SQL queries when using MySQL → Document that QuerySet.update can execute two separate SQL queries when using MySQL |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Version: | → dev |
comment:4 by , 3 days ago
This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/
We also had a user on the unofficial django discord asking about a similar situation where their process got killed by the OOM killer when doing an update(). Also on mysql.
comment:5 by , 3 days ago
Cc: | added |
---|
comment:6 by , 3 days ago
This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/
Unless they are making use of on_delete=models.SET_NULL
or models.SET
which make use of update
it's not the same thing.
comment:7 by , 35 hours ago
In stand corrected, QuerySet.delete
on MySQL also performs a materialization but at the database level.
I wonder if the same strategy could be used for the UPDATE
.
comment:8 by , 32 hours ago
I confirmed that the Python side materialization of the query can be entirely avoided by using the same strategy as #31965 (f6405c0b8ef7aff513b105c1da68407a881a3671).
While MySQL disallows UPDATE table ... WHERE table.id IN (SELECT id FROM table WHERE ...)
it allows UPDATE table ... WHERE table.id IN (SELECT * FROM (SELECT id FROM table WHERE ...) subquery)
as it materialize the subqueries remotely.
I also learned that MySQL UPDATE
supports updating multiple table at a time which is wild so I figured I'd give a shot making use of it.
Not sure if this changes the outcome of this ticket but the first two commits of this branch seem to address the issue entirely by
- In cases where the subquer must be materialized favor doing so on the server side (same strategy as #31965).
- In cases where the
UPDATE FROM
syntax can be used withJOIN
s to avoid the materialization of a subquery (which MySQL is notably bad at) adopt a similar strategy to #23576 (7acef095d73322f45dcceb99afa1a4e50b520479). That's possible when no aggregation or related updates are used.
The last commit is complementary but demonstrate an optimization for MTI updates of the form
Child.objects.filter(child_field=0).update(parent_field=1, child_field=2)
That are currently implemented through via N + 1 queries on all backends (A first select followed by N updates)
SELECT parent_prt_id FROM child WHERE child_field = 0 UPDATE child SET child_field = 1 WHERE parent_prt_id IN :parent_prt_ids UPDATE parent SET parent_field = 2 WHERE id IN :parent_prt_ids
But are replaced by the following on MySQL
UPDATE child JOIN parent ON (child.parent_ptr_id = parent.id) SET child.child_field = 1, parent.parent_field = 2 WHERE child_field = 0
Note that the optimization must be disabled if order_by
is used against inherited fields as MySQL doesn't allow references to JOIN
'ed table in UPDATE ORDER BY
(the join clause would have to be altered to be an ordered subquery).
Here's a draft that I wrote. Happy to submit a PR if it's appropriate.