#36213 closed Cleanup/optimization (fixed)
Document that QuerySet.update can execute two separate SQL queries when using MySQL
Description (last modified by ) ¶
When you have a QuerySet that has filter conditions based on related tables, the QuerySet.update()
function will execute two separate SQL queries (a SELECT
, followed by an UPDATE
).
Examples:
BlogPost.objects.filter(published=True).update(foo="bar")
(Causes a single UPDATE
)
BlogPost.objects.filter(published=True, author__name="Foo Bar").update(foo="bar")
(Causes SELECT
THEN UPDATE
)
As I was told in the forum (thanks charettes), this is an undocumented MySQL-only behaviour because MySQL doesn't allow self-select updates.
This will silently cause nasty race conditions since the update is no longer running as a single SQL statement.
Currently the docs for QuerySet.update
say:
Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save().
But in the case that I described, it causes the exact same type of race condition that the docs suggest that it prevents.
If the users are aware of this behaviour they can take care to avoid such filter conditions or to use alternative transaction handling mechanisms to ensure atomic behaviour.
I'd like to suggest for a warning to be added to the documentation about this.
Change History (17)
comment:1 by , 5 weeks ago
Description: | modified (diff) |
---|
comment:3 by , 5 weeks ago
Cc: | added |
---|---|
Summary: | Add warning to documentation: QuerySet.update can execute two separate SQL queries when using MySQL → Document that QuerySet.update can execute two separate SQL queries when using MySQL |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Version: | → dev |
comment:4 by , 5 weeks ago
This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/
We also had a user on the unofficial django discord asking about a similar situation where their process got killed by the OOM killer when doing an update(). Also on mysql.
comment:5 by , 5 weeks ago
Cc: | added |
---|
comment:6 by , 5 weeks ago
This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/
Unless they are making use of on_delete=models.SET_NULL
or models.SET
which make use of update
it's not the same thing.
comment:7 by , 5 weeks ago
In stand corrected, QuerySet.delete
on MySQL also performs a materialization but at the database level (it doesn't fetch all the ids on the Python side).
I wonder if the same strategy could be used for the UPDATE
.
comment:8 by , 5 weeks ago
I confirmed that the Python side materialization of the query can be entirely avoided by using the same strategy as #31965 (f6405c0b8ef7aff513b105c1da68407a881a3671).
While MySQL disallows UPDATE table ... WHERE table.id IN (SELECT id FROM table WHERE ...)
it allows UPDATE table ... WHERE table.id IN (SELECT * FROM (SELECT id FROM table WHERE ...) subquery)
as it materialize the subqueries remotely.
I also learned that MySQL UPDATE
supports updating multiple table at a time which is wild so I figured I'd give a shot making use of it.
Not sure if this changes the outcome of this ticket but the first two commits of this branch seem to address the issue entirely by
- In cases where the subquer must be materialized favor doing so on the server side (same strategy as #31965).
- In cases where the
UPDATE FROM
syntax can be used withJOIN
s to avoid the materialization of a subquery (which MySQL is notably bad at) adopt a similar strategy to #23576 (7acef095d73322f45dcceb99afa1a4e50b520479). That's possible when no aggregation or related updates are used.
The last commit is complementary but demonstrate an optimization for MTI updates of the form
Child.objects.filter(child_field=0).update(parent_field=1, child_field=2)
That are currently implemented through via N + 1 queries on all backends (A first select followed by N updates)
SELECT parent_prt_id FROM child WHERE child_field = 0 UPDATE child SET child_field = 1 WHERE parent_prt_id IN :parent_prt_ids UPDATE parent SET parent_field = 2 WHERE id IN :parent_prt_ids
But are replaced by the following on MySQL
UPDATE child JOIN parent ON (child.parent_ptr_id = parent.id) SET child.child_field = 1, parent.parent_field = 2 WHERE child_field = 0
Note that the optimization must be disabled if order_by
is used against inherited fields as MySQL doesn't allow references to JOIN
'ed table in UPDATE ORDER BY
(the join clause would have to be altered to be an ordered subquery).
comment:9 by , 4 weeks ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:10 by , 4 weeks ago
Has patch: | set |
---|
comment:11 by , 2 weeks ago
Patch needs improvement: | set |
---|
comment:12 by , 5 days ago
Owner: | changed from | to
---|---|
Patch needs improvement: | unset |
Triage Stage: | Accepted → Ready for checkin |
comment:13 by , 34 hours ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
comment:14 by , 31 hours ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
Here's a draft that I wrote. Happy to submit a PR if it's appropriate.