Opened 3 days ago

Last modified 31 hours ago

#36213 new Cleanup/optimization

Document that QuerySet.update can execute two separate SQL queries when using MySQL

Reported by: Babak Owned by:
Component: Documentation Version: dev
Severity: Normal Keywords: mysql self-select race-condition
Cc: Simon Charette, John Speno Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Babak)

When you have a QuerySet that has filter conditions based on related tables, the QuerySet.update() function will execute two separate SQL queries (a SELECT, followed by an UPDATE).

Examples:

BlogPost.objects.filter(published=True).update(foo="bar") (Causes a single UPDATE)

BlogPost.objects.filter(published=True, author__name="Foo Bar").update(foo="bar") (Causes SELECT THEN UPDATE)

As I was told in the forum (thanks charettes), this is an undocumented MySQL-only behaviour because MySQL doesn't allow self-select updates.

This will silently cause nasty race conditions since the update is no longer running as a single SQL statement.

Currently the docs for QuerySet.update say:

Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save().

But in the case that I described, it causes the exact same type of race condition that the docs suggest that it prevents.

If the users are aware of this behaviour they can take care to avoid such filter conditions or to use alternative transaction handling mechanisms to ensure atomic behaviour.

I'd like to suggest for a warning to be added to the documentation about this.

Change History (8)

comment:1 by Babak, 3 days ago

Description: modified (diff)

comment:2 by Babak, 3 days ago

Here's a draft that I wrote. Happy to submit a PR if it's appropriate.

Last edited 3 days ago by Babak (previous) (diff)

comment:3 by Sarah Boyce, 3 days ago

Cc: Simon Charette added
Summary: Add warning to documentation: QuerySet.update can execute two separate SQL queries when using MySQLDocument that QuerySet.update can execute two separate SQL queries when using MySQL
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: dev

comment:4 by Anders Hovmöller, 3 days ago

This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/

We also had a user on the unofficial django discord asking about a similar situation where their process got killed by the OOM killer when doing an update(). Also on mysql.

comment:5 by John Speno, 3 days ago

Cc: John Speno added

comment:6 by Simon Charette, 3 days ago

This seems like it's the same thing: ​https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/

Unless they are making use of on_delete=models.SET_NULL or models.SET which make use of update it's not the same thing.

comment:7 by Simon Charette, 35 hours ago

In stand corrected, QuerySet.delete on MySQL also performs a materialization but at the database level (it doesn't fetch all the ids on the Python side).

I wonder if the same strategy could be used for the UPDATE.

Last edited 35 hours ago by Simon Charette (previous) (diff)

comment:8 by Simon Charette, 31 hours ago

I confirmed that the Python side materialization of the query can be entirely avoided by using the same strategy as #31965 (f6405c0b8ef7aff513b105c1da68407a881a3671).

While MySQL disallows UPDATE table ... WHERE table.id IN (SELECT id FROM table WHERE ...) it allows UPDATE table ... WHERE table.id IN (SELECT * FROM (SELECT id FROM table WHERE ...) subquery) as it materialize the subqueries remotely.

I also learned that MySQL UPDATE supports updating multiple table at a time which is wild so I figured I'd give a shot making use of it.

Not sure if this changes the outcome of this ticket but the first two commits of this branch seem to address the issue entirely by

  1. In cases where the subquer must be materialized favor doing so on the server side (same strategy as #31965).
  2. In cases where the UPDATE FROM syntax can be used with JOINs to avoid the materialization of a subquery (which MySQL is notably bad at) adopt a similar strategy to #23576 (7acef095d73322f45dcceb99afa1a4e50b520479). That's possible when no aggregation or related updates are used.

The last commit is complementary but demonstrate an optimization for MTI updates of the form

Child.objects.filter(child_field=0).update(parent_field=1, child_field=2)

That are currently implemented through two N + 1 queries on all backends

SELECT parent_prt_id FROM child WHERE child_field = 0
UPDATE child SET child_field = 1 WHERE parent_prt_id IN :parent_prt_ids
UPDATE parent SET parent_field = 2 WHERE id IN :parent_prt_ids

But are replaced by the following on MySQL

UPDATE child
JOIN parent ON (child.parent_ptr_id = parent.id)
SET child.child_field = 1, parent.parent_field = 2
WHERE child_field = 0

Note that the optimization must be disabled if order_by is used against inherited fields as MySQL doesn't allow references to JOIN'ed table in UPDATE ORDER BY (the join clause would have to be altered to be an ordered subquery).

Version 1, edited 31 hours ago by Simon Charette (previous) (next) (diff)
Note: See TracTickets for help on using tickets.
Back to Top