Opened 5 weeks ago

Closed 12 hours ago

Last modified 11 hours ago

#36213 closed Cleanup/optimization (fixed)

Document that QuerySet.update can execute two separate SQL queries when using MySQL

Reported by: Babak Owned by: Babak
Component: Documentation Version: dev
Severity: Normal Keywords: mysql self-select race-condition
Cc: Simon Charette, John Speno Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no
Pull Requests:19312 merged, 19234 unmerged

Description (last modified by Babak)

When you have a QuerySet that has filter conditions based on related tables, the QuerySet.update() function will execute two separate SQL queries (a SELECT, followed by an UPDATE).

Examples:

BlogPost.objects.filter(published=True).update(foo="bar") (Causes a single UPDATE)

BlogPost.objects.filter(published=True, author__name="Foo Bar").update(foo="bar") (Causes SELECT THEN UPDATE)

As I was told in the forum (thanks charettes), this is an undocumented MySQL-only behaviour because MySQL doesn't allow self-select updates.

This will silently cause nasty race conditions since the update is no longer running as a single SQL statement.

Currently the docs for QuerySet.update say:

Using update() also prevents a race condition wherein something might change in your database in the short period of time between loading the object and calling save().

But in the case that I described, it causes the exact same type of race condition that the docs suggest that it prevents.

If the users are aware of this behaviour they can take care to avoid such filter conditions or to use alternative transaction handling mechanisms to ensure atomic behaviour.

I'd like to suggest for a warning to be added to the documentation about this.

Change History (17)

comment:1 by Babak, 5 weeks ago

Description: modified (diff)

comment:2 by Babak, 5 weeks ago

Here's a draft that I wrote. Happy to submit a PR if it's appropriate.

Last edited 5 weeks ago by Babak (previous) (diff)

comment:3 by Sarah Boyce, 5 weeks ago

Cc: Simon Charette added
Summary: Add warning to documentation: QuerySet.update can execute two separate SQL queries when using MySQLDocument that QuerySet.update can execute two separate SQL queries when using MySQL
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization
Version: dev

comment:4 by Anders Hovmöller, 5 weeks ago

This seems like it's the same thing: https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/

We also had a user on the unofficial django discord asking about a similar situation where their process got killed by the OOM killer when doing an update(). Also on mysql.

comment:5 by John Speno, 5 weeks ago

Cc: John Speno added

comment:6 by Simon Charette, 5 weeks ago

This seems like it's the same thing: ​https://www.reddit.com/r/django/comments/1ixb489/high_memory_usage_on_delete/

Unless they are making use of on_delete=models.SET_NULL or models.SET which make use of update it's not the same thing.

comment:7 by Simon Charette, 5 weeks ago

In stand corrected, QuerySet.delete on MySQL also performs a materialization but at the database level (it doesn't fetch all the ids on the Python side).

I wonder if the same strategy could be used for the UPDATE.

Last edited 5 weeks ago by Simon Charette (previous) (diff)

comment:8 by Simon Charette, 5 weeks ago

I confirmed that the Python side materialization of the query can be entirely avoided by using the same strategy as #31965 (f6405c0b8ef7aff513b105c1da68407a881a3671).

While MySQL disallows UPDATE table ... WHERE table.id IN (SELECT id FROM table WHERE ...) it allows UPDATE table ... WHERE table.id IN (SELECT * FROM (SELECT id FROM table WHERE ...) subquery) as it materialize the subqueries remotely.

I also learned that MySQL UPDATE supports updating multiple table at a time which is wild so I figured I'd give a shot making use of it.

Not sure if this changes the outcome of this ticket but the first two commits of this branch seem to address the issue entirely by

  1. In cases where the subquer must be materialized favor doing so on the server side (same strategy as #31965).
  2. In cases where the UPDATE FROM syntax can be used with JOINs to avoid the materialization of a subquery (which MySQL is notably bad at) adopt a similar strategy to #23576 (7acef095d73322f45dcceb99afa1a4e50b520479). That's possible when no aggregation or related updates are used.

The last commit is complementary but demonstrate an optimization for MTI updates of the form

Child.objects.filter(child_field=0).update(parent_field=1, child_field=2)

That are currently implemented through via N + 1 queries on all backends (A first select followed by N updates)

SELECT parent_prt_id FROM child WHERE child_field = 0
UPDATE child SET child_field = 1 WHERE parent_prt_id IN :parent_prt_ids
UPDATE parent SET parent_field = 2 WHERE id IN :parent_prt_ids

But are replaced by the following on MySQL

UPDATE child
JOIN parent ON (child.parent_ptr_id = parent.id)
SET child.child_field = 1, parent.parent_field = 2
WHERE child_field = 0

Note that the optimization must be disabled if order_by is used against inherited fields as MySQL doesn't allow references to JOIN'ed table in UPDATE ORDER BY (the join clause would have to be altered to be an ordered subquery).

Last edited 5 weeks ago by Simon Charette (previous) (diff)

comment:9 by edc, 4 weeks ago

Owner: set to edc
Status: newassigned

comment:10 by edc, 4 weeks ago

Has patch: set

comment:11 by Jacob Walls, 2 weeks ago

Patch needs improvement: set

comment:12 by Jacob Walls, 5 days ago

Owner: changed from edc to Babak
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:13 by Sarah Boyce, 34 hours ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:14 by Sarah Boyce, 31 hours ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:15 by Sarah Boyce <42296566+sarahboyce@…>, 12 hours ago

Resolution: fixed
Status: assignedclosed

In be1b776:

Fixed #36213 -- Doc'd MySQL's handling of self-select updates in QuerySet.update().

Co-authored-by: Andro Ranogajec <ranogaet@…>

comment:16 by Sarah Boyce <42296566+sarahboyce@…>, 12 hours ago

In c68f3516:

[5.2.x] Fixed #36213 -- Doc'd MySQL's handling of self-select updates in QuerySet.update().

Co-authored-by: Andro Ranogajec <ranogaet@…>

Backport of be1b776ad8d6f9bccfbdf63f84b16fb81a13119e from main.

comment:17 by Sarah Boyce <42296566+sarahboyce@…>, 11 hours ago

In b3b09dc6:

[5.1.x] Fixed #36213 -- Doc'd MySQL's handling of self-select updates in QuerySet.update().

Co-authored-by: Andro Ranogajec <ranogaet@…>

Backport of be1b776ad8d6f9bccfbdf63f84b16fb81a13119e from main.

Note: See TracTickets for help on using tickets.
Back to Top