Opened 9 years ago

Closed 8 years ago

#26347 closed Cleanup/optimization (wontfix)

Saving ManyToMany field under race condition causes data loss on MySQL

Reported by: Hugo Chargois
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.8
Severity: Normal
Keywords: mysql transaction
Cc: django@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I had to investigate the mysteriously disappearing contents of some ManyToManyFields of some Model. Everyone who had access to the admin site assured me that one day the contents were there, and the next day they weren't anymore, even though they didn't touch those fields. Of course, we had absolutely no custom code to edit those fields, and it looks like they didn't lie: nothing showed up in the admin history. I was stumped.

I found that they were erased whenever someone double-clicked the "Save" button (unintentionally, hopefully).

It took me some time to really identify the root of the problem...

Looking at the database log, I found that on each save, even if the field is untouched, this happens (a bit simplified):

DELETE FROM relation_table WHERE from_id = <our_object_id>;
SELECT * FROM relation_table WHERE from_id = <our_object_id> AND to_id IN (<related1_id>, <related2_id>, ...);
INSERT INTO relation_table (from_id, to_id) VALUES (<our_object_id>, <related1_id>), (<our_object_id>, <related2_id>), ...;

The DELETE then INSERT behavior is visible here: https://github.com/django/django/blob/1.8.11/django/db/models/fields/related.py#L1271

And the SELECT in between comes from the manager's add(), from this code that checks that we only insert what is not already present:
https://github.com/django/django/blob/1.8.11/django/db/models/fields/related.py#L1090

So, yeah, deleting everything, selecting from what we just deleted (that should always be nothing, right?) and inserting back the same things is a really weird, unoptimized and dangerous way of doing nothing, but since it is all in a nice transaction, that should always work, right? Right?

Well, no. Not with MySQL of course.

Sometimes, a race condition makes it possible for the SELECT after the DELETE to return old rows that really aren't there anymore. This fools manager.add into thinking that it has nothing to INSERT, since the rows appear to be there already. But as soon as the transaction finishes, no rows are effectively there anymore.

This behavior of MySQL is called "consistent read" and is well documented: https://dev.mysql.com/doc/refman/5.5/en/innodb-consistent-read.html
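
To make the interleaving concrete, here is a minimal sketch of the kind of race being described, written directly against MySQLdb rather than through Django. It assumes a relation_table(id, from_id, to_id) holding one committed row (id=1, from_id=10, to_id=20), MySQL's default REPEATABLE READ isolation level, and placeholder connection parameters:

import MySQLdb

def connect():
    # Placeholder credentials; autocommit is off so each session runs one transaction.
    conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="app")
    conn.autocommit(False)
    return conn

a, b = connect(), connect()     # two overlapping "save" requests
ca, cb = a.cursor(), b.cursor()

# Session B reads first, which establishes its consistent-read snapshot (it sees row id=1).
cb.execute("SELECT to_id FROM relation_table WHERE from_id = 10")

# Session A runs the DELETE/SELECT/INSERT save sequence and commits.
ca.execute("DELETE FROM relation_table WHERE from_id = 10")                 # removes row id=1
ca.execute("INSERT INTO relation_table (from_id, to_id) VALUES (10, 20)")   # new row id=2
a.commit()

# Session B now runs the same sequence.
cb.execute("DELETE FROM relation_table WHERE from_id = 10")                 # locking read: removes A's row id=2
cb.execute("SELECT to_id FROM relation_table WHERE from_id = 10")
print(cb.fetchall())   # returns ((20,),) -- the snapshot still shows the untouched row id=1
b.commit()

# Final state: no rows at all for from_id=10. The last SELECT convinced the
# add() logic that to_id=20 was already present, so nothing got re-inserted.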

Another bug report that arose due to the same MySQL "feature": #13906
I'm filing another report because that previous one seems to be going nowhere and focuses on get_or_create(), whereas I'm showing here that this MySQL behavior can also cause very obscure data loss; it is thus IMHO of the utmost importance and should be fixed ASAP.

Attachments (2)

m2m_django_1.9 (5.3 KB) - added by Hugo Chargois 9 years ago.
django_m2m_bug_step_by_step (1.2 KB) - added by Hugo Chargois 9 years ago.


Change History (23)

comment:1 by Tim Graham, 9 years ago

Could you check if you can reproduce with Django 1.9? It might be resolved by the related set direct assignment change.

If it's fixed in 1.9, I'm not sure if there's much value in patching Django 1.8 as I assume this isn't a new issue and I'm not sure if it'd be feasible to fix it without an invasive patch.

by Hugo Chargois, 9 years ago

Attachment: m2m_django_1.9 added

comment:2 by Hugo Chargois, 9 years ago

I tried to reproduce with 1.9; I don't get any data loss anymore.

I do get a new bug when I add objects to the M2M (but not when I leave it unchanged or remove objects); see the attachment.

I understand that this is a complicated bug, but it is a data-loss bug that can be trivially triggered, maliciously or not, on a long-term release still supported for at least two years. I don't think it should be swept under the rug.

comment:3 by Tim Graham, 9 years ago

There's a recommendation in the get_or_create() docs about using the READ COMMITTED isolation level rather than REPEATABLE READ (the default) to avoid problems there. Not sure if a similar documentation note would be appropriate for this issue.

I've asked for advice on django-developers.
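
One way to apply that recommendation with the MySQLdb/mysqlclient driver is to set the isolation level via an init_command in the database settings; a sketch (the database name and credentials are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app',
        'USER': 'app',
        'PASSWORD': 'secret',
        # Runs once per new connection, so every transaction opened on that
        # connection uses READ COMMITTED instead of the default REPEATABLE READ.
        'OPTIONS': {
            'init_command': 'SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED',
        },
    }
}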

comment:4 by Chris Streeter, 9 years ago

Cc: django@… added

comment:5 by Shai Berger, 9 years ago

I've read the linked MySql documentation, and as far as I can understand, it does not allow a case where old records are still seen after they have been modified (let alone deleted) in the same transaction. That's not "consistent read".

For us to accept that there is a data-loss race-condition under consistent read, we'd need to see a detailed, step-by-step description of the conflicting transactions, and how they reach the final result. As far as I understand, there is no such scenario.

If such a scenario cannot be given, then the data-loss bug is entirely MySql's, and no fix for Django 1.8 is warranted.

The 1.9 situation describes correct behavior on all parts except for the user interface, and I believe that is the level at which it should be solved (i.e. disabling submission once "save" has been clicked).

comment:6 by Markus Holtermann, 9 years ago

If there's a trivial solution for REPEATABLE READ (in case that's really the issue), I think we should go for that. Otherwise, or if the change is too invasive, I'd rather go with a documentation update that recommends using READ COMMITTED.

Since this mostly affects the 1.8 release, which we will maintain for a few more years, I'd go with a documentation fix and not play around with a UI fix.

comment:7 by Cristiano Coelho, 9 years ago

A UI fix would be too basic; the race condition (if it really is an issue) would still be there with multiple users or even multiple tabs...

comment:8 by Shai Berger, 9 years ago

I still say that, until proven otherwise, any existing race condition is in MySql, not Django. At this point I am inclined to close this issue as needsinfo.

comment:9 by Tim Graham, 9 years ago

Resolution: needsinfo
Status: new → closed

comment:10 by Hugo Chargois, 9 years ago

> For us to accept that there is a data-loss race-condition under consistent read, we'd need to see a detailed, step-by-step description of the conflicting transactions, and how they reach the final result. As far as I understand, there is no such scenario.

Sure there is. I did not open this ticket for the fun of it. Please see attachment for the step-by-step. Sorry I didn't have time to produce it earlier.

> The 1.9 situation describes correct behavior on all parts except for the user interface

I'm pretty sure an ORM causing and failing to correctly handle an "IntegrityError" has nothing to do with a user interface at all. But I think this is a different bug that needs its own ticket.

> and I believe that is the level at which it should be solved (i.e. disabling submission once "save" has been clicked).

Then again, models are not saved only from the admin site. They may be saved by a standard view, by an API view, by an automated script...

> I'd go with a documentation fix

A "documentation fix" would only be appropriate if the "Supported databases" page said "Django does not support MySQL anymore due to possible data-loss under normal usage conditions".

An ORM shouldn't cause data loss. Ever. If it requires its backend database to be configured in a particular way, it should either configure it itself in that way (isolation levels can be set per transaction, right?) or if it cannot, it should display huge warnings in big bold red letters, each time the application starts. It shouldn't be buried deep in the documentation.

by Hugo Chargois, 9 years ago

Attachment: django_m2m_bug_step_by_step added

comment:11 by Hugo Chargois, 9 years ago

Resolution: needsinfo
Status: closed → new

in reply to: 10; comment:12 by Shai Berger, 9 years ago

Resolution: wontfix
Status: new → closed

Replying to hchargois:

>> For us to accept that there is a data-loss race-condition under consistent read, we'd need to see a detailed, step-by-step description of the conflicting transactions, and how they reach the final result.

> Please see attachment for the step-by-step. Sorry I didn't have time to produce it earlier.

Please take your attachment and use it in a critical bug you should open against MySql. The scenario you describe is not valid behavior under any transaction isolation level, and is specifically not consistent with the documentation you linked to:

  • Under "REPEATABLE READ", session B should never see the record inserted by Session A after Session B's first select (in lines 12-18). That is, the result in lines 35-41 is invalid because it violates the premise of "consistent read".
  • Under "READ COMMITTED" (or "REPEATABLE READ" but without the first select), session B's delete at line 32 deletes the record inserted by session A, and again the result in 35-41 is invalid, this time because it violates basic consistency.

>> The 1.9 situation describes correct behavior on all parts except for the user interface

> I'm pretty sure an ORM causing and failing to correctly handle an "IntegrityError" has nothing to do with a user interface at all. But I think this is a different bug that needs its own ticket.

The ORM is not failing. Because of a user interface problem, two conflicting transactions were executed simultaneously. Only one of them succeeds; this is expected behavior as far as the data layer is concerned. Also, there is no data loss.

>> I'd go with a documentation fix

A "documentation fix" would only be appropriate if the "Supported databases" page said "Django does not support MySQL anymore due to possible data-loss under normal usage conditions".

The scenario you describe indeed justifies adding something like this; however, it would not belong in Django's documentation, but in MySql's.

> An ORM shouldn't cause data loss. Ever. If it requires its backend database to be configured in a particular way, it should either configure it itself in that way (isolation levels can be set per transaction, right?) or if it cannot, it should display huge warnings in big bold red letters, each time the application starts. It shouldn't be buried deep in the documentation.

s/ORM/DBMS/. As you may be aware, MySql will also happily and quietly chop off the end of a string if it is too long for the field you're trying to put it in. There's only so much Django can do.

I closed as "wontfix", rather than "invalid", because it seems you have come across an actual problem in a setting that Django claims to support; however, this is not a problem in Django (except if we take your suggestion seriously and declare that supporting MySql is a bug).

comment:13 by Hugo Chargois, 9 years ago

Resolution: wontfix
Status: closed → new

Sorry to insist, but...

> Please take your attachment and use it in a critical bug you should open against MySql. The scenario you describe is not valid behavior under any transaction isolation level, and is specifically not consistent with the documentation you linked to:

>   • Under "REPEATABLE READ", session B should never see the record inserted by Session A after Session B's first select (in lines 12-18). That is, the result in lines 35-41 is invalid because it violates the premise of "consistent read".

It is valid behavior. It is consistent with the documentation.

There is one mistake in your reasoning. You say "session B should never see the record inserted by Session A". That's right, it doesn't. It sees the record that was there in the first place, before sessions A and B started. The record inserted by A would have id 2, whereas you can see that B is still seeing id 1. The record that's deleted by B is id 2. Since B didn't modify the id 1 row (it is A that deleted it), B can still see it until the end of the transaction.

comment:14 by Shai Berger, 9 years ago

Resolution: wontfix
Status: new → closed

It is not consistent with the documentation because B is supposed to see the changes that it makes itself.

You are essentially claiming that, within B, a delete statement given a set of criteria sees one record, and a following select given the same criteria sees a different record. No. That's not "consistent read", not valid behavior, and not consistent with the documentation (which speaks about not seeing changes made by other transactions, not your own).

Further, per Django's contribution guide, please do not re-open a ticket that was marked "wontfix" by a core developer; you should bring it up on the DevelopersMailingList. Tim has already started a thread for it that you might join.

(actually, having looked at the related ticket, there is a one-line change that would probably work around the problem for Django's use: change https://github.com/django/django/blob/1.8.11/django/db/models/fields/related.py#L1089 from })) to }).for_update()). You are welcome to try it, and a patch for that may even be accepted for 1.8 if it indeed helps to prevent data loss. MySql's behavior where updates see different data than selects is still unacceptable)

(edit: I had the wrong line number)
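
For context, a rough sketch of what that one-liner amounts to inside the duplicate check of ManyRelatedManager._add_items(); the surrounding names are approximated from the 1.8 source rather than copied exactly, and only the trailing .for_update() is the suggested change:

# With .for_update() the check becomes SELECT ... FOR UPDATE, a locking read
# that sees (and locks) the rows as they currently are, instead of reading
# from the transaction's consistent-read snapshot.
vals = (self.through._default_manager.using(db)
        .values_list(target_field_name, flat=True)
        .filter(**{
            source_field_name: self.related_val[0],
            '%s__in' % target_field_name: new_ids,
        })
        .for_update())
new_ids = new_ids - set(vals)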

Last edited 9 years ago by Shai Berger.

in reply to: 14; comment:15 by Hugo Chargois, 9 years ago

> It is not consistent with the documentation because B is supposed to see the changes that it makes itself.

What changes are you talking about? It does see all the changes it makes. The only change it makes is one DELETE statement. This DELETE statement deletes the row with id==2. Can it see a row with id==2 afterwards? No, it cannot. That behavior is correct and agrees with the documentation.

It doesn't change the row with id==1; thus this row can still be seen as it was at the start of the transaction. That behavior is correct and agrees with the documentation.

> You are essentially claiming that, within B, a delete statement given a set of criteria sees one record, and a following select given the same criteria sees a different record. No. That's not "consistent read", not valid behavior, and not consistent with the documentation (which speaks about not seeing changes made by other transactions, not your own).

That is exactly what consistent read is, and that behavior is valid and consistent with the documentation. Please read the documentation carefully and give yourself some time to understand it. You may also read this bug report against MySQL that was closed as "not a bug": https://bugs.mysql.com/bug.php?id=21694

> Further, per Django's contribution guide, please do not re-open a ticket that was marked "wontfix" by a core developer

Please do not close tickets that you do not understand.

Version 0, edited 9 years ago by Hugo Chargois.

in reply to: 15; comment:16 by Shai Berger, 9 years ago

Replying to hchargois:

>> It is not consistent with the documentation because B is supposed to see the changes that it makes itself.

> What changes are you talking about? It does see all the changes it makes. The only change it makes is one DELETE statement. This DELETE statement deletes the row with id==2. Can it see a row with id==2 afterwards? No, it cannot. That behavior is correct and agrees with the documentation.

Funny. I don't see a DELETE FROM t WHERE id=2 in session B; I see a DELETE FROM t WHERE from_id=10. That statement should either remove all records with from_id=10 visible to session B, or fail.

I have read the MySql documentation again, and specifically the note you have now pointed out. I concede that the behavior we see is consistent with this note. I disagree -- vehemently -- with the claim that this behavior is reasonable or acceptable. It violates the C, I & D of ACID. People who accept that as valid behavior have no business writing a DBMS.

I have offered you above a one-line fix that I think will solve your issue (because select-for-update is considered DML). That fix has a chance of making it into 1.8.x, if you turn it into a proper patch, including a test and release notes, after some discussion on the DevelopersMailingList. Personally, I prefer to spend my free time on advocating against the use of broken-by-design products, rather than on working around their failures.

in reply to: 16; comment:17 by Hugo Chargois, 9 years ago

Replying to shaib:

> I have read the MySql documentation again, and specifically the note you have now pointed out. I concede that the behavior we see is consistent with this note.

Great. Will you reopen this ticket so that someone may work on this critical issue, then?

> I disagree -- vehemently -- with the claim that this behavior is reasonable or acceptable. It violates the C, I & D of ACID. People who accept that as valid behavior have no business writing a DBMS.
> [...]
> Personally, I prefer to spend my free time on advocating against the use of broken-by-design products, rather than on working around their failures.

Right. Maybe Django should drop support for MySQL then. The release note could go like this: "Django no longer supports MySQL, the most popular open-source database engine by far, because shaib, core developer, thinks it is broken-by-design and written by people who have no business writing a DBMS. Please pay for Oracle."

> I have offered you above a one-line fix that I think will solve your issue (because select-for-update is considered DML). That fix has a chance of making it into 1.8.x, if you turn it into a proper patch, including a test and release notes, after some discussion on the DevelopersMailingList.

I may try this fix but I probably won't have time to turn it into a proper patch, and to be honest I'm not familiar enough with Django's ORM backend code to be confident with testing this.

comment:18 by Tim Graham, 9 years ago

Resolution: wontfix
Status: closed → new
Summary: Saving ManyToMany field under race condition causes data loss → Saving ManyToMany field under race condition causes data loss on MySQL
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

comment:19 by Claude Paroz, 9 years ago

hchargois, your tone is totally inappropriate on this ticket tracker. We can discuss things with passion, but personal attacks are clearly out of scope. Please try to be polite.

comment:20 by Hugo Chargois, 9 years ago

I was not trying to offend shaib, and I apologize if it came across that way, but I had to fight a bit for this ticket to be accepted.

Trust me, I would have preferred not to have to, since this made me lose hours of my time (and of my employer's). But politeness goes both ways, and closing tickets of not-so-trivial bugs without a second opinion is not very "polite" in my book. Your own contribution guide (https://docs.djangoproject.com/fr/1.9/internals/contributing/triaging-tickets/#closing-tickets) says:

> If you do close a ticket, you should always make sure of the following:

>   • Be certain that the issue is resolved.

and

> wontfix
> Used when a core developer decides that this request is not appropriate for consideration in Django. This is usually chosen after discussion in the django-developers mailing list.

There hasn't been much discussion between developers since I produced the step-by-step.

comment:21 by Tim Graham, 8 years ago

Resolution: wontfix
Status: new → closed

Django 1.11 offers an option to switch MySQL's isolation level, and Django's MySQL backend will use read committed by default in Django 2.0 (#27683). Given that and the lack of activity on this ticket, I think we can close it.
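
For anyone landing here later, a sketch of that option in the database settings (the database name and credentials are placeholders):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'app',
        'USER': 'app',
        'PASSWORD': 'secret',
        # Per the comment above, this option is available from Django 1.11.
        'OPTIONS': {'isolation_level': 'read committed'},
    }
}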
