Opened 12 years ago

Closed 11 years ago

Last modified 10 years ago

#18984 closed Bug (fixed)

TransactionTestCase._fixture_teardown locks under mysql

Reported by: Jeremy Dunck Owned by: Aymeric Augustin
Component: Testing framework Version: dev
Severity: Release blocker Keywords:
Cc: btimby@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When using multiple aliases to the same DB (either via TEST_MIRROR or simply different aliases with the same settings), it is possible to get 2 pending transactions because each DB alias gets its own connection to the DB.

The change in f758bdab moves DB flush to the end of the test case -- this is nice because it allows test case reordering and related optimizations.

Unfortunately, moving it to the end also means that the DB connection states aren't as clean as they were at the beginning of the testcase, so it's possible that a pending transaction on one alias blocks the progress of the flush command.

Attachments (3)

18984-deadlock-test-case.diff (1.2 KB ) - added by Aymeric Augustin 11 years ago.
18984-deadlock-test-case-without-multi-db.diff (1.3 KB ) - added by Aymeric Augustin 11 years ago.
18984-fix-without-multi-db.diff (834 bytes ) - added by Aymeric Augustin 11 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 by Jeremy Dunck, 12 years ago

To be clear, when this happens, it locks the app's test suite from completing.

comment:3 by Anssi Kääriäinen, 11 years ago

If two aliases point to the same database wouldn't it be enough to run the setup and teardown code for only one of the aliases?

comment:4 by Aymeric Augustin, 11 years ago

If I understand correctly, this only happens with django.test.TransactionTestCase and when multi_db = True (with django.test.TestCase tests are wrapped in a transaction that is rolled back at the end of the test).

To reproduce the bug, one should:

  • create two aliases pointing to the same database;
  • create a multi-db transaction test case;
  • with each connection, open a transaction and makes a write (actually, if I check the order of connection, I can only do this on the second one).

Then _fixture_teardown will attempt to flush that table using the first connection, triggering a lock on the write with the second connection.


The more general problem is "should Django's test runner protect users against tests that leave pending transactions?". Your answer is "yes".

The alternative would be to document that transaction test cases must not leave pending transactions behind themselves. That would mean putting try / finally clauses everywhere -- not nice. Since the fix is quite simple I agree with you.


I considered adding a test that triggers the bug to the test suite. If Django had a regression, the test runner would then hang in teardown, without any hint as to what's happening. That doesn't sound useful. Therefore, I think this falls under "the test runner can't test itself."

If you can confirm that I understood the problem correctly, I will reproduce it manually and commit your fix.

comment:5 by Aymeric Augustin, 11 years ago

In fact, when TEST_MIRROR is set, even an empty test triggers the deadlock! See attached patch.

Running the test with MySQL deadlocks. It dies after the third Ctrl-C with the trace below. This is reproducible.

(django-dev)myk@mYk tests % PYTHONPATH=.. ./runtests.py --settings=test_mysql_local test_runner.MirrorDeadlockTest                       ~/Documents/dev/django/tests
Creating test database for alias 'default'...
Creating test database for alias 'other'...
You're going to deadlock, human. MUHAHA.
.^C^C^CE
======================================================================
ERROR: test_teardown_deadlock (regressiontests.test_runner.tests.MirrorDeadlockTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/Documents/dev/django/django/test/testcases.py", line 478, in __call__
    self._post_teardown()
  File "/Users/myk/Documents/dev/django/django/test/testcases.py", line 492, in _post_teardown
    self._fixture_teardown()
  File "/Users/myk/Documents/dev/django/django/test/testcases.py", line 517, in _fixture_teardown
    skip_validation=True, reset_sequences=False)
  File "/Users/myk/Documents/dev/django/django/core/management/__init__.py", line 160, in call_command
    return klass.execute(*args, **defaults)
  File "/Users/myk/Documents/dev/django/django/core/management/base.py", line 252, in execute
    output = self.handle(*args, **options)
  File "/Users/myk/Documents/dev/django/django/core/management/base.py", line 382, in handle
    return self.handle_noargs(**options)
  File "/Users/myk/Documents/dev/django/django/core/management/commands/flush.py", line 64, in handle_noargs
    transaction.rollback_unless_managed(using=db)
  File "/Users/myk/Documents/dev/django/django/db/transaction.py", line 128, in rollback_unless_managed
    connection.rollback_unless_managed()
  File "/Users/myk/Documents/dev/django/django/db/backends/__init__.py", line 222, in rollback_unless_managed
    self._rollback()
  File "/Users/myk/Documents/dev/django/django/db/backends/mysql/base.py", line 420, in _rollback
    BaseDatabaseWrapper._rollback(self)
  File "/Users/myk/Documents/dev/django/django/db/backends/__init__.py", line 58, in _rollback
    return self.connection.rollback()
OperationalError: (2006, 'MySQL server has gone away')

----------------------------------------------------------------------
Ran 1 test in 3.969s

FAILED (errors=1)
Destroying test database for alias 'default'...
Destroying test database for alias 'other'...

Running the test with PostgreSQL also deadlocks, and refuses to die with Ctrl-C, so I don't have a stack trace.

During the deadlock, ps shows:

postgres: django test_django [local] TRUNCATE TABLE waiting  
postgres: django test_django [local] idle in transaction  
Last edited 11 years ago by Aymeric Augustin (previous) (diff)

comment:6 by Jeremy Dunck, 11 years ago

I agree with your description. I suppose my use is unusual in wanting to do multi-db test cases. :)

comment:7 by Aymeric Augustin, 11 years ago

Owner: changed from nobody to Aymeric Augustin

by Aymeric Augustin, 11 years ago

comment:8 by Aymeric Augustin, 11 years ago

I haven't nailed the regression: running the same test under 1.4 deadlocks before running any tests. TEST_MIRROR and multi_db = True can't work together.

The report mentions TEST_MIRROR; let's keep this hypothesis. The next step is to reproduce the problem without multi_db = True.

Last edited 11 years ago by Aymeric Augustin (previous) (diff)

by Aymeric Augustin, 11 years ago

by Aymeric Augustin, 11 years ago

comment:9 by Aymeric Augustin, 11 years ago

I finally managed to reproduce the problem with TransactionTestCase and TEST_MIRROR but without multi_db = True.

In order to create a pending transaction, I made a raw SQL query and didn't call commit_unless_managed(). This is obviously incorrect, but I can imagine this sequence happening accidentally — for instance if an exception is raised in a test before calling commit_unless_managed().

I've attached a test case as a patch, but I don't intend to commit it, for the reasons explained above.


To fix the problem, I had to rollback_unless_managed() all connections, and not only the default connection like the pull request does, because multi_db = False. I think the PR worked for Jeremy's case because he only had a pending transaction on the default connection. If so, rolling back the transaction on the default connection was enough.

Of course, if I had set multi_db = True, the PR would have rolled back the transactions on all databases. But currently that doesn't work at all. I think it's best to avoid deadlocks even in the case of programming mistakes, and calling rollback_unless_managed() is inexpensive when the connection hasn't been used.

I've also attached a patch that implements this solution.


I propose to commit this patch and open a new ticket for the TransactionTestCase + TEST_MIRROR + multi_db = True combination.

EDIT: new ticket opened: #19343

Last edited 11 years ago by Aymeric Augustin (previous) (diff)

comment:10 by Aymeric Augustin, 11 years ago

Triage Stage: AcceptedReady for checkin

comment:11 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 891c5306242760c8bb494412b9f8f86762664427:

Fixed #18984 -- Avoided a deadlock in test teardown.

Thanks Jeremy Dunck for the report.

comment:12 by Aymeric Augustin <aymeric.augustin@…>, 11 years ago

Resolution: fixed
Status: newclosed

In b9f9bc9a170430901fae188c375076f9b2cac784:

[1.5.x] Fixed #18984 -- Avoided a deadlock in test teardown.

Thanks Jeremy Dunck for the report.

Backport of 891c530 from master.

comment:13 by btimby, 10 years ago

Cc: btimby@… added

I am getting the same condition when running a unit test that uses multiple threads. I have a server running in another thread, it is stopped in tearDown() yet in _fixture_teardown() during the flush, it hangs.

show processlist:

| 126 | camden | localhost | test_camden | Sleep   |   69 |                                 | NULL                   |
| 128 | camden | localhost | test_camden | Query   |    2 | Waiting for table metadata lock | TRUNCATE `main_camera` |

I am using Django 1.5.4 and verified that the patch is present in the version I am running. At the time that rollback_unless_managed() is run, only the first sleeping connection is present, a new one is opened for the flush.

I am not sure if this should be reopened or what other information is necessary to reopen this issue, so I will leave the status as-is and just provide the above comment for now.

comment:14 by Tim Graham, 10 years ago

Transaction management has been overhauled in Django 1.6. If you can reproduce this issue there, please open a new ticket with details.

Note: See TracTickets for help on using tickets.
Back to Top