Opened 11 years ago
Closed 11 years ago
#21134 closed Cleanup/optimization (fixed)
Documentation for transaction.atomic needs more explicit warning about catching DatabaseErrors
| Reported by: | RichardOfWard | Owned by: | Aymeric Augustin |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.6-beta-1 |
| Severity: | Release blocker | Keywords: | |
| Cc: | | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Catching a database error inside transaction.atomic and then issuing more database commands before the end of the atomic block leads to a silent implicit rollback of those subsequent commands in Atomic's __exit__, even when there is no current exception. On PostgreSQL the database raises an InternalError when these subsequent commands are attempted, because PostgreSQL disallows this behaviour. MySQL and SQLite3 are less strict, so no error is raised, leading to confusing and hard-to-diagnose behaviour. The current documentation doesn't highlight this pitfall.
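To make the pitfall concrete, here is a minimal sketch adapted from the example in comment 3 below. It assumes a Reporter model with first_name/last_name fields (as in Django's test suite); the import path is hypothetical.

```python
from django.db import IntegrityError, transaction
from myapp.models import Reporter  # hypothetical import path

# The second save() runs after an error was caught inside the atomic block:
# PostgreSQL raises InternalError at that point, while MySQL and SQLite3
# appear to accept it but silently roll it back when the block exits.
r1 = Reporter.objects.create(first_name='foo', last_name='bar')
with transaction.atomic():
    r2 = Reporter(first_name='foo', last_name='bar2', id=r1.id)
    try:
        r2.save(force_insert=True)   # primary key collision -> IntegrityError
    except IntegrityError:
        r2.save(force_update=True)   # issued after the caught error
# r1.last_name is still 'bar': the update was rolled back by Atomic.__exit__.
```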
Attachments (1)
Change History (21)
comment:1 by , 11 years ago
Component: | Documentation → Database layer (models, ORM) |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
Version: | 1.5 → 1.6-beta-1 |
comment:2 by , 11 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
I attempted to write a patch for the behavior described above, and in fact it can't be made to work reliably, because not every SQL query is wrapped in transaction.atomic(savepoint=False). I'm attaching the patch at the point where I gave up, for future reference.
by , 11 years ago
Attachment: | 21134-failed-attempt.patch added |
---|
comment:3 by , 11 years ago
The committed documentation changes aren't about what I have been complaining about. I have said numerous times that *this is not about DatabaseErrors*. This is about any error that bubbles out of an atomic(savepoint=False) block.
The problem is this:
```python
r1 = Reporter.objects.create(first_name='foo', last_name='bar')
with transaction.atomic():
    r2 = Reporter(first_name='foo', last_name='bar2', id=r1.id)
    try:
        r2.save(force_insert=True)
    except IntegrityError:
        r2.save(force_update=True)
self.assertEqual(Reporter.objects.get(pk=r1.pk).last_name, 'bar2')
```
The last assert fails. This is unexpected for anybody using a database where this pattern is allowed (as far as I know, any db other than PostgreSQL). Note that there isn't any explicit use of atomic(savepoint=False) blocks. Also, the exception *does not need to be a DatabaseError subclass at all*.
comment:4 by , 11 years ago
In practice there isn't any difference between "DatabaseErrors" and "errors that bubble out of atomic(savepoint=False)". The only non-fatal exceptions expected inside an atomic(savepoint=False) block in Django are DatabaseErrors.
After taking into account the warning I added, your example becomes:
```python
r1 = Reporter.objects.create(first_name='foo', last_name='bar')
with transaction.atomic():
    r2 = Reporter(first_name='foo', last_name='bar2', id=r1.id)
    try:
        with transaction.atomic():
            r2.save(force_insert=True)
    except IntegrityError:
        r2.save(force_update=True)
self.assertEqual(Reporter.objects.get(pk=r1.pk).last_name, 'bar2')
```
which behaves correctly.
But it's unrealistic to hope that everyone will do that without complaining, especially people used to databases with a loose (non-atomic) transaction implementation.
comment:5 by , 11 years ago
Has patch: | set |
---|---|
Resolution: | fixed |
Status: | closed → new |
comment:6 by , 11 years ago
This pull request prevents running queries in `atomic` blocks that are going to end with a rollback.

Since `atomic` is a new API, this change looks like it's backwards compatible. Unfortunately, it isn't, because some features already take advantage of `atomic` internally. Specifically, `django.test.TestCase` wraps each test in an `atomic` block. As a consequence, tests that ran queries after database errors will require modifications, even if the query is a rollback to a previous savepoint. That said:

- If it's a test for transactional behavior, it should be in a `TransactionTestCase`. Quite obviously, high-level and low-level APIs cannot be mixed arbitrarily.
- If the savepoint is used to recover from a database error, it's much easier to recover with an `atomic` block (see the sketch at the end of this comment). This is how I fixed the few tests that were broken by this change in Django's test suite.
If we decide to go with this approach, I'll add this to the transactions docs or the release notes.
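To illustrate the second bullet above, here is a minimal sketch of converting a savepoint-based recovery to a nested atomic block. The "before" half is hypothetical test code, not taken from Django's suite, and `obj` is a placeholder model instance.

```python
from django.db import IntegrityError, transaction

# Before (hypothetical test code): manual savepoint handling. Under this
# patch the savepoint_rollback() call itself is a query issued after a
# database error, so it is rejected inside an atomic block.
sid = transaction.savepoint()
try:
    obj.save(force_insert=True)
except IntegrityError:
    transaction.savepoint_rollback(sid)
else:
    transaction.savepoint_commit(sid)

# After: let a nested atomic block create and release the savepoint, so the
# error is contained before it reaches the surrounding transaction.
try:
    with transaction.atomic():
        obj.save(force_insert=True)
except IntegrityError:
    pass  # the nested block already rolled back to its own savepoint
```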
comment:7 by , 11 years ago
Patch needs improvement: | set |
---|---|
Status: | new → assigned |
This patch breaks two tests under PostgreSQL when running the full test suite — not in isolation:
```
======================================================================
ERROR: test_alter_unique_together (migrations.test_operations.OperationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/Documents/dev/django/tests/migrations/test_operations.py", line 256, in test_alter_unique_together
    operation.database_backwards("test_alunto", editor, new_state, project_state)
  File "/Users/myk/Documents/dev/django/django/db/migrations/operations/models.py", line 121, in database_backwards
    return self.database_forwards(app_label, schema_editor, from_state, to_state)
  File "/Users/myk/Documents/dev/django/django/db/migrations/operations/models.py", line 117, in database_forwards
    getattr(new_model._meta, "unique_together", set()),
  File "/Users/myk/Documents/dev/django/django/db/backends/schema.py", line 271, in alter_unique_together
    ", ".join(columns),
ValueError: Found wrong number (0) of constraints for test_alunto_pony(pink, weight)

======================================================================
ERROR: test_unique_together (schema.tests.SchemaTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/Documents/dev/django/tests/schema/tests.py", line 430, in test_unique_together
    [],
  File "/Users/myk/Documents/dev/django/django/db/backends/schema.py", line 271, in alter_unique_together
    ", ".join(columns),
ValueError: Found wrong number (0) of constraints for schema_uniquetest(year, slug)

----------------------------------------------------------------------
```
It also breaks one test on MySQL, because it's the only database that can't defer constraint checks automatically:
```
======================================================================
ERROR: test_loaddata_error_message (fixtures.tests.FixtureLoadingTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/myk/Documents/dev/django/tests/fixtures/tests.py", line 324, in test_loaddata_error_message
    management.call_command('loaddata', 'invalid.json', verbosity=0)
  File "/Users/myk/Documents/dev/django/django/core/management/__init__.py", line 159, in call_command
    return klass.execute(*args, **defaults)
  File "/Users/myk/Documents/dev/django/django/core/management/base.py", line 289, in execute
    output = self.handle(*args, **options)
  File "/Users/myk/Documents/dev/django/django/core/management/commands/loaddata.py", line 55, in handle
    self.loaddata(fixture_labels)
  File "/Users/myk/Documents/dev/django/django/core/management/commands/loaddata.py", line 84, in loaddata
    self.load_label(fixture_label)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/Users/myk/Documents/dev/django/django/db/backends/__init__.py", line 417, in constraint_checks_disabled
    self.enable_constraint_checking()
  File "/Users/myk/Documents/dev/django/django/db/backends/mysql/base.py", line 490, in enable_constraint_checking
    self.cursor().execute('SET foreign_key_checks=1')
  File "/Users/myk/Documents/dev/django/django/db/backends/utils.py", line 47, in execute
    self.db.validate_no_broken_transaction()
  File "/Users/myk/Documents/dev/django/django/db/backends/__init__.py", line 367, in validate_no_broken_transaction
    "An error occurred in the current transaction. You can't "
TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
----------------------------------------------------------------------
```
Under Oracle it appears to be fine (I'm not getting a clean run because my Oracle test setup is weird, but all the failures I see are explained by other factors).
comment:8 by , 11 years ago
Neither bisecting nor pairing the migrations tests triggered the error here; bisecting the schema tests is currently hard due to other errors.
comment:9 by , 11 years ago
I added a commit to the PR to fix loaddata on MySQL. It isn't great; better ideas welcome.
comment:10 by , 11 years ago
I just pulled the branch and tested against the most recent master and got no test failures. I just committed some test fixes to schema/migrations, so that might be helping.
comment:11 by , 11 years ago
This is getting interesting: I can reproduce the above failures with PostgreSQL on one machine but not on another. Both are MacBook Pros running the latest OS X, with Python and Postgres from MacPorts.
:(
comment:12 by , 11 years ago
I get the same failure on master with postgres-9.3.0.

The test fails because the constraint columns from introspection are returned in reversed order, so they don't match the existing constraint in _constraint_names(): https://github.com/django/django/blob/master/django/db/backends/schema.py#L720

e.g. [year, slug] == [slug, year] fails.

Even when using a new connection to the test database, the order remains reversed.

However, when using an explicit COLLATE, the rows are returned in the expected order.

The database was created with encoding/collation nl_NL.UTF-8.

The first query below is the one introspection.get_constraints() uses. The second uses COLLATE "C" and the last uses COLLATE "nl_NL".

So somewhere in the test suite Postgres gets confused and switches collation to something odd?
```
test_django_default=# SELECT kc.constraint_name, kc.column_name, c.constraint_type,
    array(SELECT table_name::text || '.' || column_name::text
          FROM information_schema.constraint_column_usage
          WHERE constraint_name = kc.constraint_name)
FROM information_schema.key_column_usage AS kc
JOIN information_schema.table_constraints AS c
    ON kc.table_schema = c.table_schema
   AND kc.table_name = c.table_name
   AND kc.constraint_name = c.constraint_name
WHERE kc.table_schema = 'public' AND kc.table_name = 'schema_uniquetest';

         constraint_name         | column_name | constraint_type |                      array
---------------------------------+-------------+-----------------+-------------------------------------------------
 schema_uniquetest_year_slug_key | slug        | UNIQUE          | {schema_uniquetest.year,schema_uniquetest.slug}
 schema_uniquetest_year_slug_key | year        | UNIQUE          | {schema_uniquetest.year,schema_uniquetest.slug}
 schema_uniquetest_pkey          | id          | PRIMARY KEY     | {schema_uniquetest.id}
(3 rows)

test_django_default=# SELECT kc.constraint_name, kc.column_name, c.constraint_type,
    array(SELECT table_name::text || '.' || column_name::text
          FROM information_schema.constraint_column_usage
          WHERE constraint_name = kc.constraint_name)
FROM information_schema.key_column_usage AS kc
JOIN information_schema.table_constraints AS c
    ON kc.table_schema = c.table_schema
   AND kc.table_name = c.table_name
   AND kc.constraint_name = c.constraint_name
WHERE kc.table_schema = 'public' AND kc.table_name = 'schema_uniquetest' COLLATE "C";

         constraint_name         | column_name | constraint_type |                      array
---------------------------------+-------------+-----------------+-------------------------------------------------
 schema_uniquetest_pkey          | id          | PRIMARY KEY     | {schema_uniquetest.id}
 schema_uniquetest_year_slug_key | year        | UNIQUE          | {schema_uniquetest.year,schema_uniquetest.slug}
 schema_uniquetest_year_slug_key | slug        | UNIQUE          | {schema_uniquetest.year,schema_uniquetest.slug}
(3 rows)

test_django_default=# SELECT kc.constraint_name, kc.column_name, c.constraint_type,
    array(SELECT table_name::text || '.' || column_name::text
          FROM information_schema.constraint_column_usage
          WHERE constraint_name = kc.constraint_name)
FROM information_schema.key_column_usage AS kc
JOIN information_schema.table_constraints AS c
    ON kc.table_schema = c.table_schema
   AND kc.table_name = c.table_name
   AND kc.constraint_name = c.constraint_name
WHERE kc.table_schema = 'public' AND kc.table_name = 'schema_uniquetest' COLLATE "nl_NL";

         constraint_name         | column_name | constraint_type |                      array
---------------------------------+-------------+-----------------+-------------------------------------------------
 schema_uniquetest_pkey          | id          | PRIMARY KEY     | {schema_uniquetest.id}
 schema_uniquetest_year_slug_key | year        | UNIQUE          | {schema_uniquetest.year,schema_uniquetest.slug}
 schema_uniquetest_year_slug_key | slug        | UNIQUE          | {schema_uniquetest.year,schema_uniquetest.slug}
(3 rows)
```
comment:13 by , 11 years ago
The query has no ORDER BY, so the db is free to return the rows in any order it wishes. That a change of collation changes the result ordering is a bit surprising, but it doesn't mean PostgreSQL is doing anything wrong here.
comment:14 by , 11 years ago
I got the exact same response on #postgresql
[00:33:51] <urth> I'm getting an odd ordering problem with postgresql-9.3.0, unless I explicitly collate the query the rows are returned in reversed order http://pgsql.privatepaste.com/c3e1e066ea
[00:34:58] <RhodiumToad> urth: I see no ORDER BY there?
[00:35:28] <RhodiumToad> urth: without ORDER BY, the result is returned in whatever order suits the planner's whim, and will often change between queries
[00:39:08] <urth> RhodiumToad: the query is part of a testcase for django, if that test is run on it's own it succeeds, but with the full test suite it fails for some. can other queries on the database influence the ordering used by the planner?
[00:39:25] <RhodiumToad> yes
[00:39:50] <urth> ok, thanks
[00:40:47] <RhodiumToad> the rules are quite clear: without an order by at the top query level, the order of the result of any query or subquery is indeterminate
comment:15 by , 11 years ago
There should be a column in that table that we can use for ORDER BY to get the columns returned in the correct order. I'll take a look.
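For illustration, a rough sketch of what such an ordering might look like, assuming the standard ordinal_position column of information_schema.key_column_usage is the one we want; the actual fix committed to Django may differ, and `cursor` / `table_name` are placeholders for the introspection method's usual arguments.

```python
# Hypothetical sketch, not the committed fix: give the introspection query a
# deterministic ORDER BY so constraint columns come back in definition order.
cursor.execute("""
    SELECT kc.constraint_name, kc.column_name, c.constraint_type
    FROM information_schema.key_column_usage AS kc
    JOIN information_schema.table_constraints AS c
        ON kc.table_schema = c.table_schema
       AND kc.table_name = c.table_name
       AND kc.constraint_name = c.constraint_name
    WHERE kc.table_schema = %s AND kc.table_name = %s
    ORDER BY kc.constraint_name, kc.ordinal_position
""", ["public", table_name])
```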
comment:17 by , 11 years ago
Component: | Documentation → Database layer (models, ORM) |
---|---|
Severity: | Normal → Release blocker |
Changing flags since we're heading towards code changes.
comment:18 by , 11 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | unset |
I rebased the pull request on top of Andrew's fixes and I confirmed it passes the test suite on SQLite, PostgreSQL and MySQL. I have no reason to suspect it no longer passes it on Oracle.
The code itself is RFC; this ticket still needs:
- a message on django-developers to explain the tradeoffs,
- additional explanations in the documentation.
comment:19 by , 11 years ago
Needs documentation: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
This looks good to me.
I wonder if there should be an example of the probably most common case of actually needing set_rollback(), that is:

```python
try:
    obj.save()
except IntegrityError:
    # do something else
```

on MySQL or Oracle, for example. But this can be added later on. I have a feeling we are going to get some feedback on this anyways...
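For what it's worth, a rough sketch of that pattern, assuming the outer atomic block comes from the surrounding code (for example TestCase or ATOMIC_REQUESTS) and `obj` is a placeholder model instance; the exact recommended wording belongs in the docs rather than here.

```python
from django.db import IntegrityError, transaction

with transaction.atomic():  # e.g. provided by TestCase or ATOMIC_REQUESTS
    try:
        obj.save()
    except IntegrityError:
        # On MySQL or Oracle the failed statement does not abort the
        # transaction, so the developer can explicitly take responsibility
        # for continuing it instead of wrapping save() in a nested atomic.
        transaction.set_rollback(False)
        # do something else
```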
From the options available to us, marking the outer block for rollback and forbidding queries seems to be the best one (it is not great, but neither are any of the other options). So, marking as RFC.
For reference, my understanding is that these are the options:
- Make atomic(savepoint=False) a no-op when already inside a transaction - this makes the block non-atomic.
- Always use savepoints - a heavy cost for some operations (mostly multiple obj.save() calls in a row).
- Mark the outer block for rollback, allow queries. This works great for read-only queries, but once you write in the marked-for-rollback block, things seem to work OK while your modifications are silently rolled back.
- Mark the outer block for rollback, forbid data modification queries - Django has no way to know which queries modify data.
- Mark the outer block for rollback, forbid all queries - forces users to explicitly continue the transaction if that is needed. Doing so might be dangerous, but it is their explicit decision. It might be annoying for those who are accustomed to the MySQL/Oracle way of continuing a transaction after an IntegrityError.
comment:20 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I added a warning to the docs earlier today, see [4db2752e] and [0ad178c4].
Anssi also suggested that Django should raise an error when attempting to make database queries while a broken transaction is in progress. This would improve cross-database consistency, normalizing to PostgreSQL's behavior.
Here's a patch implementing this idea: https://github.com/RichardOfWard/django/commit/cb46c75db275db59b54511c090286255bd9cc46d
Full discussion: https://groups.google.com/d/topic/django-developers/ACLQRF-71s8/discussion
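For illustration, a rough sketch of the suggested check, using the names visible in the MySQL traceback in comment 7 (validate_no_broken_transaction, TransactionManagementError); the linked patch and the eventual commit are authoritative and may differ.

```python
from django.db.transaction import TransactionManagementError

# Sketch only: the connection remembers that the current atomic block is
# broken (needs_rollback) and refuses further queries until the block ends.
class ConnectionSketch(object):
    def __init__(self):
        self.in_atomic_block = False
        self.needs_rollback = False

    def validate_no_broken_transaction(self):
        if self.needs_rollback and self.in_atomic_block:
            raise TransactionManagementError(
                "An error occurred in the current transaction. You can't "
                "execute queries until the end of the 'atomic' block.")

# A cursor wrapper would call validate_no_broken_transaction() at the start
# of execute()/executemany(), as django/db/backends/utils.py does in the
# traceback above.
```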