#23576 closed New feature (fixed)
Fast-path deletion for MySQL
Reported by: | Jon Dufresne | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Originally posted to #19544 due to similar error message, but I this is a different ticket.
I have investigated this a bit more. For me the race occurs for the following reason:
Suppose you have MyModel
that has a unique constraint. Suppose you have a view of the form:
@transaction.atomic def my_view(request): MyModel.objects.some_long_filter().delete() objs = builds_list_of_unsaved_models_from_request(request) MyModel.objects.bulk_create(objs)
That is, a view that builds a list of objects in bulk. The view first deletes the objects that were created by any previous requests to ensure the unique constraint is not violated. From my investigation, the delete()
will not actually call an SQL DELETE
statement directly -- as I would have expected -- but instead caches a list of ids of objects that exist, then calls a new SQL query with a DELETE .. WHERE id IN (list of ids)
. This all happens in DeleteQuery.delete_qs()
file django/db/models/sql/subqueries.py:52
.
Now suppose you have some original data. Then, two requests occur at the same time. The first request and the second request will both cache the same ids from the original data to delete. Suppose the first request finishes slightly earlier and commits the *new objects with new ids*. Now, the second request will try to delete the cached ids from the original data and *not* the newly created ids the first request. Upon save there is a unique constraint violation -- as the wrong ids were deleted -- causing an IntegrityError
.
Coming from an application with many raw SQL queries, this unique constraint violation was a big surprise inside a transaction. The caching of ids in the delete()
function seems a sneaky source of race conditions.
Traceback when the IntegrityError occurs:
Traceback (most recent call last): File "manage.py", line 13, in <module> execute_from_command_line(sys.argv) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line utility.execute() File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv self.execute(*args, **options.__dict__) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute output = self.handle(*args, **options) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 415, in handle return self.handle_noargs(**options) File "/home/jon/devel/myproj/development/myapp/management/commands/thetest.py", line 36, in handle_noargs MyModel.objects.bulk_create(objs) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 160, in bulk_create return self.get_queryset().bulk_create(*args, **kwargs) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 359, in bulk_create self._batched_insert(objs_without_pk, fields, batch_size) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 838, in _batched_insert using=self.db) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert return insert_query(self.model, objs, fields, **kwargs) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query return query.get_compiler(using=using).execute_sql(return_id) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql cursor.execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 69, in execute return super(CursorDebugWrapper, self).execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__ six.reraise(dj_exc_type, dj_exc_value, traceback) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 124, in execute return self.cursor.execute(query, args) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute self.errorhandler(self, exc, value) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler raise errorclass, errorvalue django.db.utils.IntegrityError: (1062, "Duplicate entry 'mytype-7-6' for key 'type'")
Change History (11)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
This looks specific to database backends that do not support update_can_self_select
. From the list of database backends shipped with Django, this applies only to MySQL.
This is the race free query generated for backends supporting update_can_self_select
.
DELETE FROM `myapp_c` WHERE `myapp_c`.`id` IN (SELECT U0.`id` AS `id` FROM `myapp_c` U0 INNER JOIN `myapp_a` U1 ON ( U0.`a_id` = U1.`id` ) INNER JOIN `myapp_b` U2 ON ( U0.`b_id` = U2.`id` ) WHERE (U1.`name` = 'foo' AND U2.`name` = 'bar'))
The problematic queries used by the MySQL:
SELECT `myapp_c`.`id` FROM `myapp_c` INNER JOIN `myapp_a` ON ( `myapp_c`.`a_id` = `myapp_a`.`id` ) INNER JOIN `myapp_b` ON ( `myapp_c`.`b_id` = `myapp_b`.`id` ) WHERE (`myapp_a`.`name` = 'foo' AND `myapp_b`.`name` = 'bar') DELETE FROM `myapp_c` WHERE `myapp_c`.`id` IN ({ RESULT FROM PREVIOUS QUERY })
The fact that this occurs across two queries allows rows to be inserted that would normally meet the criteria of the WHERE
clause.
One way to handle this would be to use MySQL's multiple table syntax http://dev.mysql.com/doc/refman/5.7/en/delete.html which allows joins. The query would be as simple as taking the SELECT
query, removing the SELECT
clause and replacing it with DELETE myapp_c FROM myapp_c INNER JOIN ...
.
I am interested in doing this, however, I'm not sure exactly where to start in order to build a better query. I see inside DeleteQuery.delete_qs()
is where the check for feature update_can_self_select
occurs. So perhaps a new feature check and branch should occur here. However, once the feature is checked, I'm not sure how to start building the alternative query. Any guidance would be appreciated.
comment:3 by , 10 years ago
Summary: | ORM delete strategy can lead to race conditions inside a transaction → Fast-path deletion for MySQL |
---|---|
Triage Stage: | Unreviewed → Accepted |
The best advice I can give is to start from django/db/models/sql/subqueries.py and find update_can_self_select in there. You might be able to use a different compiler for the original query, one which does DELETE FROM <original query> instead of SELECT ... FROM <original query>.
Note that even if you fix this, there are still cases where Django has to fetch the IDs to memory, then delete only those IDs. The reason is that Django needs to fire pre/post delete signals and cascade the deletion to dependent models. Especially for pre-delete signal the model must still exists in the database, so I don't see any other way than pre-fetch ids, then delete those ids.
At least on PostgreSQL single query delete doesn't result in concurrency safe behavior. For example, it is possible that the view deletes the objects, another transaction commits an conflicting object, and then finally the view commits its objects -> unique constraint violation.
I am marking this as accepted for allowing fast-path deletion for MySQL.
comment:4 by , 10 years ago
Type: | Uncategorized → New feature |
---|
comment:5 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:7 by , 5 years ago
Triage Stage: | Accepted → Ready for checkin |
---|---|
Version: | 1.7 → master |
I have created a minimal test case: https://github.com/jdufresne/djtest-23576 I'd be happy to answer any questions about the test case if anything requires explanation. The waiting on
stdin
is to simulate a view with a long running process inside a transaction.To run the test, run
$ python test.py
from the project root directory. (As it is a race condition it may take more than one try.)The test runs in a management command instead of a HTTP view in order to easily trigger the race using multiple processes, but the principle is the same. Imagine the command is the view and each process is an HTTP request. Upon running the test and triggering the error I get the following traceback: