Opened 12 years ago
Last modified 2 years ago
#19544 new Cleanup/optimization
IntegrityError during ManyToMany add() on Oracle or for user-defined through relationships.
Reported by: | German M. Bravo | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | German M. Bravo, jon.dufresne@…, Дилян Палаузов, Simon Charette, Florian Demmer | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I'm frequently getting this exception ("IntegrityError: duplicate key value violates unique constraint...") a lot in my production server when I do add()
on a Many To Many relationship. For what I've seen, there seems to be a race condition in django.db.models.fields.related
's ManyRelatedManager._add_items()
between the point new_ids
is "calculated" and the point bulk_create()
actually creates the tuples in the database.
Change History (31)
comment:1 by , 12 years ago
Cc: | added |
---|
comment:2 by , 12 years ago
follow-up: 4 comment:3 by , 12 years ago
This is traceback I'm getting:
Task core.tasks._no_retry_task_generic[802b4b61-be07-41ed-a7fa-f5f8c671b4d8] raised exception: IntegrityError('duplicate key value violates unique constraint "social_connect_socialcontact_entit_socialcontact_id_user_id_key"\nDETAIL: Key (socialcontact_id, user_id)=(56151, 2146) already exists.\n',) Stacktrace (most recent call last): File "celery/task/trace.py", line 212, in trace_task R = retval = fun(*args, **kwargs) File "core/tasks.py", line 70, in _no_retry_task_generic return _run_task(_no_retry_task_generic, *args, **kwargs) File "core/tasks.py", line 34, in _run_task return callback(*_args, **_kwargs) # do not pass **kwargs (as it has "magic" stuff)?? File "social_connect/Facebook/views.py", line 246, in _callback join_contacts(user, social_connect_contacts, site) File "relationships/utils.py", line 122, in join_contacts contact.entities.add(from_entity) File "django/db/models/fields/related.py", line 625, in add self._add_items(self.source_field_name, self.target_field_name, *objs) File "django/db/models/fields/related.py", line 710, in _add_items for obj_id in new_ids File "django/db/models/query.py", line 421, in bulk_create self.model._base_manager._insert(objs_without_pk, fields=[f for f in fields if not isinstance(f, AutoField)], using=self.db) File "django/db/models/manager.py", line 203, in _insert return insert_query(self.model, objs, fields, **kwargs) File "django/db/models/query.py", line 1581, in insert_query return query.get_compiler(using=using).execute_sql(return_id) File "django/db/models/sql/compiler.py", line 910, in execute_sql cursor.execute(sql, params) File "django/db/backends/util.py", line 41, in execute return self.cursor.execute(sql, params) File "django/db/backends/postgresql_psycopg2/base.py", line 52, in execute return self.cursor.execute(query, args)
As you can see, I'm using pgsql, and I'm receiving this traceback from a celery task.
The contact.entities.add(from_entity)
is in a loop, and entities is a ManyToManyField
(entities = models.ManyToManyField(User)
in SocialContact
):
for contact in SocialContact.objects.filter(id__in=contact_ids): contact.entities.add(from_entity)
comment:4 by , 12 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Triage Stage: | Unreviewed → Someday/Maybe |
Type: | Uncategorized → Cleanup/optimization |
Looks like there is a race condition. The ways to fix this seem to be:
- Locking the m2m table (or some portion of it). This will lead to deadlocks so it is a no-go.
- Retrying until done: create a savepoint before bulk_create, if IntegrityError is risen rollback, select-bulk_create again. Here the cost is additional savepoint which can double the cost of .add().
- Doing the insert using SQL like:
INSERT INTO m2mtbl(lhs_id, rhs_id) (SELECT lhs_id, rhs_id FROM (values(1, 2), (1, 3), (1, 4) AS tmp(lhs_id, rhs_id) LEFT JOIN m2mtbl ON m2mtbl.lhs_id = tmp.lhs.id and m2mtbl.rhs_id = tmp.rhs_id WHERE m2mtbl.id IS NULL);
The last SQL is actually quite nice - we are guaranteed atomicity, and we need to execute just a single SQL statement instead of the two done currently. This is going to perform better unless the inserted set is huge and most of the values already exists. However the SQL used is *highly* DB backend specific so this isn't an easy addition to do.
I am going to mark this someday/maybe. Getting rid of the race condition would be nice, but I don't see an easy and well performing way to do that.
comment:5 by , 11 years ago
Just my 2¢ - I just got this error on my production server, which is pretty nasty.
I'm using postgres
Any temporary solutions would be appreciated, as for me at least this is quite a serious bug
Traceback:
This is just from a simple form.save()
call, and I haven't experienced it once in my testing.
I'd argue that this should be marked with a higher priority, regardless of how easy it is to implement... :)
File "/somewhere/webapps/ordersystem/ordersystem/ordersystem/views.py", line 1188, in form_valid customer = form.save() File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/forms/models.py", line 446, in save construct=False) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/forms/models.py", line 100, in save_instance save_m2m() File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/forms/models.py", line 96, in save_m2m f.save_form_data(instance, cleaned_data[f.name]) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 1529, in save_form_data setattr(instance, self.attname, data) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/utils/functional.py", line 242, in __setattr__ setattr(self._wrapped, name, value) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 842, in __set__ manager.add(*value) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 583, in add self._add_items(self.source_field_name, self.target_field_name, *objs) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 672, in _add_items for obj_id in new_ids File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/query.py", line 359, in bulk_create self._batched_insert(objs_without_pk, fields, batch_size) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/query.py", line 838, in _batched_insert using=self.db) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert return insert_query(self.model, objs, fields, **kwargs) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query return query.get_compiler(using=using).execute_sql(return_id) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql cursor.execute(sql, params) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__ six.reraise(dj_exc_type, dj_exc_value, traceback) File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) IntegrityError: duplicate key value violates unique constraint "ordersystem_customer_disliked_vege_customer_id_vegetable_id_key" DETAIL: Key (customer_id, vegetable_id)=(67, 37) already exists.
comment:6 by , 11 years ago
Looking at the code (very briefly), I think _add_items in db/models/fields/relate.py (line 672 ish) might be what's causing the problem.
We're saving a list of ids new_ids
which is later used for the bulk creation. As you can see, if this method were called twice at the same time for whatever reason (I think in my case it was probably a double submit of the form) then a race condition could occurr
comment:7 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:8 by , 10 years ago
I see that it's set as fixed and closed. That's because this bug cannot be reproduced or it was really fixed?
Asking, because I have the same problem on Django 1.5.8.
comment:9 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
Reopening because the ticket was closed anonymously without any detail.
comment:10 by , 10 years ago
I am running into the same issue using MySQL and Django 1.6.7. Looks like it may be an issue across multiple database backends. Here is the backtrace:
Traceback (most recent call last): File "manage.py", line 13, in <module> execute_from_command_line(sys.argv) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line utility.execute() File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv self.execute(*args, **options.__dict__) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute output = self.handle(*args, **options) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 415, in handle return self.handle_noargs(**options) File "/home/jon/devel/myproj/development/myapp/management/commands/thetest.py", line 36, in handle_noargs MyModel.objects.bulk_create(objs) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 160, in bulk_create return self.get_queryset().bulk_create(*args, **kwargs) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 359, in bulk_create self._batched_insert(objs_without_pk, fields, batch_size) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 838, in _batched_insert using=self.db) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert return insert_query(self.model, objs, fields, **kwargs) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query return query.get_compiler(using=using).execute_sql(return_id) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql cursor.execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 69, in execute return super(CursorDebugWrapper, self).execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__ six.reraise(dj_exc_type, dj_exc_value, traceback) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 124, in execute return self.cursor.execute(query, args) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute self.errorhandler(self, exc, value) File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler raise errorclass, errorvalue django.db.utils.IntegrityError: (1062, "Duplicate entry 'mytype-7-6' for key 'type'")
comment:11 by , 10 years ago
I have investigated this a bit more. For me the race occurs for the following reason:
Suppose you have MyModel
that has a unique constraint. Suppose you have a view of the form:
@transaction.atomic def my_view(request): MyModel.objects.some_long_filter().delete() objs = builds_list_of_unsaved_models_from_request(request) MyModel.objects.bulk_create(objs)
That is, a view that builds a list of objects in bulk. The view must first delete the objects that were created by the previous request to ensure the unique constraint is not violated. From my investigation, the delete()
will not actually call an SQL DELETE
statement, as I originally expected, but instead caches a list of ids of objects that exist, then calls a new SQL query with a DELETE .. WHERE id IN (list of ids)
. This all happens in DeleteQuery.delete_qs()
file django/db/models/sql/subqueries.py:52
.
Now suppose you have some original data. Then two requests occur at the same time. The first request and the second request will both cache the same ids from the original data to delete. Suppose the first request finishes slightly earlier and commits the *new objects with new ids*. Now, the second request will try to delete the cached ids from the original data and *not* from the first request. Upon save there is a unique constraint violation -- as the wrong ids were deleted -- causing the IntegrityError
.
Coming from an application with many raw SQL queries, this unique constraint violation was a big surprise with the use of transactions. The caching of ids in the delete()
function seems a sneaky source of race conditions.
Now, having investigated this, I am not sure if my case is the same bug that was originally reported. If anyone thinks I should be filing this as a separate ticket, I can move it.
comment:12 by , 10 years ago
I don't think comment:11 is talking about this ticket's issue (that is, it is not about m2m add()).
comment:13 by , 10 years ago
Cc: | added |
---|
I don't think comment:11 is talking about this ticket's issue (that is, it is not about m2m add()).
I have moved my issue to ticket #23576.
comment:14 by , 10 years ago
I think I have the same error using 1.6.7. Any idea how to fix this ?
Here is the trace:
Stacktrace (most recent call last): File "celery/app/trace.py", line 240, in trace_task R = retval = fun(*args, **kwargs) File "celery/app/trace.py", line 421, in __protected_call__ return self.run(*args, **kwargs) File "itemizer/tasks.py", line 125, in itemize write_entity_object_to_model(entity, ranking.top3_item, ranking.category, ranking) File "itemizer/tasks.py", line 453, in write_entity_object_to_model e.categories.add(category) File "django/db/models/fields/related.py", line 583, in add self._add_items(self.source_field_name, self.target_field_name, *objs) File "django/db/models/fields/related.py", line 672, in _add_items for obj_id in new_ids File "django/db/models/query.py", line 359, in bulk_create self._batched_insert(objs_without_pk, fields, batch_size) File "django/db/models/query.py", line 838, in _batched_insert using=self.db) File "django/db/models/manager.py", line 232, in _insert return insert_query(self.model, objs, fields, **kwargs) File "django/db/models/query.py", line 1514, in insert_query return query.get_compiler(using=using).execute_sql(return_id) File "django/db/models/sql/compiler.py", line 903, in execute_sql cursor.execute(sql, params) File "django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "django/db/utils.py", line 99, in __exit__ six.reraise(dj_exc_type, dj_exc_value, traceback) File "django/db/backends/util.py", line 53, in execute return self.cursor.execute(sql, params) File "django/db/backends/mysql/base.py", line 124, in execute return self.cursor.execute(query, args) File "MySQLdb/cursors.py", line 205, in execute self.errorhandler(self, exc, value) File "MySQLdb/connections.py", line 36, in defaulterrorhandler raise errorclass, errorvalue
comment:15 by , 9 years ago
This StackO question tries to find a work around: http://stackoverflow.com/questions/37654072/django-integrityerror-during-many-to-many-add/37968648
AFAIK the same issue existed for get_or_create() until some last version of django. See https://github.com/django/django/commit/1b331f6c1e
This work around might be a good starting point for a fix:
article.publications.through.objects.get_or_create( article=article, publication=pub)
comment:16 by , 8 years ago
The error is now in ManyRelatedManager._add_items. We try to get a list of existing ids in vals
, then only create ones that don't exist already. This of course causes a race when two threads do that at the same time.
I think the only real solution is ON CONFLICT IGNORE.
comment:17 by , 7 years ago
Cc: | added |
---|
comment:18 by , 7 years ago
comment:19 by , 7 years ago
Cc: | added |
---|
comment:20 by , 6 years ago
Cc: | added |
---|---|
Triage Stage: | Someday/Maybe → Accepted |
Now that bulk_create(ignore_conflicts=True)
has landed in #28668 it should be possible to rely on it to address this issue for auto-created many-to-many relationships.
AFAIK it's only possible to do for such relationships because we know the exact schema of these tables and that a conflict can only happen for (source_field_name, target_field_name)
tuples. This is not the case for manually defined through
which could have other unique_together
defined and are allowed to be .add(...)
'ed under certain circumstances since #9475 was merged.
Now that master dropped support for PostgreSQL 9.4 the issue should only remain on Oracle which doesn't support this feature.
I guess the code in _add_items
could also be adapted to use get_or_create
when supports_ignore_conflicts=False
or to deal with through
models if we had some form of IntegrityError
introspection in place to determine which constraint was violated. I think we should keep this ticket focused on dealing with the most common case that can be addressed with ignore_conflicts
though.
comment:21 by , 6 years ago
Has patch: | set |
---|
PR relying on ignore_conflicts
for auto-created intermediary models. The behaviour of IntegrityError
surfacing for user-defined through
model conflicts happens to be tested by m2m_through.tests.M2mThroughTests
.
comment:22 by , 6 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
The patch should be ready for review. It also implements a fast-add analogous to the the fast-delete algorithm when the relation and the backend allows it (e.g. when no m2m_changed
receivers are connected).
comment:26 by , 6 years ago
Has patch: | unset |
---|
Simon, should the ticket be closed or is there additional work that could be done?
comment:27 by , 6 years ago
Summary: | IntegrityError during Many To Many add() → IntegrityError during ManyToMany add() on Oracle or for user-defined through relationships. |
---|---|
Version: | 1.4 → master |
Tim, the issue still remains on Oracle since it doesn't support bulk_create(ignore_conflicts=True)
.
Also, now that add(through_defaults)
support was added the optimization cannot be used for these cases for the reasons detailed in comment:20. I'll add to this comment that get_or_create
cannot be used to work around the issue unless it's adapted not to send pre_save
/post_save
signal somehow.
To summarize the issue is still present on Oracle or on all backends when using add
on a relationship defined through a user defined ManyToMany(through)
. I think we should favor renaming this ticket instead of opening a new one since it contains a great discussion about the reasoning behind the changes.
I went ahead and did just that but feel free to revert my change or adjust the title to be more appropriate. Thanks!
comment:29 by , 5 years ago
Cc: | removed |
---|
comment:30 by , 3 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:31 by , 2 years ago
Cc: | added |
---|
Which database are you using, and under which isolation level are you running?
Can you provide the full traceback?
Thanks.