Opened 2 years ago

Last modified 4 months ago

#19544 new Cleanup/optimization

IntegrityError during Many To Many add()

Reported by: Kronuz Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: Kronuz, jon.dufresne@… Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I'm frequently getting this exception ("IntegrityError: duplicate key value violates unique constraint...") a lot in my production server when I do add() on a Many To Many relationship. For what I've seen, there seems to be a race condition in django.db.models.fields.related's ManyRelatedManager._add_items() between the point new_ids is "calculated" and the point bulk_create() actually creates the tuples in the database.

Change History (14)

comment:1 Changed 2 years ago by Kronuz

  • Cc Kronuz added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 2 years ago by aaugustin

Which database are you using, and under which isolation level are you running?

Can you provide the full traceback?

Thanks.

comment:3 follow-up: Changed 2 years ago by Kronuz

This is traceback I'm getting:

Task core.tasks._no_retry_task_generic[802b4b61-be07-41ed-a7fa-f5f8c671b4d8] raised exception: IntegrityError('duplicate key value violates unique constraint "social_connect_socialcontact_entit_socialcontact_id_user_id_key"\nDETAIL:  Key (socialcontact_id, user_id)=(56151, 2146) already exists.\n',)

Stacktrace (most recent call last):

  File "celery/task/trace.py", line 212, in trace_task
    R = retval = fun(*args, **kwargs)
  File "core/tasks.py", line 70, in _no_retry_task_generic
    return _run_task(_no_retry_task_generic, *args, **kwargs)
  File "core/tasks.py", line 34, in _run_task
    return callback(*_args, **_kwargs)  # do not pass **kwargs (as it has "magic" stuff)??
  File "social_connect/Facebook/views.py", line 246, in _callback
    join_contacts(user, social_connect_contacts, site)
  File "relationships/utils.py", line 122, in join_contacts
    contact.entities.add(from_entity)
  File "django/db/models/fields/related.py", line 625, in add
    self._add_items(self.source_field_name, self.target_field_name, *objs)
  File "django/db/models/fields/related.py", line 710, in _add_items
    for obj_id in new_ids
  File "django/db/models/query.py", line 421, in bulk_create
    self.model._base_manager._insert(objs_without_pk, fields=[f for f in fields if not isinstance(f, AutoField)], using=self.db)
  File "django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "django/db/models/query.py", line 1581, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "django/db/models/sql/compiler.py", line 910, in execute_sql
    cursor.execute(sql, params)
  File "django/db/backends/util.py", line 41, in execute
    return self.cursor.execute(sql, params)
  File "django/db/backends/postgresql_psycopg2/base.py", line 52, in execute
    return self.cursor.execute(query, args)

As you can see, I'm using pgsql, and I'm receiving this traceback from a celery task.
The contact.entities.add(from_entity) is in a loop, which I'm not changing to pass an array to add all at once instead. Hopefully that could fix the problem, maybe some cache in the entities manager is not letting _add_items() to properly get the already used primary keys.

Version 0, edited 2 years ago by Kronuz (next)

comment:4 in reply to: ↑ 3 Changed 2 years ago by akaariai

  • Component changed from Uncategorized to Database layer (models, ORM)
  • Triage Stage changed from Unreviewed to Someday/Maybe
  • Type changed from Uncategorized to Cleanup/optimization

Looks like there is a race condition. The ways to fix this seem to be:

  • Locking the m2m table (or some portion of it). This will lead to deadlocks so it is a no-go.
  • Retrying until done: create a savepoint before bulk_create, if IntegrityError is risen rollback, select-bulk_create again. Here the cost is additional savepoint which can double the cost of .add().
  • Doing the insert using SQL like:
    INSERT INTO m2mtbl(lhs_id, rhs_id)
        (SELECT lhs_id, rhs_id FROM (values(1, 2), (1, 3), (1, 4) AS tmp(lhs_id, rhs_id)
           LEFT JOIN m2mtbl ON m2mtbl.lhs_id = tmp.lhs.id and m2mtbl.rhs_id = tmp.rhs_id WHERE m2mtbl.id IS NULL);
    

The last SQL is actually quite nice - we are guaranteed atomicity, and we need to execute just a single SQL statement instead of the two done currently. This is going to perform better unless the inserted set is huge and most of the values already exists. However the SQL used is *highly* DB backend specific so this isn't an easy addition to do.

I am going to mark this someday/maybe. Getting rid of the race condition would be nice, but I don't see an easy and well performing way to do that.

comment:5 Changed 11 months ago by Patrick Robertson <django@…>

Just my 2¢ - I just got this error on my production server, which is pretty nasty.

I'm using postgres

Any temporary solutions would be appreciated, as for me at least this is quite a serious bug

Traceback:
This is just from a simple form.save() call, and I haven't experienced it once in my testing.
I'd argue that this should be marked with a higher priority, regardless of how easy it is to implement... :)

 File "/somewhere/webapps/ordersystem/ordersystem/ordersystem/views.py", line 1188, in form_valid
   customer = form.save()

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/forms/models.py", line 446, in save
   construct=False)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/forms/models.py", line 100, in save_instance
   save_m2m()

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/forms/models.py", line 96, in save_m2m
   f.save_form_data(instance, cleaned_data[f.name])

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 1529, in save_form_data
   setattr(instance, self.attname, data)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/utils/functional.py", line 242, in __setattr__
   setattr(self._wrapped, name, value)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 842, in __set__
   manager.add(*value)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 583, in add
   self._add_items(self.source_field_name, self.target_field_name, *objs)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/fields/related.py", line 672, in _add_items
   for obj_id in new_ids

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/query.py", line 359, in bulk_create
   self._batched_insert(objs_without_pk, fields, batch_size)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/query.py", line 838, in _batched_insert
   using=self.db)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert
   return insert_query(self.model, objs, fields, **kwargs)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query
   return query.get_compiler(using=using).execute_sql(return_id)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql
   cursor.execute(sql, params)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
   return self.cursor.execute(sql, params)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
   six.reraise(dj_exc_type, dj_exc_value, traceback)

 File "/somewhere/.virtualenvs/ordersystem/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
   return self.cursor.execute(sql, params)

IntegrityError: duplicate key value violates unique constraint "ordersystem_customer_disliked_vege_customer_id_vegetable_id_key"
DETAIL:  Key (customer_id, vegetable_id)=(67, 37) already exists.

comment:6 Changed 11 months ago by Patrick Robertson <django@…>

Looking at the code (very briefly), I think _add_items in db/models/fields/relate.py (line 672 ish) might be what's causing the problem.

We're saving a list of ids new_ids which is later used for the bulk creation. As you can see, if this method were called twice at the same time for whatever reason (I think in my case it was probably a double submit of the form) then a race condition could occurr

comment:7 Changed 9 months ago by anonymous

  • Resolution set to fixed
  • Status changed from new to closed

comment:8 Changed 8 months ago by nihilifer

I see that it's set as fixed and closed. That's because this bug cannot be reproduced or it was really fixed?
Asking, because I have the same problem on Django 1.5.8.

comment:9 Changed 8 months ago by timgraham

  • Resolution fixed deleted
  • Status changed from closed to new

Reopening because the ticket was closed anonymously without any detail.

comment:10 Changed 7 months ago by jdufresne

I am running into the same issue using MySQL and Django 1.6.7. Looks like it may be an issue across multiple database backends. Here is the backtrace:

Traceback (most recent call last):
  File "manage.py", line 13, in <module>
    execute_from_command_line(sys.argv)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/core/management/base.py", line 415, in handle
    return self.handle_noargs(**options)
  File "/home/jon/devel/myproj/development/myapp/management/commands/thetest.py", line 36, in handle_noargs
    MyModel.objects.bulk_create(objs)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 160, in bulk_create
    return self.get_queryset().bulk_create(*args, **kwargs)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 359, in bulk_create
    self._batched_insert(objs_without_pk, fields, batch_size)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 838, in _batched_insert
    using=self.db)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/query.py", line 1514, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 903, in execute_sql
    cursor.execute(sql, params)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 69, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/django/db/backends/mysql/base.py", line 124, in execute
    return self.cursor.execute(query, args)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "/home/jon/devel/myproj/development/venv/lib/python2.7/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
django.db.utils.IntegrityError: (1062, "Duplicate entry 'mytype-7-6' for key 'type'")

comment:11 Changed 7 months ago by jdufresne

I have investigated this a bit more. For me the race occurs for the following reason:

Suppose you have MyModel that has a unique constraint. Suppose you have a view of the form:

@transaction.atomic
def my_view(request):
   MyModel.objects.some_long_filter().delete()
   objs = builds_list_of_unsaved_models_from_request(request)
   MyModel.objects.bulk_create(objs)

That is, a view that builds a list of objects in bulk. The view must first delete the objects that were created by the previous request to ensure the unique constraint is not violated. From my investigation, the delete() will not actually call an SQL DELETE statement, as I originally expected, but instead caches a list of ids of objects that exist, then calls a new SQL query with a DELETE .. WHERE id IN (list of ids). This all happens in DeleteQuery.delete_qs() file django/db/models/sql/subqueries.py:52.

Now suppose you have some original data. Then two requests occur at the same time. The first request and the second request will both cache the same ids from the original data to delete. Suppose the first request finishes slightly earlier and commits the *new objects with new ids*. Now, the second request will try to delete the cached ids from the original data and *not* from the first request. Upon save there is a unique constraint violation -- as the wrong ids were deleted -- causing the IntegrityError.

Coming from an application with many raw SQL queries, this unique constraint violation was a big surprise with the use of transactions. The caching of ids in the delete() function seems a sneaky source of race conditions.

Now, having investigated this, I am not sure if my case is the same bug that was originally reported. If anyone thinks I should be filing this as a separate ticket, I can move it.

comment:12 Changed 7 months ago by akaariai

I don't think comment:11 is talking about this ticket's issue (that is, it is not about m2m add()).

comment:13 Changed 7 months ago by jdufresne

  • Cc jon.dufresne@… added

I don't think comment:11 is talking about this ticket's issue (that is, it is not about m2m add()).

I have moved my issue to ticket #23576.

comment:14 Changed 4 months ago by lechatpito

I think I have the same error using 1.6.7. Any idea how to fix this ?
Here is the trace:

Stacktrace (most recent call last):

  File "celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "celery/app/trace.py", line 421, in __protected_call__
    return self.run(*args, **kwargs)
  File "itemizer/tasks.py", line 125, in itemize
    write_entity_object_to_model(entity, ranking.top3_item, ranking.category, ranking)
  File "itemizer/tasks.py", line 453, in write_entity_object_to_model
    e.categories.add(category)
  File "django/db/models/fields/related.py", line 583, in add
    self._add_items(self.source_field_name, self.target_field_name, *objs)
  File "django/db/models/fields/related.py", line 672, in _add_items
    for obj_id in new_ids
  File "django/db/models/query.py", line 359, in bulk_create
    self._batched_insert(objs_without_pk, fields, batch_size)
  File "django/db/models/query.py", line 838, in _batched_insert
    using=self.db)
  File "django/db/models/manager.py", line 232, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "django/db/models/query.py", line 1514, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "django/db/models/sql/compiler.py", line 903, in execute_sql
    cursor.execute(sql, params)
  File "django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "django/db/backends/util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "django/db/backends/mysql/base.py", line 124, in execute
    return self.cursor.execute(query, args)
  File "MySQLdb/cursors.py", line 205, in execute
    self.errorhandler(self, exc, value)
  File "MySQLdb/connections.py", line 36, in defaulterrorhandler
    raise errorclass, errorvalue
Note: See TracTickets for help on using tickets.
Back to Top