Code

Opened 16 months ago

Last modified 16 months ago

#19544 new Cleanup/optimization

IntegrityError during Many To Many add()

Reported by: Kronuz Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: Kronuz Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I'm frequently getting this exception ("IntegrityError: duplicate key value violates unique constraint...") a lot in my production server when I do add() on a Many To Many relationship. For what I've seen, there seems to be a race condition in django.db.models.fields.related's ManyRelatedManager._add_items() between the point new_ids is "calculated" and the point bulk_create() actually creates the tuples in the database.

Attachments (0)

Change History (4)

comment:1 Changed 16 months ago by Kronuz

  • Cc Kronuz added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 16 months ago by aaugustin

Which database are you using, and under which isolation level are you running?

Can you provide the full traceback?

Thanks.

comment:3 follow-up: Changed 16 months ago by Kronuz

This is traceback I'm getting:

Task core.tasks._no_retry_task_generic[802b4b61-be07-41ed-a7fa-f5f8c671b4d8] raised exception: IntegrityError('duplicate key value violates unique constraint "social_connect_socialcontact_entit_socialcontact_id_user_id_key"\nDETAIL:  Key (socialcontact_id, user_id)=(56151, 2146) already exists.\n',)

Stacktrace (most recent call last):

  File "celery/task/trace.py", line 212, in trace_task
    R = retval = fun(*args, **kwargs)
  File "core/tasks.py", line 70, in _no_retry_task_generic
    return _run_task(_no_retry_task_generic, *args, **kwargs)
  File "core/tasks.py", line 34, in _run_task
    return callback(*_args, **_kwargs)  # do not pass **kwargs (as it has "magic" stuff)??
  File "social_connect/Facebook/views.py", line 246, in _callback
    join_contacts(user, social_connect_contacts, site)
  File "relationships/utils.py", line 122, in join_contacts
    contact.entities.add(from_entity)
  File "django/db/models/fields/related.py", line 625, in add
    self._add_items(self.source_field_name, self.target_field_name, *objs)
  File "django/db/models/fields/related.py", line 710, in _add_items
    for obj_id in new_ids
  File "django/db/models/query.py", line 421, in bulk_create
    self.model._base_manager._insert(objs_without_pk, fields=[f for f in fields if not isinstance(f, AutoField)], using=self.db)
  File "django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "django/db/models/query.py", line 1581, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "django/db/models/sql/compiler.py", line 910, in execute_sql
    cursor.execute(sql, params)
  File "django/db/backends/util.py", line 41, in execute
    return self.cursor.execute(sql, params)
  File "django/db/backends/postgresql_psycopg2/base.py", line 52, in execute
    return self.cursor.execute(query, args)

As you can see, I'm using pgsql, and I'm receiving this traceback from a celery task.

The contact.entities.add(from_entity) is in a loop, and entities is a ManyToManyField (entities = models.ManyToManyField(User) in SocialContact):

    for contact in SocialContact.objects.filter(id__in=contact_ids):
        contact.entities.add(from_entity)
Last edited 16 months ago by Kronuz (previous) (diff)

comment:4 in reply to: ↑ 3 Changed 16 months ago by akaariai

  • Component changed from Uncategorized to Database layer (models, ORM)
  • Triage Stage changed from Unreviewed to Someday/Maybe
  • Type changed from Uncategorized to Cleanup/optimization

Looks like there is a race condition. The ways to fix this seem to be:

  • Locking the m2m table (or some portion of it). This will lead to deadlocks so it is a no-go.
  • Retrying until done: create a savepoint before bulk_create, if IntegrityError is risen rollback, select-bulk_create again. Here the cost is additional savepoint which can double the cost of .add().
  • Doing the insert using SQL like:
    INSERT INTO m2mtbl(lhs_id, rhs_id)
        (SELECT lhs_id, rhs_id FROM (values(1, 2), (1, 3), (1, 4) AS tmp(lhs_id, rhs_id)
           LEFT JOIN m2mtbl ON m2mtbl.lhs_id = tmp.lhs.id and m2mtbl.rhs_id = tmp.rhs_id WHERE m2mtbl.id IS NULL);
    

The last SQL is actually quite nice - we are guaranteed atomicity, and we need to execute just a single SQL statement instead of the two done currently. This is going to perform better unless the inserted set is huge and most of the values already exists. However the SQL used is *highly* DB backend specific so this isn't an easy addition to do.

I am going to mark this someday/maybe. Getting rid of the race condition would be nice, but I don't see an easy and well performing way to do that.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from nobody to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.