Opened 12 years ago

Closed 9 years ago

#17946 closed Bug (fixed)

Deserialisation bug for ManyToManyField

Reported by: Philip Mountifield <pmountifield@…> Owned by: nobody
Component: Core (Serialization) Version: dev
Severity: Normal Keywords:
Cc: pmountifield@…, gerdemb Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Jannis Leidel)

A simple but obscure bug exists with the deserialisation of many to many fields from fixtures. This bug was first introduced in between version 0.96.5 and 1.0, however it takes a very specific set of circumstances to reproduce it so I guess it has gone unnoticed for some years. I have been testing against 1.3.1.

  1. Create a model with a ManyToManyField('self')
  2. Create and save an instance of the model with a link to itself via the many to many
  3. Dump the object (e.g. "python manage.py dumpdata --indent 2 my_app_name > test.json")
  4. Restore from the dump (e.g. "python loaddata test.json")

You will get a IntegrityError and trackback as follows:

  Problem installing fixture 'test.json': Traceback (most recent call last):
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/core/management/commands/loaddata.py", line 174, in handle
      obj.save(using=using)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/core/serializers/base.py", line 168, in save
      setattr(self.object, accessor_name, object_list)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/fields/related.py", line 749, in __set__
      manager.add(*value)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/fields/related.py", line 507, in add
      self._add_items(self.target_field_name, self.source_field_name, *objs)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/fields/related.py", line 590, in _add_items
      '%s_id' % target_field_name: obj_id,
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/query.py", line 360, in create
      obj.save(force_insert=True, using=self.db)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/base.py", line 460, in save
      self.save_base(using=using, force_insert=force_insert, force_update=force_update)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/base.py", line 553, in save_base
      result = manager._insert(values, return_id=update_pk, using=using)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/manager.py", line 195, in _insert
      return insert_query(self.model, values, **kwargs)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/query.py", line 1436, in insert_query
      return query.get_compiler(using=using).execute_sql(return_id)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/sql/compiler.py", line 791, in execute_sql
      cursor = super(SQLInsertCompiler, self).execute_sql(None)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/sql/compiler.py", line 735, in execute_sql
      cursor.execute(sql, params)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/backends/util.py", line 34, in execute
      return self.cursor.execute(sql, params)
    File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
      return self.cursor.execute(query, args)
  IntegrityError: duplicate key violates unique constraint "webpage_related_from_webpage_id_key"

The reason for the error is exactly what it says it is, there is already a related item with the same to and from ids in the related table, the question was why. A little digging uncovered the following:

  1. With many to many fields to self the symmetric link is automagically created in django/db/models/fields/related.py by the add function of ManyRelatedManager
  2. The _add_items of ManyRelatedManager checks to see what ids are already in the related table and excludes them from the set of new_ids, however in the case of loading fixtures the existing records are not properly excluded because the list of objs passed to _add_items is actually a list of unicode stings representing the ids instead of the actual integer values and obviously "set([u'1']) - set([1])" will still leave the u'1' in the set.
  3. So the issue lies with what is being passed to the related manager from loaddata/serialisation.
  4. The problem is only manifested with a item which is related to itself since the related items are cleared when you assign to the related attribute, in the case of a related item linking to itself it throws up the error because the operation is attempted again for the symmetrical link which violates the unique clause.
  5. Passing in the correct object ids as integers will fix the problem since the line "new_ids = new_ids - set(vals)" of related.py will correctly exclude the already created forward link which is the same as the reverse link when you have a many to many to the same model as it is from.
  6. The erroneous unicode ids originate from the deserialisation in django/core/serializers/python.py where the following two lines convert the pk values
  1. "return smart_unicode(field.rel.to._meta.pk.to_python(value))"
  2. "m2m_convert = lambda v: smart_unicode(field.rel.to._meta.pk.to_python(v))"
  1. If you look at the other calls to smart_unicode in the file you'll notice that all but one use a extra kwarg "strings_only=True" and this is missing from these two lines, hence the genuine integer ids are being converted to unicode here, and this is what breaks the many to many related item to the same instance.
  2. Even the calls in Serializer have the "strings_only=True" argument so the encoding/decoding is not even symmetrical!
  3. Changing the two lines of django/core/serializers/python.py to the following fixes the problem
  1. "return smart_unicode(field.rel.to._meta.pk.to_python(value), strings_only=True)"
  2. "m2m_convert = lambda v: smart_unicode(field.rel.to._meta.pk.to_python(v), strings_only=True)"
  1. This doesn't seem to breaking anything, since it is actually switching back to the correct behaviour and is only triggered by an edge case. Also the test suite passes fine.

Regards
Philip

Change History (7)

comment:1 by Philip Mountifield <pmountifield@…>, 12 years ago

Cc: pmountifield@… added

comment:2 by gerdemb, 12 years ago

Cc: gerdemb added

comment:3 by Jannis Leidel, 12 years ago

Description: modified (diff)

Updated formatting

comment:4 by Aymeric Augustin, 12 years ago

Triage Stage: UnreviewedAccepted

I haven't reproduced the bug myself, but the description is sufficiently detailed and accurate to convince me.

comment:5 by Claude Paroz, 9 years ago

Has patch: set
Version: 1.3master

comment:6 by Simon Charette, 9 years ago

Triage Stage: AcceptedReady for checkin

LGTM.

It's a shame it felt under the radar given the great report from Philip. Really an edge case I guess.

comment:7 by Claude Paroz <claude@…>, 9 years ago

Resolution: fixed
Status: newclosed

In 9fecb86a5221982ea753270b7d2b4c6487f79941:

Fixed #17946 -- Fixed deserialization of self-referencing M2M fields

Thanks Philip Mountifield for the report and excellent analysis, and
Simon Charette for the review.

Note: See TracTickets for help on using tickets.
Back to Top