Deserialisation bug for ManyToManyField
|Reported by:||Philip Mountifield <pmountifield@…>||Owned by:||nobody|
|Cc:||pmountifield@…, gerdemb||Triage Stage:||Ready for checkin|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Description (last modified by jezdez)
A simple but obscure bug exists with the deserialisation of many to many fields from fixtures. This bug was first introduced in between version 0.96.5 and 1.0, however it takes a very specific set of circumstances to reproduce it so I guess it has gone unnoticed for some years. I have been testing against 1.3.1.
- Create a model with a ManyToManyField('self')
- Create and save an instance of the model with a link to itself via the many to many
- Dump the object (e.g. "python manage.py dumpdata --indent 2 my_app_name > test.json")
- Restore from the dump (e.g. "python loaddata test.json")
You will get a IntegrityError and trackback as follows:
Problem installing fixture 'test.json': Traceback (most recent call last): File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/core/management/commands/loaddata.py", line 174, in handle obj.save(using=using) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/core/serializers/base.py", line 168, in save setattr(self.object, accessor_name, object_list) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/fields/related.py", line 749, in __set__ manager.add(*value) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/fields/related.py", line 507, in add self._add_items(self.target_field_name, self.source_field_name, *objs) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/fields/related.py", line 590, in _add_items '%s_id' % target_field_name: obj_id, File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/query.py", line 360, in create obj.save(force_insert=True, using=self.db) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/base.py", line 460, in save self.save_base(using=using, force_insert=force_insert, force_update=force_update) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/base.py", line 553, in save_base result = manager._insert(values, return_id=update_pk, using=using) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/manager.py", line 195, in _insert return insert_query(self.model, values, **kwargs) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/query.py", line 1436, in insert_query return query.get_compiler(using=using).execute_sql(return_id) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/sql/compiler.py", line 791, in execute_sql cursor = super(SQLInsertCompiler, self).execute_sql(None) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/models/sql/compiler.py", line 735, in execute_sql cursor.execute(sql, params) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/backends/util.py", line 34, in execute return self.cursor.execute(sql, params) File "/usr/local/lib/python2.6/site-packages/Django-1.3.1-py2.6.egg/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute return self.cursor.execute(query, args) IntegrityError: duplicate key violates unique constraint "webpage_related_from_webpage_id_key"
The reason for the error is exactly what it says it is, there is already a related item with the same to and from ids in the related table, the question was why. A little digging uncovered the following:
- With many to many fields to self the symmetric link is automagically created in django/db/models/fields/related.py by the add function of ManyRelatedManager
- The _add_items of ManyRelatedManager checks to see what ids are already in the related table and excludes them from the set of new_ids, however in the case of loading fixtures the existing records are not properly excluded because the list of objs passed to _add_items is actually a list of unicode stings representing the ids instead of the actual integer values and obviously "set([u'1']) - set()" will still leave the u'1' in the set.
- So the issue lies with what is being passed to the related manager from loaddata/serialisation.
- The problem is only manifested with a item which is related to itself since the related items are cleared when you assign to the related attribute, in the case of a related item linking to itself it throws up the error because the operation is attempted again for the symmetrical link which violates the unique clause.
- Passing in the correct object ids as integers will fix the problem since the line "new_ids = new_ids - set(vals)" of related.py will correctly exclude the already created forward link which is the same as the reverse link when you have a many to many to the same model as it is from.
- The erroneous unicode ids originate from the deserialisation in django/core/serializers/python.py where the following two lines convert the pk values
- "return smart_unicode(field.rel.to._meta.pk.to_python(value))"
- "m2m_convert = lambda v: smart_unicode(field.rel.to._meta.pk.to_python(v))"
- If you look at the other calls to smart_unicode in the file you'll notice that all but one use a extra kwarg "strings_only=True" and this is missing from these two lines, hence the genuine integer ids are being converted to unicode here, and this is what breaks the many to many related item to the same instance.
- Even the calls in Serializer have the "strings_only=True" argument so the encoding/decoding is not even symmetrical!
- Changing the two lines of django/core/serializers/python.py to the following fixes the problem
- "return smart_unicode(field.rel.to._meta.pk.to_python(value), strings_only=True)"
- "m2m_convert = lambda v: smart_unicode(field.rel.to._meta.pk.to_python(v), strings_only=True)"
- This doesn't seem to breaking anything, since it is actually switching back to the correct behaviour and is only triggered by an edge case. Also the test suite passes fine.
Change History (7)
comment:1 Changed 4 years ago by Philip Mountifield <pmountifield@…>
- Cc pmountifield@… added
- Needs documentation unset
- Needs tests unset
- Patch needs improvement unset
comment:6 Changed 14 months ago by charettes
- Triage Stage changed from Accepted to Ready for checkin