Opened 8 years ago

Closed 8 years ago

#3390 closed (fixed)

Deserializer cannot handle circular and forward references to object instances

Reported by: russellm Owned by: jacob
Component: Core (Serialization) Version: master
Severity: Keywords: JSON deferrable deserialization circular forward reference
Cc: Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

The current deserializers are not able to represent circular references between objects, or forward references to objects. This is because the reconstruction of m2o and m2m relations calls on the database API to resolve the ID's mentioned in the serialization string into object instances which can be saved. Therefore, if an object has not been saved, it cannot be successfully referenced in a serialization file. This is a serious impediment to the use of serialization files for test fixtures and database migration (see ticket #2333).

To overcome this, I submit this modification to the deserializer that bypasses the database lookup step. The revised deserializer saves the primary key values defined in the serialization file directly into the object. This is the equivalent of doing:

    Article.author_id = 3

instead of:

    Article.author = Author.objects.get(pk=3)

As well as being more flexible, it has the advantage of being faster (in that it requires less database lookups).

In order to work for m2m relations, this patch requires that patch #3389 be applied (this patch allows assignment of m2m sets using pk values).

This relatively simple approach works fine for all database backends except one - Postgres. Because Postgres has and enforces data consistency, it is not possible to INSERT an object with a foreign key value that does not have a corresponding entry in the foreign table.

To work around this problem, I have modified table declarations to make all cross-table references DEFERRABLE INITIALLY DEFERRED. This has the effect that the row constraints and consistency checks are not executed until the end of each transaction, rather than after every insert (although if transactions are not in use, there is no difference in behaviour).

In this way, if the deserialization process is placed inside a transaction, it is possible to deserialize an arbitrarily complex graph of foreign key relationships, and consistency is only checked once the entire graph has been deserialized.

On SQLite/MySQL, which don't have such strong consistency checks, there is no difference in behaviour or table definition.

The implementation presented here is for JSON only; if the idea is blessed, I will work up an analogous version for the XML deserializer.

Attachments (1)

serializer.diff (9.0 KB) - added by russellm 8 years ago.
Fix for circular/forward deserializer references

Download all attachments as: .zip

Change History (7)

Changed 8 years ago by russellm

Fix for circular/forward deserializer references

comment:1 Changed 8 years ago by Michael Radziej <mir@…>

the tests fail in a strange way here with the postgresql backend. It looks a bit weird, perhaps better check for yourself before you invest too many thoughts.

======================================================================
FAIL: Doctest: modeltests.serializers.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mir/lib/python/django/test/doctest.py", line 2156, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for modeltests.serializers.models.__test__.API_TESTS
  File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line unknown line number, in API_TESTS

----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    for obj in serializers.deserialize("json", json):
        obj.save()
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[31]>", line 2, in ?
        obj.save()
      File "/home/mir/lib/python/django/core/serializers/base.py", line 165, in save
        setattr(self.object, accessor_name, object_list)
      File "/home/mir/lib/python/django/db/models/fields/related.py", line 460, in __set__
        manager.add(*value)
      File "/home/mir/lib/python/django/db/models/fields/related.py", line 285, in add
        self._add_items(self.source_col_name, self.target_col_name, *objs)
      File "/home/mir/lib/python/django/db/models/fields/related.py", line 327, in _add_items
        raise ValueError, "objects to add() must be %s instances" % self.model._meta.object_name
    ValueError: objects to add() must be Category instances
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    for obj in serializers.deserialize("json", json):
        obj.save()
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[37]>", line 2, in ?
        obj.save()
      File "/home/mir/lib/python/django/core/serializers/base.py", line 165, in save
        setattr(self.object, accessor_name, object_list)
      File "/home/mir/lib/python/django/db/models/fields/related.py", line 460, in __set__
        manager.add(*value)
      File "/home/mir/lib/python/django/db/models/fields/related.py", line 285, in add
        self._add_items(self.source_col_name, self.target_col_name, *objs)
      File "/home/mir/lib/python/django/db/models/fields/related.py", line 327, in _add_items
        raise ValueError, "objects to add() must be %s instances" % self.model._meta.object_name
    ValueError: objects to add() must be Category instances
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    transaction.commit()
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[38]>", line 1, in ?
        transaction.commit()
      File "/home/mir/lib/python/django/db/transaction.py", line 153, in commit
        connection._commit()
      File "/home/mir/lib/python/django/db/backends/postgresql/base.py", line 86, in _commit
        return self.connection.commit()
    DatabaseError: {<cursor object at 0xb721b390>: 'FEHLER:  Einf\xc3\xbcgen oder Aktualisieren in Tabelle \xc2\xbbserializers_article\xc2\xab verletzt Fremdschl\xc3\xbcssel-Constraint \xc2\xbbauthor_id_refs_id_8fd3731\xc2\xab\nDETAIL:  Schl\xc3\xbcssel (author_id)=(4) ist nicht in Tabelle \xc2\xbbserializers_author\xc2\xab vorhanden.\n'}
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    transaction.leave_transaction_management()
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[39]>", line 1, in ?
        transaction.leave_transaction_management()
      File "/home/mir/lib/python/django/db/transaction.py", line 70, in leave_transaction_management
        raise TransactionManagementError("Transaction managed block ended with pending COMMIT/ROLLBACK")
    TransactionManagementError: Transaction managed block ended with pending COMMIT/ROLLBACK
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    article = Article.objects.get(pk=3)
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[40]>", line 1, in ?
        article = Article.objects.get(pk=3)
      File "/home/mir/lib/python/django/db/models/manager.py", line 73, in get
        return self.get_query_set().get(*args, **kwargs)
      File "/home/mir/lib/python/django/db/models/query.py", line 226, in get
        raise self.model.DoesNotExist, "%s matching query does not exist." % self.model._meta.object_name
    DoesNotExist: Article matching query does not exist.
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    article
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[41]>", line 1, in ?
        article
    NameError: name 'article' is not defined
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    article.categories.all()
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[42]>", line 1, in ?
        article.categories.all()
    NameError: name 'article' is not defined
----------------------------------------------------------------------
File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS
Failed example:
    article.author
Exception raised:
    Traceback (most recent call last):
      File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run
        compileflags, 1) in test.globs
      File "<doctest modeltests.serializers.models.__test__.API_TESTS[43]>", line 1, in ?
        article.author
    NameError: name 'article' is not defined

comment:2 Changed 8 years ago by russellm

The first two errors you received suggest that you don't have an up to date checkout - #3290 requires the patch from #3289, which was applied in [4438]. Its difficult to say for certain, but I think the remaining errors are all cascading failures from those first two.

comment:3 Changed 8 years ago by Michael Radziej <mir@…>

Sorry. I made a fresh checkout, and now the tests succeed (with postgresql). I was completely unaware that the checkout was so old ...

comment:4 follow-up: Changed 8 years ago by mtredinnick

I don't like this solution. Until now, the serializer provided a nice way to move data between Django installations: I could email somebody the serialised output and they could run it as an addition to their database. Or I could save it away and use it later after arbitrary things have happened in my database. But you can't do that if you also want to control the primary key values, because it may clash with existing ones.

I think we need to be smarter here, if we can.

comment:5 in reply to: ↑ 4 Changed 8 years ago by russellm

Replying to mtredinnick:

I don't like this solution. Until now, the serializer provided a nice way to move data between Django installations: I could email somebody the serialised output and they could run it as an addition to their database. Or I could save it away and use it later after arbitrary things have happened in my database. But you can't do that if you also want to control the primary key values, because it may clash with existing ones.

I'm not sure the existing serializer works in the way you suggest. In the existing implementation (of JSON, at least):

  • A serialization file contains the primary key for each object serialized; when each object is saved, it will overwrite any object in the database with that primary key.
  • If a serialized object contains a reference (FK or m2m) to an object, the deserializer looks in the database for an existing object with that primary key. If an object with that primary key already exists in the database, or has previously been deserialized, the reference is set; if it doesn't, an integrity error is thrown (hence this bug report).

Neither of these properties allow you to arbitrarily add blocks of data to the database. Primary keys have always been controlled by the serializer, and existing database objects have always been overwritten by deserialized ones.

I suppose you could add an 'overwrite' flag to the deserializer to differentiate between 'create a new object' and 'overwrite existing objects' when an object is deserialized - but then it would be difficult to keep references consistent. When my serialized object references PK=3, does it mean the PK=3 object in my file, or the PK=3 object in the database into which it is being loaded?

comment:6 Changed 8 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [4610]) Fixed #3390: the serializer can now contain forward references. Thanks, Russ.

Note: See TracTickets for help on using tickets.
Back to Top