Opened 18 years ago
Closed 18 years ago
#3390 closed (fixed)
Deserializer cannot handle circular and forward references to object instances
Reported by: | Russell Keith-Magee | Owned by: | Jacob |
---|---|---|---|
Component: | Core (Serialization) | Version: | dev |
Severity: | Keywords: | JSON deferrable deserialization circular forward reference | |
Cc: | Triage Stage: | Design decision needed | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
The current deserializers are not able to represent circular references between objects, or forward references to objects. This is because the reconstruction of m2o and m2m relations calls on the database API to resolve the ID's mentioned in the serialization string into object instances which can be saved. Therefore, if an object has not been saved, it cannot be successfully referenced in a serialization file. This is a serious impediment to the use of serialization files for test fixtures and database migration (see ticket #2333).
To overcome this, I submit this modification to the deserializer that bypasses the database lookup step. The revised deserializer saves the primary key values defined in the serialization file directly into the object. This is the equivalent of doing:
Article.author_id = 3
instead of:
Article.author = Author.objects.get(pk=3)
As well as being more flexible, it has the advantage of being faster (in that it requires less database lookups).
In order to work for m2m relations, this patch requires that patch #3389 be applied (this patch allows assignment of m2m sets using pk values).
This relatively simple approach works fine for all database backends except one - Postgres. Because Postgres has and enforces data consistency, it is not possible to INSERT an object with a foreign key value that does not have a corresponding entry in the foreign table.
To work around this problem, I have modified table declarations to make all cross-table references DEFERRABLE INITIALLY DEFERRED. This has the effect that the row constraints and consistency checks are not executed until the end of each transaction, rather than after every insert (although if transactions are not in use, there is no difference in behaviour).
In this way, if the deserialization process is placed inside a transaction, it is possible to deserialize an arbitrarily complex graph of foreign key relationships, and consistency is only checked once the entire graph has been deserialized.
On SQLite/MySQL, which don't have such strong consistency checks, there is no difference in behaviour or table definition.
The implementation presented here is for JSON only; if the idea is blessed, I will work up an analogous version for the XML deserializer.
Attachments (1)
Change History (7)
by , 18 years ago
Attachment: | serializer.diff added |
---|
comment:1 by , 18 years ago
the tests fail in a strange way here with the postgresql backend. It looks a bit weird, perhaps better check for yourself before you invest too many thoughts.
====================================================================== FAIL: Doctest: modeltests.serializers.models.__test__.API_TESTS ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 2156, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for modeltests.serializers.models.__test__.API_TESTS File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line unknown line number, in API_TESTS ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: for obj in serializers.deserialize("json", json): obj.save() Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[31]>", line 2, in ? obj.save() File "/home/mir/lib/python/django/core/serializers/base.py", line 165, in save setattr(self.object, accessor_name, object_list) File "/home/mir/lib/python/django/db/models/fields/related.py", line 460, in __set__ manager.add(*value) File "/home/mir/lib/python/django/db/models/fields/related.py", line 285, in add self._add_items(self.source_col_name, self.target_col_name, *objs) File "/home/mir/lib/python/django/db/models/fields/related.py", line 327, in _add_items raise ValueError, "objects to add() must be %s instances" % self.model._meta.object_name ValueError: objects to add() must be Category instances ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: for obj in serializers.deserialize("json", json): obj.save() Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[37]>", line 2, in ? obj.save() File "/home/mir/lib/python/django/core/serializers/base.py", line 165, in save setattr(self.object, accessor_name, object_list) File "/home/mir/lib/python/django/db/models/fields/related.py", line 460, in __set__ manager.add(*value) File "/home/mir/lib/python/django/db/models/fields/related.py", line 285, in add self._add_items(self.source_col_name, self.target_col_name, *objs) File "/home/mir/lib/python/django/db/models/fields/related.py", line 327, in _add_items raise ValueError, "objects to add() must be %s instances" % self.model._meta.object_name ValueError: objects to add() must be Category instances ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: transaction.commit() Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[38]>", line 1, in ? transaction.commit() File "/home/mir/lib/python/django/db/transaction.py", line 153, in commit connection._commit() File "/home/mir/lib/python/django/db/backends/postgresql/base.py", line 86, in _commit return self.connection.commit() DatabaseError: {<cursor object at 0xb721b390>: 'FEHLER: Einf\xc3\xbcgen oder Aktualisieren in Tabelle \xc2\xbbserializers_article\xc2\xab verletzt Fremdschl\xc3\xbcssel-Constraint \xc2\xbbauthor_id_refs_id_8fd3731\xc2\xab\nDETAIL: Schl\xc3\xbcssel (author_id)=(4) ist nicht in Tabelle \xc2\xbbserializers_author\xc2\xab vorhanden.\n'} ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: transaction.leave_transaction_management() Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[39]>", line 1, in ? transaction.leave_transaction_management() File "/home/mir/lib/python/django/db/transaction.py", line 70, in leave_transaction_management raise TransactionManagementError("Transaction managed block ended with pending COMMIT/ROLLBACK") TransactionManagementError: Transaction managed block ended with pending COMMIT/ROLLBACK ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: article = Article.objects.get(pk=3) Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[40]>", line 1, in ? article = Article.objects.get(pk=3) File "/home/mir/lib/python/django/db/models/manager.py", line 73, in get return self.get_query_set().get(*args, **kwargs) File "/home/mir/lib/python/django/db/models/query.py", line 226, in get raise self.model.DoesNotExist, "%s matching query does not exist." % self.model._meta.object_name DoesNotExist: Article matching query does not exist. ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: article Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[41]>", line 1, in ? article NameError: name 'article' is not defined ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: article.categories.all() Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[42]>", line 1, in ? article.categories.all() NameError: name 'article' is not defined ---------------------------------------------------------------------- File "/home/mir/src/triage/tests/modeltests/serializers/models.py", line ?, in modeltests.serializers.models.__test__.API_TESTS Failed example: article.author Exception raised: Traceback (most recent call last): File "/home/mir/lib/python/django/test/doctest.py", line 1243, in __run compileflags, 1) in test.globs File "<doctest modeltests.serializers.models.__test__.API_TESTS[43]>", line 1, in ? article.author NameError: name 'article' is not defined
comment:2 by , 18 years ago
comment:3 by , 18 years ago
Sorry. I made a fresh checkout, and now the tests succeed (with postgresql). I was completely unaware that the checkout was so old ...
follow-up: 5 comment:4 by , 18 years ago
I don't like this solution. Until now, the serializer provided a nice way to move data between Django installations: I could email somebody the serialised output and they could run it as an addition to their database. Or I could save it away and use it later after arbitrary things have happened in my database. But you can't do that if you also want to control the primary key values, because it may clash with existing ones.
I think we need to be smarter here, if we can.
comment:5 by , 18 years ago
Replying to mtredinnick:
I don't like this solution. Until now, the serializer provided a nice way to move data between Django installations: I could email somebody the serialised output and they could run it as an addition to their database. Or I could save it away and use it later after arbitrary things have happened in my database. But you can't do that if you also want to control the primary key values, because it may clash with existing ones.
I'm not sure the existing serializer works in the way you suggest. In the existing implementation (of JSON, at least):
- A serialization file contains the primary key for each object serialized; when each object is saved, it will overwrite any object in the database with that primary key.
- If a serialized object contains a reference (FK or m2m) to an object, the deserializer looks in the database for an existing object with that primary key. If an object with that primary key already exists in the database, or has previously been deserialized, the reference is set; if it doesn't, an integrity error is thrown (hence this bug report).
Neither of these properties allow you to arbitrarily add blocks of data to the database. Primary keys have always been controlled by the serializer, and existing database objects have always been overwritten by deserialized ones.
I suppose you could add an 'overwrite' flag to the deserializer to differentiate between 'create a new object' and 'overwrite existing objects' when an object is deserialized - but then it would be difficult to keep references consistent. When my serialized object references PK=3, does it mean the PK=3 object in my file, or the PK=3 object in the database into which it is being loaded?
comment:6 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Fix for circular/forward deserializer references