Opened 8 years ago

Closed 6 years ago

Last modified 4 years ago

#26291 closed New feature (fixed)

loaddata cannot deserialize fixtures with forward references and natural foreign keys

Reported by: Peter Inglesby Owned by: Peter Inglesby
Component: Core (Serialization) Version: 1.9
Severity: Normal Keywords:
Cc: Russell Keith-Magee, dev@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

The following fixture cannot be deserialized, unless an Author object with name "John Steinbeck" already exists. This is because when Django tries to deserialize the first object, it tries to load an Author with name "John Steinbeck", which fails.

    "model": "",
    "fields": {
      "title": "East Of Eden",
      "author": ["John Steinbeck"]
    "model": "",
    "fields": {
      "name": "John Steinbeck",
      "date_of_birth": "1902-02-27"

This problem is avoidable with careful ordering of fixture loading, but is still a problem with fixtures that have circular references.

I have a proposed fix: see PR 6221

Change History (15)

comment:1 by Tim Graham, 8 years ago

Description: modified (diff)
Type: UncategorizedNew feature

The proposed code looks rather complicated. What you are saying is that the instructions about fixture ordering aren't sufficient due to the possibility of circular references, correct? Could you give an example of that to motivate the feature a bit more?

comment:2 by Peter Inglesby, 8 years ago

Thanks for taking a look at this.

There are a handful of problems with Django's current dependency resolution.

Firstly, loaddata cannot handle circular references at all.

Secondly, even without circular references, dumpdata can produce data that loaddata cannot load, requiring manual re-ordering of a fixture. This could happen if a field of a model instance references another instance of the same model, where the pk of the referenced model is greater than the pk of the referring model, since dumpdata dumps instances of a model in order of it pk.

Finally, Django's current dependency resolution depends on loaddata importing data that was created by dumpdata. If the data comes from another source, sort_dependencies's logic must be followed.

This last problem is the cause of my own interest in the issue, but I think that the fact that dumpdata can produce data that loaddata cannot load is reason enough to consider a solution.

comment:3 by Tim Graham, 8 years ago

Cc: Russell Keith-Magee added

Russ, as the author of the natural key serialization support in 35cc439228cd32dfa7a3ec919db01a8a5cd17d33, I wonder if you could take a look and let us know if you think this looks like a good idea or if you foresee any problems?

comment:4 by Russell Keith-Magee, 8 years ago

@timgraham I've left some comments on the PR about specific implementation details, but the broad-strokes approach seems sound. Unlike PK references, Natural keys depend on being able to issue queries on the database, so there's no way to resolve a forward reference until the object actually exists. This means that deferring selected objects is the only approach that is going to work. I can't think of any situation where this will cause a regression in behavior - it just means a subset of all fixtures that wasn't previously loadable now will be. AFAICT, the code that Peter has added should only be activated in the case of fixtures that can't currently be parsed at all.

The only alternative that I can see would be to improve sort_dependencies so that it only issued fixtures in a format that can be read again. However, that doesn't improve the general case that Peter has identified - fixtures generated by an external source.

The dependency resolution issue that Peter flags *should* (AFAIK) only exist for natural key fixtures (for the reason identified previously), and MySQL InnoDB data stores (because InnoDB's implementation of referential integrity is demonstrably broken). On other data stores, forward PK references aren't an issue.

comment:5 by Peter Inglesby, 8 years ago

Thanks for the review -- I've replied on the PR.

I think the only functionality question that needs to be resolved is whether django.core.serializers.python.Deserializer has a handle_forward_references option that defaults to False. That was there for backwards compatibility, but it's probably not necessary, so I'll remove.

There are some documentation changes required -- I'll work on those in the next few days.

Also, I did look into improving sort_dependencies to output fixtures in an order that has no forward references (ignoring circular references), but I think it's fiddly to do efficiently.

comment:6 by Tim Graham, 8 years ago

Needs documentation: set
Triage Stage: UnreviewedAccepted

comment:7 by Brillgen Developers, 8 years ago

Cc: dev@… added

comment:8 by Peter Inglesby, 8 years ago

Needs documentation: unset

comment:9 by Peter Inglesby, 8 years ago

Owner: changed from nobody to Peter Inglesby
Status: newassigned

comment:10 by Carlton Gibson, 6 years ago

Triage Stage: AcceptedReady for checkin

comment:11 by Tim Graham, 6 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

I did a brief review of the patch.

comment:12 by Tim Graham <timograham@…>, 6 years ago

In 73f7d175:

Extracted deserialize fk/m2m functions from Deserializer.

In preparation for handling forward references (refs #26291).

comment:13 by Tim Graham <timograham@…>, 6 years ago

Resolution: fixed
Status: assignedclosed

In 312eb5cb:

Fixed #26291 -- Allowed loaddata to handle forward references in natural_key fixtures.

comment:14 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In fca36f3c:

Refs #26291 -- Added tests for dumpdata with forward references in natural keys.

comment:15 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 26799c65:

Refs #26291 -- Added tests for dumpdata/loaddata with forward references without natural keys.

Note: See TracTickets for help on using tickets.
Back to Top