Opened 7 years ago

Last modified 2 years ago

#4656 new New feature

Allow In-depth serialization by specifying depth to follow relationship

Reported by: jay.m.martin@…
Owned by: nobody
Component: Core (Serialization)
Version:
Severity: Normal
Keywords: feature
Cc: mattimustang@…, datavortex@…, django@…, tosters@…, tomchristie, kmike84@…, hv@…
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description

It would be nice to be able to specify whether to traverse relationships when calling serialize.

For example, say you had 2 models defined as:

class Reporter(models.Model):
    first_name = models.CharField(maxlength=30)
    last_name = models.CharField(maxlength=30)
    email = models.EmailField()

class Article(models.Model):
    headline = models.CharField(maxlength=100)
    pub_date = models.DateField()
    reporter = models.ForeignKey(Reporter)

If you serialize a queryset from the Article model, the result will only contain the primary keys of the related Reporter models. I think an additional argument specifying the maximum depth to traverse relationships (similar to select_related) would help. So if you pass 1 as the depth in the above example, the result will contain the complete serialization of the related Reporter objects as well.
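To make the request concrete, here is a minimal sketch of depth-limited traversal. It deliberately avoids Django: the dataclasses below are stand-ins for the two models, and `serialize_obj` and its `depth` argument are hypothetical names, not part of `django.core.serializers`.

```python
from dataclasses import dataclass, fields, is_dataclass
from datetime import date


@dataclass
class Reporter:
    pk: int
    first_name: str
    last_name: str
    email: str


@dataclass
class Article:
    pk: int
    headline: str
    pub_date: date
    reporter: Reporter  # stands in for models.ForeignKey(Reporter)


def serialize_obj(obj, depth=0):
    """Return a dict for obj, expanding related objects up to `depth` levels."""
    data = {}
    for f in fields(obj):
        if f.name == "pk":
            continue
        value = getattr(obj, f.name)
        if is_dataclass(value):
            # A related object: expand it while depth remains, else emit its pk.
            value = serialize_obj(value, depth - 1) if depth > 0 else value.pk
        elif isinstance(value, date):
            value = value.isoformat()
        data[f.name] = value
    return {"pk": obj.pk, "model": type(obj).__name__.lower(), "fields": data}


reporter = Reporter(1, "John", "Smith", "john@example.com")
article = Article(7, "Depth matters", date(2007, 6, 1), reporter)

shallow = serialize_obj(article)          # reporter appears as a primary key
deep = serialize_obj(article, depth=1)    # reporter appears as a nested object
```

With `depth=0` the output matches what the ticket describes as current behaviour (primary key only); each extra level expands one more hop of relations.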

Attachments (3)

base.py (2.7 KB) - added by Matthew Flanagan <mattimustang@…> 6 years ago.
python.py (4.8 KB) - added by Matthew Flanagan <mattimustang@…> 6 years ago.
json.py (984 bytes) - added by Matthew Flanagan <mattimustang@…> 6 years ago.

Change History (33)

comment:1 Changed 7 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from In-depth serialization to Allow In-depth serialization by specifying depth to follow relationship
  • Triage Stage changed from Unreviewed to Design decision needed

comment:2 Changed 7 years ago by russellm

  • Triage Stage changed from Design decision needed to Accepted

Sounds reasonable to me.

comment:3 Changed 6 years ago by russellm

  • Keywords feature added

comment:4 Changed 6 years ago by Matthew Flanagan <mattimustang@…>

Hi,

Attached is my implementation of this as discussed in this thread http://groups.google.com/group/django-users/browse_thread/thread/c930cf920e726bbd/4faa358b8c91365d .

It is currently not a patch against django.core.serializers but rather independent subclasses. If there is general agreement to how I've implemented this then I can work on producing it as a patch against django. The new serializers are also backwards compatible with the Django ones. The only incompatible part being that a "fully" serialized object cannot be deserialized at the moment.

I have tests for these too but they are not as easily extracted from my project. I'll work on that and post them here if I get the go ahead.

Changed 6 years ago by Matthew Flanagan <mattimustang@…>

comment:5 Changed 6 years ago by Matthew Flanagan <mattimustang@…>

I'm also hosting the living version of this code at http://code.google.com/p/wadofstuff/ .

comment:6 Changed 6 years ago by ericholscher

  • milestone set to post-1.0

This is post-1.0, but I'm certainly interested in it.

comment:7 Changed 5 years ago by Matthew Flanagan <mattimustang@…>

  • Cc mattimustang@… added

Added some documentation and examples on how to use the classes here http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers

comment:8 Changed 5 years ago by ericholscher

Matthew,

This is some interesting work. I like a lot of the concepts (and following relations on serialization is something that I have had to hack myself). First off, can you attach a patch against Django that shows what you are changing instead of posting files? This makes it a lot easier to see what you are proposing.

I think that the excludes functionality is something that is really useful. We have the fields option that allows you to be inclusive of fields, I think that an "everything but this one" is a useful shortcut. Given, it is still possible just by including everything else. I think that this is a good idea.

The extras functionality is interesting, but I don't see a lot of the value in it. If you're including the output of functions based on the data, why not just include the data? I understand that in your writeup it is a trivial example, but I think serializing the output of functions is not wise. Functions change a lot more often than model fields, and it isn't guaranteed that you can really "deserialize" a function. I'm curious how you're implementing this (or if it's even possible). I think it is an interesting idea perhaps to provide some kind of documentation inside of the fixture, and might have other uses.

In your implementation of relations you are taking another "include these" approach. The ticket above talks about having something along those lines, but with the ability to specify a depth so that it follows all relationships. That said, the API that you've implemented for the relations stuff is a bit cumbersome.

relations={'permissions':{'relations':{'content_type':{'excludes':('app_label',)} }}}

just screams out for a better way of being represented. It seems like you're trying to almost reimplement the queryset API. I think that a lot of this filtering functionality can come from the queryset that you are passing in to be serialized.

I think that having serializers just dump out all of the relations a model points to might lead to some problems, and if included in Django it will probably be optional, and off by default. I know the snippet http://www.djangosnippets.org/snippets/918/ has implemented ForeignKey and ManyToMany relationships, along with adding a "slicing" syntax to serializers. The slicing part is for another ticket, but that code might be a good reference for trying to have fixtures span relationships.

So in summary, I think having serializers spanning relationships is a big win. The exclusion syntax on the serializer seems like it should be included for completeness and because it is useful. The other two bits of functionality certainly seem useful, but are probably a bit too specific to go into Django. They work nicely as a third party app.

comment:9 Changed 5 years ago by Matthew Flanagan <mattimustang@…>

Eric,

Thanks for taking the time to look at it. I'll rework it as a patch on Django's own serializers but I wrote it as a standalone module that can be added to a project using the SERIALIZATION_MODULES setting so I didn't have to maintain it as a patch against Django trunk in my own projects.

The extras functionality is a DRY thing for me e.g. putting things like get_absolute_url into JSON or other simple non-field properties/methods that I don't want to repeat the logic for in my templates. I'm using extjs grids for presenting data in an enterprise app and all the grid rows are populated with AJAX calls to a REST API using this serializer.
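A rough sketch of what an `extras` option could look like, kept independent of Django. The `serialize_with_extras` helper, the stand-in `Article` class, and the choice to put the results under a separate `"extras"` key are all assumptions made for illustration, not a description of the attached files.

```python
class Article:
    """Stand-in for a model; only `headline` and `slug` are 'fields'."""

    def __init__(self, pk, headline, slug):
        self.pk = pk
        self.headline = headline
        self.slug = slug

    def get_absolute_url(self):
        return f"/articles/{self.slug}/"


def serialize_with_extras(obj, field_names, extras=()):
    """Serialize the named fields, then add `extras` (properties or no-arg methods)."""
    data = {name: getattr(obj, name) for name in field_names}
    extra_data = {}
    for name in extras:
        value = getattr(obj, name)
        # Call bound methods; plain attributes and properties are used as-is.
        extra_data[name] = value() if callable(value) else value
    return {"pk": obj.pk, "fields": data, "extras": extra_data}


row = serialize_with_extras(
    Article(1, "Hello", "hello"),
    field_names=["headline"],
    extras=("get_absolute_url",),
)
```

Keeping the computed values apart from `fields` makes it easy for a deserializer to ignore them, since they cannot be loaded back into the database.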

The relations approach is based on how Rails does it. I'm not reimplementing the queryset API nor performing any filtering, but specifying arguments to pass to subsequent calls to serialize() on each related or sub-related object. See handle_fk_field() in http://wadofstuff.googlecode.com/svn/trunk/python/django/serializers/python.py for gory details.

For example

relations={'permissions':{'relations':{'content_type':{'excludes':('app_label',)} }}}

This roughly breaks down to the following pseudocode:

   Serialize each permissions relation
        Serialize each content_type relation excluding the app_label field
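The pseudocode above can be sketched as a recursive function in which each entry of `relations` holds the keyword arguments for the nested call. This is an illustration only: objects are plain dicts rather than model instances, and the `serialize` function here is hypothetical, not the code in the attached python.py.

```python
def serialize(obj, excludes=(), relations=None):
    """Serialize a dict-based object, recursing into the relations named in `relations`.

    `relations` maps a relation name to the keyword arguments used for the
    nested serialize() call on that relation's objects.
    """
    relations = relations or {}
    out = {}
    for name, value in obj.items():
        if name in excludes:
            continue
        if name in relations:
            options = relations[name]
            if isinstance(value, list):  # to-many relation
                out[name] = [serialize(item, **options) for item in value]
            else:  # to-one relation
                out[name] = serialize(value, **options)
        elif isinstance(value, list):
            out[name] = [item["pk"] for item in value]  # unlisted: pks only
        elif isinstance(value, dict):
            out[name] = value["pk"]  # unlisted: pk only
        else:
            out[name] = value
    return out


content_type = {"pk": 3, "app_label": "auth", "model": "user"}
permission = {"pk": 9, "codename": "add_user", "content_type": content_type}
group = {"pk": 1, "name": "editors", "permissions": [permission]}

result = serialize(
    group,
    relations={"permissions": {"relations": {"content_type": {"excludes": ("app_label",)}}}},
)
```

Relations that are not named fall back to primary keys, so the default output stays the same as without the option.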

I find exclusion very handy and quite succinct. Which would you prefer to write to exclude a User's password from being serialized:

serializers.serialize('json', User.objects.all(), excludes=('password',))

or

serializers.serialize('json', User.objects.all(), fields=('username', 'first_name', 'last_name', 'email', 'is_staff', 'is_active', 'is_superuser', 'last_login', 'date_joined', 'groups', 'user_permissions'))

comment:10 Changed 5 years ago by Matthew Flanagan <mattimustang@…>

I should also add that the extras functionality is not intended to be deserialized.

comment:11 Changed 5 years ago by datavortex

  • Cc datavortex@… added

I am also using the extras functionality to serialize non-field properties of models, and I do think this feature is generally useful enough to be included in Django. In my case, my application uses two datastores for its information, and the django models are supplemented with information from the second datastore. Sometimes I want to serialize these properties and sometimes I don't. The ability to use the extras argument solves my problem.

comment:12 Changed 5 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:13 Changed 5 years ago by ericholscher

@Matthew: Good points.

I don't know if all serializers are meant to always be deserializable, or if that is just something that has happened in the past and should happen. I see now how that functionality can be useful (as you are using them, as a return from a view to populate data). Most of my experience with serializers has been with load/dump data, so I sometimes forget they are really useful for serializing data in other situations.

It should be noted that these issues are separate from the actual reason that this ticket is open, so opening another ticket with each of your proposed additions is probably advisable. The exclude stuff is a pretty simple and logical change, so having that in its own ticket would be nice. I can see the extras functionality having more discussion, so splitting that off as well is probably good.

The relations functionality seems to be a way to implement this ticket, so that stuff is probably good to stay here :) Providing a patch on this ticket providing that functionality against Django (along with the relative depth as stated in the title) would probably be a good start to getting something in a format that can be committed. I like the idea of including this stuff, but I'm not sold on the API, but now I understand better how it's useful :)

Cheers.

comment:14 Changed 5 years ago by vitalyperessada

Another possible angle is to add QuerySet.deep_values() as per the following posting. The code works on a single object but it could easily be extended to work with an iterator.
http://groups.google.com/group/django-developers/browse_thread/thread/933cdb7b49880471

I stumbled on this during json serialization attempt, but having something like deep_values() could have more future uses since it will allow converting django.models object tree into primitive nested data structure.

If django gods agree :-) I could open a separate ticket and finalize code.

comment:15 Changed 4 years ago by orokusaki

@Matthew Flanagan I would prefer to use either. I mean, it would be nice if it operated like model forms in that if you say excludes=('myfield',), then it does all but that one, and if you say includes=('myfield',), it only does that one. I especially think includes is very important because it's more explicit. If you have a big model and an actively changed API, and you only have excludes, then what happens when somebody adds a new "secure" field to the model? It suddenly shows up in the output.
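The trade-off between the two options can be shown with a small field-selection helper. This is a sketch only: `select_fields` is a hypothetical name, and the whitelist is spelled `fields` here to match the existing serializer argument rather than `includes`.

```python
def select_fields(all_fields, fields=None, excludes=()):
    """Return the field names to serialize, preserving model order.

    `fields` is a whitelist (None means "everything"); `excludes` is then
    removed from whatever the whitelist allowed.
    """
    allowed = all_fields if fields is None else [n for n in all_fields if n in fields]
    return [name for name in allowed if name not in excludes]


user_fields = ["username", "password", "email"]

# Both spellings give the same output today.
by_exclusion = select_fields(user_fields, excludes=("password",))
by_inclusion = select_fields(user_fields, fields=("username", "email"))

# After someone adds a sensitive field, only the whitelist stays safe.
grown = user_fields + ["api_token"]
leaky = select_fields(grown, excludes=("password",))
safe = select_fields(grown, fields=("username", "email"))
```

Here `leaky` picks up the new `api_token` field while `safe` does not, which is the risk described above.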

comment:16 Changed 4 years ago by orokusaki

  • Triage Stage changed from Accepted to Design decision needed

How about this?

# If the model was FootballTeam, this could be an example:
recursive=(
    'coach',
    ('game',
        ('player',
            'record', 'newsevent', ('fight',
                                       'lawsuit',
                                       'policeraid'
                                    )
        )
    ),
    ('teamsoffshorebank',
        'taxfraudcase',
        'illegitimatechild'
    ),
    'peprally'
)

This is very flexible. After all, who wants to just say levels=3 and then hope they didn't forget a secure billing model somewhere down the line?

The code logic is not that difficult. Basically if it's a tuple, then the first element of the tuple is directly related to the previous level. (A player has a record, news event, and a fight. A fight has a lawsuit and a police raid, and so on. I know my models are ridiculous but I think the syntax is pretty pragmatic.)
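As a sketch of the logic described above, the nested tuple can be normalised into a tree of relation names with a few lines of recursion. `parse_recursive` is a hypothetical helper name; the spec is the one from this comment.

```python
def parse_recursive(spec):
    """Turn the nested-tuple spec into {relation_name: {child_name: {...}}}."""
    tree = {}
    for entry in spec:
        if isinstance(entry, str):
            tree[entry] = {}  # leaf: follow this relation, nothing beneath it
        else:
            name, *children = entry  # first element names the relation itself
            tree[name] = parse_recursive(children)
    return tree


recursive = (
    'coach',
    ('game',
        ('player',
            'record', 'newsevent', ('fight', 'lawsuit', 'policeraid')
        )
    ),
    ('teamsoffshorebank', 'taxfraudcase', 'illegitimatechild'),
    'peprally',
)

tree = parse_recursive(recursive)
```

A serializer could then walk `tree` alongside the objects, following only the relations that appear as keys at each level.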

comment:17 Changed 4 years ago by russellm

  • Version SVN deleted

@orokusaki: That's some mighty fine LISP you've written :-)

Seriously - there's more to this problem than just specifying a tree of attributes to dump (although that is certainly part of the problem). The rendering format is also an issue. For example, should a foreign key be represented as:

{
   "pk": 1,
   "model": "app.foo",
   "fields": {
      "name": "first",
      "child": {
         "pk": 1,
         "model": "app.bar",
         "fields": {
            "attrib": 3
         }
      }
   }
}

or

{
   "pk": 1,
   "model": "app.foo",
   "fields": {
      "name": "first",
      "child": {
         "pk": 1,
         "attrib": 3
      }
   }
}

or

{
   "pk": 1,
   "name": "first",
   "child": {
      "pk": 1,
      "attrib": 3
   }
}

or something else entirely? Different users will have different needs. And I haven't even started looking into the XML-related problems (what tag name do I use? Do I use an attribute or CDATA to represent content?)

This isn't something we're going to be able to cram into the arguments of a single function. In order to give full customizability, we're going to need to provide a class-based mechanism for describing serialization.
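One way to picture a class-based mechanism is a base class with overridable hooks, where each of the three layouts shown above becomes a small subclass. This is only a sketch under assumed names (`Serializer`, `serialize_related`, `render`); it is not a proposed Django API, and objects are plain dicts already in the first layout.

```python
class Serializer:
    """Base class: subclasses decide how an object and its relations are laid out."""

    def serialize(self, obj):
        fields = {
            name: self.serialize_related(value) if isinstance(value, dict) else value
            for name, value in obj["fields"].items()
        }
        return self.render(obj, fields)

    def serialize_related(self, obj):
        # By default a related object is laid out like its parent.
        return self.serialize(obj)

    def render(self, obj, fields):
        # First layout: full pk/model/fields envelope at every level.
        return {"pk": obj["pk"], "model": obj["model"], "fields": fields}


class FlatSerializer(Serializer):
    """Third layout: no envelope anywhere."""

    def render(self, obj, fields):
        return {"pk": obj["pk"], **fields}


class FlatRelatedSerializer(Serializer):
    """Second layout: envelope at the top level, flat related objects."""

    def serialize_related(self, obj):
        return FlatSerializer().serialize(obj)


bar = {"pk": 1, "model": "app.bar", "fields": {"attrib": 3}}
foo = {"pk": 1, "model": "app.foo", "fields": {"name": "first", "child": bar}}

layouts = [cls().serialize(foo) for cls in (Serializer, FlatRelatedSerializer, FlatSerializer)]
```

Each layout is a two or three line override, which is hard to express through keyword arguments to a single function.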

comment:18 Changed 4 years ago by orokusaki

@russellm Why start with a diss on my code (that's called a tuple and 2-tuples, both very common in Django and Python)? Read the title of this ticket, and read my full example. It's about specifying recursion depth, not about excludes= or fields=. That just happened to be added on halfway through. My version is not for specifying fields to be exported. It's to represent which relationships (how deep) should be exported. Sure, a class or a model Meta may be the way to describe which format / fields get exported, but my version merely says: "Export football team, and its related players, and their related records, and so on" (the "related" being models, not fields).

comment:19 Changed 4 years ago by russellm

@orokusaki My opening comment was a light-hearted (hence the smiley) attempt to draw attention to the fact that your proposal is very LISPy, but not very Pythonic. Yes, tuples and lists are Python data structures - that doesn't mean that nesting them N deep is a good idea. Off the top of my head, the only place we use nested 2-tuples is in the choices list on forms (to provide optgroups) and in specifying arguments to template loaders - and in both of those cases, the nesting is limited to depth 2. I can't think of any examples of arbitrarily nested tuples in Django.

The reason I mentioned the excludes/fields issue is that I'm not especially inclined to start applying piecemeal fixes to serialization when there is a much bigger problem lurking - that of customization. Any good solution for customizable serialization will necessitate the ability to have fine grained control over recursion, the fields that are to be included, and the format in which they will be displayed. What you've proposed will fix one small part of a larger problem, and may do so in a way that constrains our ability to implement a truly generic solution.

comment:20 Changed 4 years ago by orokusaki

@russellm I was thinking of only one side of the solution and figuring that it would be within the model's meta that you would describe the format, etc. I understand upon further thinking that the model's meta is orthogonal to the output and different scenarios might need different options (which my idea doesn't allow). On "light-hearted": duly noted. Thanks.

comment:21 Changed 4 years ago by Alex

  • Triage Stage changed from Design decision needed to Accepted

comment:22 Changed 4 years ago by anonymous

  • Cc django@… added

comment:23 Changed 3 years ago by t0ster

  • Cc tosters@… added

comment:24 Changed 3 years ago by tomchristie

  • Cc tomchristie added

comment:25 Changed 3 years ago by TylerBrock

I would love for this to be in the next Django release.

Bump!

comment:26 Changed 3 years ago by gabrielhurley

  • Severity set to Normal
  • Type set to New feature

comment:27 Changed 3 years ago by kmike

  • Cc kmike84@… added
  • Easy pickings unset

comment:28 Changed 3 years ago by guettli

  • Cc hv@… added
  • UI/UX unset

comment:29 Changed 3 years ago by tomchristie

There is some movement on this ATM.
I sprinted on the task at DjangoCon.eu and made some progress, which Russ broadly gave a thumbs up to.

If anyone's interested in helping out and finally making some progress on this it'd be great if you could get in touch.
I've emailed the dev list with a rough proposal as it stands, and there's an initial implementation, see here...

https://groups.google.com/d/topic/django-developers/H2EKZBsRlFY/discussion

There's certainly more work to do in terms of making sure the interface is something we'll be happy with, that ties in nicely with the existing Forms API style, and with dealing with both serialization and de-serialization, but I think what we have now looks about right. (Positive criticisms are very welcome of course.)

Prob. best to take any further discussion to the dev list.

comment:30 Changed 2 years ago by anonymous

  • Patch needs improvement set

This implementation does quite a lot of queries that could easily be avoided!

Why can't you just change the implementation of the serializers to follow the fields retrieved using the 'select_related' method?

Then you can also follow Django's already existing conventions and allow the 'fields' property to accept values like "related_model__field1", as in the 'filter' method.
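A sketch of how such lookups might drive serialization, assuming they use the same double-underscore separator as `filter`. The helper names `build_field_tree` and `serialize` are hypothetical, and `SimpleNamespace` objects stand in for model instances; the query-saving part would come from calling `select_related` on the queryset beforehand, which is not modelled here.

```python
from types import SimpleNamespace


def build_field_tree(paths):
    """Group 'relation__field' lookups into a nested dict of fields to emit."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("__"):
            node = node.setdefault(part, {})
    return tree


def serialize(obj, tree):
    """Serialize only the attributes named in `tree`, descending into relations."""
    out = {}
    for name, subtree in tree.items():
        value = getattr(obj, name)
        out[name] = serialize(value, subtree) if subtree else value
    return out


reporter = SimpleNamespace(first_name="John", email="john@example.com")
article = SimpleNamespace(headline="Hello", reporter=reporter)

tree = build_field_tree(["headline", "reporter__first_name"])
data = serialize(article, tree)
```

Because the caller names every field explicitly, the depth of traversal falls out of the lookups themselves and no separate depth argument is needed.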
