Opened 3 weeks ago

Last modified 43 hours ago

#36986 assigned Cleanup/optimization

Add a hook to Serializer._value_from_field() for complex fields like CompositePrimaryKey

Reported by: Tim Graham Owned by: Tim Graham
Component: Core (Serialization) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The Python serializer's _value_from_field() method using isinstance for special-case handling of CompositePrimaryKey.

Custom model fields can't add similar logic without subclassing the serializer, a cumbersome solution which doesn't scale well in the presence of multiple custom fields.

My motivation for a generic hook is to add serialization support for MongoDB's EmbeddedModelField, EmbeddedModelArrayField, PolymorphicEmbeddedModelField, and PolymorphicEmbeddedModelArrayField.

An early prototype:

  • django/core/serializers/python.py

    diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py
    index 2929874b01..1a4f5a1336 100644
    a b class Serializer(base.Serializer):  
    3939        data["fields"] = self._current
    4040        return data
    4141
     42    def get_fields_from_model(self, obj, *, polymorphic=False):
     43        data = {
     44            field.name: self._value_from_field(obj, field)
     45            for field in obj._meta.local_fields
     46        }
     47        if polymorphic:
     48            data["_label"] = obj._meta.label
     49        return data
     50
    4251    def _value_from_field(self, obj, field):
    4352        if isinstance(field, CompositePrimaryKey):
    4453            return [self._value_from_field(obj, f) for f in field]
     54        if hasattr(field, "embedded_model"):
     55            sub_obj = getattr(obj, field.attname)
     56            if sub_obj is None:
     57                return None
     58            if isinstance(sub_obj, list):
     59                return [self.get_fields_from_model(sub) for sub in sub_obj]
     60            return self.get_fields_from_model(sub_obj)
     61        if hasattr(field, "embedded_models"):
     62            sub_obj = getattr(obj, field.attname)
     63            if sub_obj is None:
     64                return None
     65            if isinstance(sub_obj, list):
     66                return [
     67                    self.get_fields_from_model(sub, polymorphic=True) for sub in sub_obj
     68                ]
     69            return self.get_fields_from_model(sub_obj, polymorphic=True)
    4570        value = field.value_from_object(obj)
    4671        # Protected types (i.e., primitives like None, numbers, dates,
    4772        # and Decimals) are passed through as is. All other values are

I don't have an immediate proposal for the API change to accomplish this.

Change History (12)

comment:1 by Vishy Algo, 3 weeks ago

Could we do a protocol-based hook similar to this?

  • django/core/serializers/python.py

    diff --git a/django/core/serializers/python.py b/django/core/serializers/python.py
    index 73ba24368b..b2e61d1b72 100644
    a b class Serializer(base.Serializer):  
    4040    return data
    4141
    4242    def _value_from_field(self, obj, field):
     43        if hasattr(field, 'get_serializer_value'):
     44            return field.get_serializer_value(obj)
    4345        if isinstance(field, CompositePrimaryKey):
    4446            return [self._value_from_field(obj, f) for f in field]
    4547        value = field.value_from_object(obj)

I guess, this gives custom fields full control while keeping it generic.

comment:2 by Tim Graham, 3 weeks ago

That's insufficient. Notice that both CompositePrimaryKey and my example in the description require the ability to call self._value_from_field recursively. I'm considering the possibility of a registry that would allow registering a field's function with the serializer, similar to how lookups are registered to model fields.

comment:3 by Vishy Algo, 3 weeks ago

Owner: set to Vishy Algo
Status: newassigned

comment:4 by Jacob Walls, 3 weeks ago

Triage Stage: UnreviewedAccepted

Thanks, Tim. Would your field be able to implement __iter__(), like CompositePrimaryKey does?

comment:5 by Tim Graham, 2 weeks ago

Not for the polymorphic fields, where each field value may be a model instance of a different class with different fields. The implementation in the description uses obj._meta.local_fields rather than getting the list of subfields from field.

comment:6 by Tim Graham, 2 weeks ago

For another serializer API concern, see 47651eadb8ca7aacddad41da4df64fd2af11faae which fixed handling of JSONField in the XML serializer by special casing it. I didn't intend to address XML serialization in this ticket, but it could be useful if the new API were generic enough to clean that up as well.

And while I only intended to address serialization in this ticket, I found that a deserialization hook is also likely needed. Not only does the JSONField/XML patch demonstrate a need, but for the various embedded model fields, the field.to_python() handling in the Python Deserializer._handle_object() almost works, except that embedded subfields are serialized to the database using Field.column instead of Field.name as you would expect to find in fixtures. Thus, the correct deserialization for embedded model fields is EmbeddedModelField.to_python() but with field.to_python(value[field.name]) instead of field.to_python(value[field.column]).

comment:7 by Vishy Algo, 13 days ago

We can Introduce a format-aware serialization registry on django.db.models.Field mapping serializer formats ('python', 'xml', etc.) to handler callables. To support recursive execution, the callable signature must accept the active serializer instance as context: handler(field, obj, serializer).

This allows the removal of hardcoded isinstance checks within the core serializers. For example, CompositePrimaryKey will encapsulate its serialization compilation via the registry:
CompositePrimaryKey.register_serializer_handler('python', composite_python_serializer)

Similarly, with the JSON field JSONField.register_serializer_handler('xml', json_xml_serializer)

comment:8 by Tim Graham, 12 days ago

The ideas seem plausible. My idea was to do the registrations on the serializer rather than on the field, but either path could be viable. Since the json, jsonl, and pyyaml serializers all subclass the Python serializer, it's unclear to me that hooks are needed for them. Are these your original ideas or are they from an LLM?

comment:9 by Vishy Algo, 11 days ago

This, I followed up on the registry (lookup) approach as you mentioned above, at implementing very similar pattern for serializer registration with recursion support. Although, it is redundant hooks that's needed for subclasses. Would making the registration on the serializers eliminate it?

comment:10 by Vishy Algo, 11 days ago

I guess it would! We could simply register the Python serializer and override serialization for specific fields in relevant serializers. Which approach you think is more appropriate and extensible?

comment:11 by Tim Graham, 8 days ago

The registration approach might be overcomplicated. In my latest approach, I added Field.serialize_to_python() and deserialize_from_python() hooks. That's enough for MongoDB's embedded model fields and plays nicely with composite fields as well. You could see if similar hooks for the XML serializer would allow cleaning up the JSONField-specific code there.

comment:12 by Tim Graham, 43 hours ago

Owner: changed from Vishy Algo to Tim Graham

I added Field.serialize_to_xml() and deserialize_from_xml() to my exploratory branch and it seems to work well. I'll refine it and make a PR.

Note: See TracTickets for help on using tickets.
Back to Top