#34539 closed Bug (fixed)

`get_prep_value` no longer called for JSONField

Reported by: Julie Rymer Owned by: Julie Rymer
Component: Database layer (models, ORM) Version: 4.2
Severity: Release blocker Keywords:
Cc: Simon Charette, Florian Apolloner, Tim Graham Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Julie Rymer)

Hello, I just upgraded from django 4.1 to 4.2 and I have a custom JSONField with a get_prep_value() override that stopped working. After searching a bit, I saw that was because JSONField.get_prep_value() is no longer called in 4.2 (5c23d9f0c32f166c81ecb6f3f01d5077a6084318).

I think this issue need a resolution either:

  • JSONField should call get_prep_value() like all other fields type, because this is the method that the documentation tell us to override in custom fields. Otherwise we need to override get_db_prep_value() which is heavier and does not have the same purpose. I think simply replacing connection.ops.adapt_json_value(value, self.encoder) with connection.ops.adapt_json_value(self.get_prep_value(value), self.encoder) in JSONField.get_db_prep_value() would fix this

PS: #34397 seems to be related but in fact is about Django 3.2 so it is not the current issue

Attachments (1)

stack.png (45.9 KB ) - added by Shaheed Haque 11 months ago.
stack frame

Download all attachments as: .zip

Change History (30)

comment:1 by Julie Rymer, 12 months ago

Description: modified (diff)

comment:2 by Natalia Bidart, 12 months ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Accepting this since, as the reporter says, the docs suggest this method will be called for most fields.

in reply to:  description ; comment:3 by Mariusz Felisiak, 12 months ago

Thanks for report. The way how built-in fields are implemented is an implementation detail and can change without the deprecation path.

Replying to Julie Rymer:

Hello, I just upgraded from django 4.1 to 4.2 and I have a custom JSONField with a get_prep_value() override that stopped working. After searching a bit, I saw that was because JSONField.get_prep_value() is no longer called in 4.2 (5c23d9f0c32f166c81ecb6f3f01d5077a6084318).

I think this issue need a resolution either:

  • JSONField should call get_prep_value() like all other fields type, because this is the method that the documentation tell us to override in custom fields.

Both methods get_prep_value() and get_db_prep_value() are documented there. I don't see any preferences for get_prep_value() 🤔.

Otherwise we need to override get_db_prep_value() which is heavier and does not have the same purpose. I think simply replacing connection.ops.adapt_json_value(value, self.encoder) with connection.ops.adapt_json_value(self.get_prep_value(value), self.encoder) in JSONField.get_db_prep_value() would fix this

This is not really an exception, there are many built-in fields that don't call get_prep_value() in get_db_prep_value(), e.g. ArrayField or DurationField.

You can check how, for example, Wagtail solved the same issue on their side.

comment:4 by Mariusz Felisiak, 12 months ago

Cc: Simon Charette Florian Apolloner added

comment:5 by Mariusz Felisiak, 12 months ago

Also, the docs that you mentioned is Field API reference not How to create a custom field or How to override build-in fields, so it only describes methods available in the Field API.

in reply to:  3 comment:6 by Julie Rymer, 12 months ago

Replying to Mariusz Felisiak:

Thanks for report. The way how built-in fields are implemented is an implementation detail and can change without the deprecation path.

...

Both methods get_prep_value() and get_db_prep_value() are documented there. I don't see any preferences for get_prep_value() 🤔.

It doesn't seems like an implementation detail when those function are clearly documented for customisation.
As for the preference, the first paragraph of the Field API doc is

Field is an abstract class that represents a database table column. Django uses fields to create the database table (db_type()), to map Python types to database (get_prep_value()) and vice-versa (from_db_value()).

get_prep_value is the documented method for mapping Python types to database, which is what I was trying to do, get_db_prep_value is only for backend specific need so not the most common case (hence why only get_prep_value is put foremost in the intro I suppose).

Replying to Mariusz Felisiak:

Also, the docs that you mentioned is Field API reference not How to create a custom field or How to override build-in fields, so it only describes methods available in the Field API.

You are right, however the customisation doc still does not mention that get_prep_value() is not always available for override.

Replying to Mariusz Felisiak:

This is not really an exception, there are many built-in fields that don't call get_prep_value() in get_db_prep_value(), e.g. ArrayField or DurationField.

Then it should be changed, or documented, for those fields also :)

Like wagtail, I also ended up overriding get_db_prep_value() instead of get_prep_value(), but this seems like too much work when I don't care about the backend and simply want to serialise my custom object.

comment:7 by Natalia Bidart, 12 months ago

Julie, could you tell us a little bit more about how you are using get_prep_value? That could help us understand a bit better the use case to make further decisions. Thanks!

comment:8 by Mariusz Felisiak, 12 months ago

It doesn't seems like an implementation detail when those function are clearly documented for customisation.

Field.get_prep_value() is documented for the customization, not JSONField.get_prep_value().

comment:9 by Julie Rymer, 12 months ago

My usecase is the following:
I have a db json field that I use to keep data in the following format [{id: str, label: str, length: Decimal, height: Decimal, width: Decimal, weight: Decimal}]
For easier manipulation and validation, I have a dataclass containing those properties. I created a custom JSONField to convert to/from a list of that dataclass to a simple python dict to be inserted in db.

From the doc

If your custom Field class deals with data structures that are more complex than strings, dates, integers, or floats, then you may need to override from_db_value() and to_python().

And so I overrode from_db_value() and to_python()

Further along the doc

Since using a database requires conversion in both ways, if you override from_db_value() you also have to override get_prep_value() to convert Python objects back to query values.

And so I overrode get_prep_value(). But get_prep_value() is never called, alas. This behaviour is not consistent from what is expected.

I know that this doc is about Field.get_prep_value() and not JSONField.get_prep_value(), but unless I missed something, there is no specific documentation for JSONField.get_prep_value(), nor forewarning in Field.get_prep_value(), so the obvious guess is that this doc apply to JSONField.get_prep_value() as well as any other Field class.

I think the question to solve is

  • do we want to add a warning that the documented rules for customisation does not apply to some Field classes and to document the missing rules, or
  • should we harmonize instead all Field classes that don't follow the rules yet and make their behaviour consistent with current documentation.

I am more in favour of the second option since it does seems to be less of a change effort and more practical for the end-user.

You can find my custom Field implementation here:

from __future__ import annotations

import decimal
import json
from dataclasses import dataclass, fields
from decimal import Decimal

from django.core import exceptions
from django.db import models
from django.utils.translation import gettext_lazy as _


@dataclass(repr=True, eq=True, unsafe_hash=True)
class ProductPackage:
    id: str
    length: Decimal  # in cm
    height: Decimal  # in cm
    width: Decimal  # in cm
    weight: Decimal  # in kg
    label: str = ""

    @staticmethod
    def from_dict(package_json: dict) -> ProductPackage:
        return ProductPackage(
            id=package_json.get("id"),
            length=Decimal(package_json["length"]),
            height=Decimal(package_json["height"]),
            width=Decimal(package_json["width"]),
            weight=Decimal(package_json["weight"]),
            label=package_json.get("label", ""),
        )

    def to_dict(self) -> dict:
        return dict((field.name, getattr(self, field.name)) for field in fields(self))

    @classmethod
    def from_list(cls, packages_json: list[dict]) -> list:
        return list(
            cls.from_dict(package_json) for package_json in (packages_json or [])
        )

    def __copy__(self):
        return self.copy()

    def copy(self):
        return self.__class__(**self.to_dict())


class PackageJSONField(models.JSONField):
    """
    ProductPackage class will make package dimension more practical to use
    and this class also handle Decimal so they can be serialized/deserialized from JSON
    """

    description = "A JSON field to store a list of ProductPackage"
    default_error_messages = {
        **models.JSONField.default_error_messages,
        "invalid_packages": _("Value must be valid list of packages."),
        "invalid_decimal": _("Package dimensions must be valid decimal strings."),
    }

    def from_db_value(self, value, expression, connection) -> list[ProductPackage]:
        packages_json = super().from_db_value(value, expression, connection)
        return ProductPackage.from_list(packages_json)

    def to_python(self, value):
        if value is None:
            return value
        if isinstance(value, ProductPackage):
            return value

        try:
            if isinstance(value, list):
                return ProductPackage.from_list(value)
            return json.loads(value, cls=self.decoder)
        except json.JSONDecodeError:
            raise exceptions.ValidationError(
                self.error_messages["invalid"],
                code="invalid",
                params={"value": value},
            )
        except TypeError:
            raise exceptions.ValidationError(
                self.error_messages["invalid_packages"],
                code="invalid_packages",
                params={"value": value},
            )
        except decimal.InvalidOperation:
            raise exceptions.ValidationError(
                self.error_messages["invalid_decimal"],
                code="invalid_decimal",
                params={"value": value},
            )

    def get_prep_value(self, value: list[ProductPackage | dict]):
        if value is None:
            return value
        if any(isinstance(package, dict) for package in value):
            value = ProductPackage.from_list(value)
        value = list(
            {
                "id": package.id,
                "label": package.label,
                "length": str(package.length),
                "height": str(package.height),
                "width": str(package.width),
                "weight": str(package.weight),
            }
            for package in value
        )
        return super().get_prep_value(value)

    def get_db_prep_value(self, value, connection, prepared=False):
        # the override I ended up doing for the current issue

        if isinstance(value, models.expressions.Value) and isinstance(
            value.output_field, models.JSONField
        ):
            value = value.value
        elif hasattr(value, "as_sql"):
            return value
        return connection.ops.adapt_json_value(self.get_prep_value(value), self.encoder)

in reply to:  9 ; comment:10 by Mariusz Felisiak, 12 months ago

Replying to Julie Rymer:

I think the question to solve is

  • do we want to add a warning that the documented rules for customisation does not apply to some Field classes and to document the missing rules, or
  • should we harmonize instead all Field classes that don't follow the rules yet and make their behaviour consistent with current documentation.

I am more in favour of the second option since it does seems to be less of a change effort and more practical for the end-user.

This will required adding LOC (probably many) that are not used by Django itself and aren't tested (unless we will add many tests with custom subclasses of built-in fields). This can also cause a lot of backward incompatible changes in the behavior. I'm strongly against it as it'll set a dangerous precedent and increase the maintenance burden.

I'll wait for other opinions.

in reply to:  10 ; comment:11 by Julie Rymer, 12 months ago

Replying to Mariusz Felisiak:

This will required adding LOC (probably many) that are not used by Django itself and aren't tested (unless we will add many tests with custom subclasses of built-in fields). This can also cause a lot of backward incompatible changes in the behavior. I'm strongly against it as it'll set a dangerous precedent and increase the maintenance burden.

I'll try to address Mariusz' concerns.

From what I understand, the change would only be a matter of adding value=self.get_prep_value(value) is the concerned Field classes' get_db_prep_value (JSONField, ArrayField, DurationField, other ?). The default Field.get_prep_value definition is the following:

    def get_prep_value(self, value):
        """Perform preliminary non-db specific value checks and conversions."""
        if isinstance(value, Promise):
            value = value._proxy____cast()
        return value

So it doesn't seems to do much so I don't see what backward incompatible change that would cause.
But if there is concern about it, get_prep_value could be overridden in the concerned Field classes to directly return the value instead of doing any operation, hence guaranteed no behaviour change.

This addition to the code will be tested, because get_db_prep_value is already tested (right?) and we are not adding any behaviour that should involve any explicit test. We are just adding an easier entrypoint for customisation, the customisation itself is not for django to test so I don't see what additional maintenance burden it would add.

The only potential burden I see is a new rule for future changes done on get_db_prep_value of any standard field class to take care to call get_prep_value even if it does nothing.

Perhaps I missed something, do tell me if that is the case.

On the contrary, updating the documentation to keep track of any Field class not following the rule and explaining how to do customisation for each of them seems like a lot more involved effort.

in reply to:  11 ; comment:12 by Mariusz Felisiak, 12 months ago

Replying to Julie Rymer:

    def get_prep_value(self, value):
        """Perform preliminary non-db specific value checks and conversions."""
        if isinstance(value, Promise):
            value = value._proxy____cast()
        return value

So it doesn't seems to do much so I don't see what backward incompatible change that would cause.

The number of LOC is not important. We are talking not only about JSONField. If we will agree that all built-in fields must implement the entire Field API and fulfill everything that is in the Field API reference, we will be force to make many other changes.

This addition to the code will be tested, because get_db_prep_value is already tested (right?)

This is not true, removing get_prep_value() will not cause any tests to fail, so it won't be tested.

On the contrary, updating the documentation to keep track of any Field class not following the rule and explaining how to do customisation for each of them seems like a lot more involved effort.

Personally, I'd close this ticket as invalid. As far as I'm aware, neither docs nor code changes are required.

in reply to:  12 comment:13 by Julie Rymer, 12 months ago

Replying to Mariusz Felisiak:

This is not true, removing get_prep_value() will not cause any tests to fail, so it won't be tested.

You're right, but nothing a spy+assert_called_once can't solve :p

Also, I though I'd mention that I'd be willing to make the necessary changes in case this scenario is selected (I'd not have that self-confidence for a doc update however )

Finally as a last input, I'll mention that the custom field doc actually explicitly elaborate on some standard Field type gotcha when they deviate from the Field API contract: AutoField and FileField.

comment:14 by Mariusz Felisiak, 12 months ago

You're right, but nothing a spy+assert_called_once can't solve :p

This is something we use as a last resort, not something that is a good API testing.

It's pretty obvious that we're not going to reach a consensus, so let's wait for other opinions.

comment:15 by Mariusz Felisiak, 12 months ago

Cc: Tim Graham added

comment:16 by Natalia Bidart, 12 months ago

As a Django user, I agree that the current documentation does imply that get_prep_value() would be called for any Field instance. I always assumed that everything documented for Field as a toplevel class applies to all children (except explicitly stated otherwise), so I can personally relate to the ticket description.

Having said the above, if modifying the Django code could generate extra maintenance burden and/or introduce patterns that are not desired (such as including code that is not used by Django itself), I would at least suggest that the documentation is extended to explicitly say that get_prep_value() may not be called in every Django-provided field.

comment:17 by Jatin-tec, 12 months ago

Owner: changed from nobody to Jatin-tec
Status: newassigned

comment:18 by Natalia Bidart, 12 months ago

Owner: Jatin-tec removed
Status: assignednew
Triage Stage: AcceptedUnreviewed

Hi Jatin-tec, sorry for the confusion but this ticket is not yet ready to be worked on, we still haven't decided on the best course of action. It could be that we change code, or docs, or nothing.

Thanks for understanding! I'll be removing you from the assignee field for now.

comment:19 by Simon Charette, 12 months ago

I don't hold a strong opinion against restoring the call to get_prep_value given how common using JSONField to persist Python datastructure is.

I never assumed that we guaranteed that get_prep_lookup would be called by get_db_prep_lookup but I can see how it can be interpreted this way from the docs.

Given this was an unintended side-effect of ​5c23d9f0c32f166c81ecb6f3f01d5077a6084318 I'd be +1 to having get_db_prep_value start by calling get_prep_value when prepared=False.

As for the other fields that don't respect this contract that Mariusz identified I think that if we wanted to be coherent with the docs we should likely adapt them as well in a follow up ticket.

For as much as this API is awkward and open to interpretation it has been around for years so trying to stick to it as much as possible seems like a worthy approach?

in reply to:  19 ; comment:20 by Mariusz Felisiak, 12 months ago

Triage Stage: UnreviewedAccepted

Replying to Simon Charette:

I never assumed that we guaranteed that get_prep_lookup would be called by get_db_prep_lookup but I can see how it can be interpreted this way from the docs.

Given this was an unintended side-effect of ​5c23d9f0c32f166c81ecb6f3f01d5077a6084318 I'd be +1 to having get_db_prep_value start by calling get_prep_value when prepared=False.

OK, persuaded, let's restore it for JSONField.

Julie, please prepare a patch (via GitHub PR) with a regression test and release notes (4.2.2.txt.)

in reply to:  20 comment:21 by Julie Rymer, 12 months ago

Replying to Mariusz Felisiak:

Replying to Simon Charette:

I never assumed that we guaranteed that get_prep_lookup would be called by get_db_prep_lookup but I can see how it can be interpreted this way from the docs.

Given this was an unintended side-effect of ​5c23d9f0c32f166c81ecb6f3f01d5077a6084318 I'd be +1 to having get_db_prep_value start by calling get_prep_value when prepared=False.

OK, persuaded, let's restore it for JSONField.

Julie, please prepare a patch (via GitHub PR) with a regression test and release notes (4.2.2.txt.)

Cool, I'll try to do this this w-e :)

comment:22 by Mariusz Felisiak, 12 months ago

Owner: set to Julie Rymer
Status: newassigned

comment:24 by Natalia Bidart, 11 months ago

Patch needs improvement: set

comment:25 by Mariusz Felisiak, 11 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:26 by Mariusz Felisiak <felisiak.mariusz@…>, 11 months ago

Resolution: fixed
Status: assignedclosed

In 0ec6066:

Fixed #34539 -- Restored get_prep_value() call when adapting JSONFields.

Regression in 5c23d9f0c32f166c81ecb6f3f01d5077a6084318.

comment:27 by Mariusz Felisiak <felisiak.mariusz@…>, 11 months ago

In 9c301814:

[4.2.x] Fixed #34539 -- Restored get_prep_value() call when adapting JSONFields.

Regression in 5c23d9f0c32f166c81ecb6f3f01d5077a6084318.

Backport of 0ec60661e61b153e6bcec64649b1b7f524eb3e18 from main

by Shaheed Haque, 11 months ago

Attachment: stack.png added

stack frame

comment:28 by Shaheed Haque, 11 months ago

Resolution: fixed
Status: closednew

This has caused a behaviour change for me which I think may be a regression. I'm not especially knowledgeable about the issues debated above, but here is what I see. First, I am using an add-on app called social_django which sets a JSONField and then saves the value here:

https://github.com/python-social-auth/social-app-django/blob/8d0a2052d1b22a899454571c62237d23aa25af97/social_django/storage.py#L24

Second, on the stack, I have the frames show in the attachment "stack frame", and the important point is that when I enter the two new lines of code added, value is a dict, but the value is flattened into a string by the new code.

The result is that the database JSONField has the saved string. I believe that to be incorrect. Previously, the dict was saved as expected.

(Perhaps the intent here was to only have this change apply to subclasses of JSONField?)

Advice appreciated.

comment:29 by Mariusz Felisiak, 11 months ago

Resolution: fixed
Status: newclosed

Please don't reopen closed tickets. If you believe there is an issue in Django, then please create a new ticket in Trac and follow our bug reporting guidelines.

Note: See TracTickets for help on using tickets.
Back to Top