Opened 2 weeks ago

Closed 2 weeks ago

#35885 closed Uncategorized (invalid)

JSONField does accept strings that look like dicts and incorrectly saves them as strings, breaking JSON filtering

Reported by: DataGreed Owned by:
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: DataGreed Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Steps to reproduce:

  1. Create a model with JSONField, create a migration and migrate
  2. Create an instance of this model, set the value of JSON field to a string generated by json.dumps() from any dictionary, e.g. "{"foo":"bar"}
  3. Save the model instance
  4. Instance save successfully

Expected result:

Instance is saved with the JSONField set to a dictionary {"foo": "bar"}, and JSONField deep filtering works properly

Actual result:

Instance is saved with a JSONField value saved as a string. Filtering does not work, and when you query this instance from a database, the JSONField value will be a string, not a dict.

I tried it on PostgreSQL and sqlite - both working incorrectly.
Note that if I set JSONField value to an actual dict it works correctly, but it is too easy to make a mistake and then waste hours of debugging the reasons the JSON filtering does not work, since developers usually sanitize the data at least by dumping the dict to json to make sure it is even possible to dump.

It seems like the fix would be to parse the string value in JSONField before saving.
My current workaround is to do the following on my models with JSONFields:

class SomeModel(models.Model):

    json_field = models.JSONField(blank=True, null=True)

    def clean(self):
        try:
            if isinstance(self.json_field, str):
                # make sure we are not trying to save string to JSONField, django allows this for some reason
                # and it breaks filtering on json
                self.arguments = json.loads(self.json_field)
        except ValueError as e:
            # reraise as validation error. We do this to get more actionable description for the error
            raise ValidationError(str(e))

    def save(self, *args, **kwargs):

        # ensure clean runs. Well, kind of - we can still directly update the fields,
        # and it will somewhat break data integrity, so just don't do that maybe? thanks :)
        self.full_clean()
        super().save(*args, **kwargs)

Change History (1)

comment:1 by Simon Charette, 2 weeks ago

Component: UncategorizedDatabase layer (models, ORM)
Resolution: invalid
Status: newclosed

JSON accepts top level string as valid input; that is including JSON serialized as a string.

[...] since developers usually sanitize the data at least by dumping the dict to json to make sure it is even possible to dump.

I don't think this is a normal behavior. Usually data transits through through serialization layers that are in charge of parsing JSON strings as Python objects just like for datetimes, decimal, float, and other types.

Having JSONField auto-magically perform a json.loads when provided a string input is backward incompatible and prevents storing JSON de-serializable strings as top level values (e.g. "1", "false", "null").

Note: See TracTickets for help on using tickets.
Back to Top