Opened 8 years ago

Closed 7 years ago

#27538 closed Bug (duplicate)

Value of JSONField is being re-encoded to string even though being already encoded

Reported by: Petar Aleksic
Owned by:
Component: contrib.postgres
Version: 1.10
Severity: Normal
Keywords: JSONField
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I assume the problem is that the value of the JSONField is re-encoded on every fetch or update, even though it has already been encoded and the field's value hasn't changed. This eventually causes exponential growth of backslashes in the value stored in the database, leading to an InternalError: invalid memory alloc request size.

Here is how to reproduce the bug with shell:
Let's say there is a model named MyModel with a JSONField defined as follows:

json_field = JSONField(
    blank=True,
    null=True,
    default=dict,
)

Now, after importing the necessary models, let's do some data I/O operations in the shell:

my_model = MyModel.objects.get(id=1)
my_model.json_field = {"foo":"bar"}
my_model.save()
my_model.json_field

The last command prints out: {'foo': 'bar'} which is perfectly fine. Let's now fetch the model again and print the value of the json field:

my_model = MyModel.objects.get(id=1)
my_model.json_field

This prints out '{"foo": "bar"}', so we see that the dict has been converted to a string (probably somewhere with json.dumps). If we run my_model.save() without changing the value of json_field (which is a real-world scenario: for instance, we might have wanted to change other fields and then call save) and fetch it again, the value of json_field will be doubly encoded, although the value is already a valid JSON string:

my_model.save()
my_model = MyModel.objects.get(id=1)
my_model.json_field

The last command now prints out '"{\\"foo\\": \\"bar\\"}"'. The string has clearly been re-encoded, causing some characters to be escaped. The next iteration of these steps results in:
'"\\"{\\\\\\"foo\\\\\\": \\\\\\"bar\\\\\\"}\\""'

If we repeat these actions multiple times, the value grows very quickly because each round escapes the existing backslashes with more backslashes. After only a few iterations, pg_dump (data only) produced a 1GB output file.
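The growth described above can be reproduced without Django or a database. The following is only a sketch, assuming the stored text is passed through json.dumps once per save; reencode_sizes is a hypothetical helper name, not part of Django or psycopg2.

```python
import json


def reencode_sizes(value, rounds):
    """Apply json.dumps `rounds` times, feeding each result back in.

    Returns the length of the encoded text after each round.
    """
    sizes = []
    for _ in range(rounds):
        value = json.dumps(value)  # a str input gets quoted and escaped again
        sizes.append(len(value))
    return sizes


print(reencode_sizes({"foo": "bar"}, 4))  # [14, 20, 32, 56]
```

Every round doubles the backslashes already present and adds one more per quote character, so the length roughly doubles per save once backslashes dominate.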

I am not sure whether the cause of this bug resides in Django's implementation of JSONField or in psycopg2's handling of the PostgreSQL JSON type, but somewhere on the fetch from the database the value of the field is being encoded to a string with json.dumps, and this is repeated on every fetch, despite the value already being a valid JSON string. That it happens on fetch is my assumption; it might instead be the case that the re-encoding, and thus the re-escaping, takes place on update (my_model.save()).
The same happens even if we never change the value of json_field. If it was initially an empty JSON object {}, after only a few iterations it grows to '"\\"\\\\\\"{}\\\\\\"\\""'
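One way to reason about the fetch-versus-save question is a toy model. This is a sketch under stated assumptions, not Django or psycopg2 code: ToyStore and round_trips are hypothetical names, the save path is assumed to always call json.dumps, and the only thing that varies is whether the read path decodes one layer.

```python
import json


class ToyStore:
    """Hypothetical stand-in for a single JSON column."""

    def __init__(self, decode_on_fetch):
        self.decode_on_fetch = decode_on_fetch
        self.raw = None  # text as it would sit in the column

    def save(self, value):
        # The save path serializes whatever Python value it is given.
        self.raw = json.dumps(value)

    def fetch(self):
        # A working read path undoes exactly one layer of encoding.
        return json.loads(self.raw) if self.decode_on_fetch else self.raw


def round_trips(store, value, times):
    """Save then fetch `times` times, feeding each fetched value back in."""
    for _ in range(times):
        store.save(value)
        value = store.fetch()
    return value


healthy = round_trips(ToyStore(decode_on_fetch=True), {}, 3)
broken = round_trips(ToyStore(decode_on_fetch=False), {}, 3)
print(repr(healthy))
print(repr(broken))
```

In this model the save path is identical in both cases. The value only grows when the read path hands back raw text, which the next save then treats as an ordinary Python string.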

Attachments (1)

27538-test.diff (642 bytes) - added by Tim Graham 8 years ago.


Change History (5)

comment:1 by Tim Graham, 8 years ago

Resolution: worksforme
Severity: Release blocker → Normal
Status: new → closed
Type: Uncategorized → Bug

I wrote the attached test which passes for me. Could you please provide a test that fails for you?

by Tim Graham, 8 years ago

Attachment: 27538-test.diff added

comment:2 by Petar Aleksic, 8 years ago

I didn't manage to reproduce the bug with the test or in a fresh django project.

comment:3 by Waken Meng, 7 years ago

Resolution: worksforme
Status: closed → new

I have the same problem: the JSONField value is re-encoded as JSON on every save.

from django.contrib.postgres.fields import JSONField
from django.db import models

class Foo(models.Model):
    photos = JSONField(max_length=300)

>>> f = Foo()
>>> f.photos = []
>>> f.save()
>>> f.photos
u'[]'

>>> f.save()
>>> f.refresh_from_db()
>>> f.photos
u'"[]"'

>>> f.save()
>>> f.refresh_from_db()
>>> f.photos
u'"\\"[]\\""'

The above was run in manage.py shell; every time I save the instance, the JSONField value is JSON-encoded again.

env:

postgres 9.6.1
python 2.7.12

django 1.10.5
psycopg2 2.6.2

comment:4 by Tim Graham, 7 years ago

Resolution: duplicate
Status: new → closed

This behavior doesn't reproduce in Django's test suite. Are you using django-jsonfield? See ticket:27675#comment:8.
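For rows that were already damaged as described in this ticket, the extra layers can in principle be peeled off by decoding until the result is no longer a string. The following is a minimal sketch, not a Django API: unwrap_json is a hypothetical helper, and it assumes Python 3 (where text is str) and that the field was never meant to hold a bare JSON string, since a legitimate value such as "123" would be decoded to the number 123.

```python
import json


def unwrap_json(value, max_depth=50):
    """Decode repeatedly while `value` is a str containing valid JSON."""
    for _ in range(max_depth):
        if not isinstance(value, str):
            break  # reached a dict, list, number, bool, or None
        try:
            value = json.loads(value)
        except ValueError:
            break  # plain text that is not JSON; leave it alone
    return value


# Three layers of encoding around an empty list, as in comment:3.
damaged = json.dumps(json.dumps(json.dumps([])))
print(unwrap_json(damaged))
```

Values that are already decoded pass through unchanged, so the helper can be applied to a mix of healthy and damaged rows.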
