#3878 closed (fixed)
(JSON)-serializing utf8 data fails
Reported by: | | Owned by: | Malcolm Tredinnick |
---|---|---|---|
Component: | Core (Serialization) | Version: | 0.96 |
Severity: | | Keywords: | utf8 unicode-branch |
Cc: | django@…, reza@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If I try to serialize UTF-8-encoded data from the database (for example using fixtures), the JSON output will contain
Unicode escapes (\uXXXX) that are not loaded back correctly.
Example:
>>> obj = Blah()
>>> obj.test = "ö"
>>> obj.save()
./manage.py dumpdata > blah.json
./manage.py loaddata blah
>>> obj = Blah.objects.all()[0]
>>> print obj.test
blök
Attachments (1)
Change History (13)
comment:1 Changed 16 years ago by
comment:2 Changed 16 years ago by
Has patch: | set |
---|---|
Needs tests: | set |
Version: | SVN → 0.96 |
I fixed this with the following simple patch:
--- Django-0.96/django/utils/simplejson/encoder.py 2007-01-31 00:34:15.000000000 +0200
+++ /usr/lib/python2.4/site-packages/django/utils/simplejson/encoder.py 2007-04-09 18:04:29.000000000 +0300
@@ -247,7 +247,7 @@ class JSONEncoder(object):
                 encoder = encode_basestring_ascii
             else:
                 encoder = encode_basestring
-            yield encoder(o)
+            yield encoder(o.decode('utf-8'))
         elif o is None:
             yield 'null'
         elif o is True:
comment:3 Changed 16 years ago by
I haven't tested the patch, but unfortunately there's a problem with it:
you're assuming that the user's bytestring data is encoded in UTF-8,
and that's not always true.
(Another approach would be to use settings.DEFAULT_CHARSET,
but that is still not 100% correct.)
The good news is that a Django branch has been created to switch
completely to unicode. Once that is done, this problem will go away.
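The charset concern can be illustrated with a minimal Python 3 sketch. The helper name `to_text` and its `encoding` parameter are hypothetical, made up for this example; they are not part of Django or simplejson. The sketch only shows why a hard-coded 'utf-8' is a guess: bytes in another charset either fail to decode or decode silently into the wrong characters.

```python
def to_text(value, encoding="utf-8"):
    """Return text for a bytes or str value (hypothetical helper)."""
    if isinstance(value, bytes):
        # Decoding is only correct if `encoding` matches how the bytes were produced.
        return value.decode(encoding)
    return value


latin1_bytes = "ö".encode("latin-1")  # b'\xf6'
utf8_bytes = "ö".encode("utf-8")      # b'\xc3\xb6'

# Correct charset: the original character comes back.
right = to_text(utf8_bytes, "utf-8")

# Wrong charset, case 1: Latin-1 bytes are not valid UTF-8, so decoding raises.
try:
    to_text(latin1_bytes, "utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Wrong charset, case 2: UTF-8 bytes read as Latin-1 decode without error,
# but each byte becomes its own character.
wrong = to_text(utf8_bytes, "latin-1")
```

Case 2 is the more dangerous one, since nothing signals that the data has been damaged.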
comment:4 Changed 16 years ago by
Owner: | changed from Jacob to Malcolm Tredinnick |
---|---|
Triage Stage: | Unreviewed → Accepted |
This will be easiest to fix in the unicode branch. It's on the TODO list there. It's intended to be a short-lived sprinting branch, so I think it's best to leave this to be fixed there and then merged back.
The good news is that on that branch, your fix is absolutely the right idea, although we have some helper functions to make it easier.
Leaving the ticket open so that we remember to ensure it really is fixed.
Changed 16 years ago by
Attachment: | xml_serializer_error.txt added |
---|
utf8 problem with xml serializer
comment:5 Changed 16 years ago by
Saik: use the patch given above. Report back if you still have problems.
comment:6 Changed 16 years ago by
Cc: | django@… added |
---|
comment:7 Changed 16 years ago by
Summary: | (JSON)-serializing utf8 data fails → [unicode] (JSON)-serializing utf8 data fails |
---|
comment:8 Changed 16 years ago by
comment:9 Changed 16 years ago by
Keywords: | unicode-branch added |
---|---|
Summary: | [unicode] (JSON)-serializing utf8 data fails → (JSON)-serializing utf8 data fails |
This was fixed in the unicode branch in [5248] (without changing simplejson.py at all, since that already works well with bytestrings and unicode). I'll close this ticket when the branch is merged back into trunk.
comment:10 Changed 16 years ago by
Cc: | reza@… added |
---|
comment:11 Changed 16 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
I haven't checked the Django fixture code, but this problem is very similar to a problem with the simplejson serializer,
so that is probably the cause:
passing a bytestring that contains non-ASCII characters to dumps gives the wrong result,
while passing the same text as a unicode string is OK.
So in short, when working with simplejson and non-ASCII characters,
all strings that go into dumps have to be unicode strings (not bytestrings).
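The rule above can be sketched with Python 3's standard `json` module, used here as a stand-in for the simplejson copy bundled with Django 0.96 under Python 2 (an assumption made for runnability; Python 3's `json.dumps` rejects bytes outright, so the byte-per-character misreading is simulated with a Latin-1 decode). The variable names are illustrative only.

```python
import json

text = "blök"
raw = text.encode("utf-8")  # the bytestring as it would come from a UTF-8 database

# Text in: the non-ASCII character becomes a \uXXXX escape, and loads() restores it.
dumped = json.dumps(text)
restored = json.loads(dumped)

# Simulated failure: each UTF-8 byte is treated as a separate character,
# so two escapes are written for the single character "ö".
misread = raw.decode("latin-1")
dumped_wrong = json.dumps(misread)
restored_wrong = json.loads(dumped_wrong)

# Decoding with the right charset before dumping round-trips cleanly.
fixed = json.loads(json.dumps(raw.decode("utf-8")))
```

The \uXXXX escapes are valid JSON in both cases; the data is only wrong when the escapes were computed from bytes rather than from decoded text.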