#3878 closed (fixed)
(JSON)-serializing utf8 data fails
Reported by: | Owned by: | Malcolm Tredinnick | |
---|---|---|---|
Component: | Core (Serialization) | Version: | 0.96 |
Severity: | Keywords: | utf8 unicode-branch | |
Cc: | django@…, reza@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If I try to serialize UTF-8-encoded data from the database (for example using fixtures), the JSON output will contain
unicode escapes (\uXXXX) which are not loaded back correctly.
Example:
>>> obj = Blah()
>>> obj.test = "ö"
>>> obj.save()

./manage.py dumpdata > blah.json
./manage.py loaddata blah

>>> obj = Blah.objects.all()[0]
>>> print obj.test
blök
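One plausible way such a round trip goes wrong can be sketched in Python 3 with the standard library `json` module (not the Python 2 simplejson bundled with Django 0.96). The Latin-1 misreading below is an assumption made for illustration; the ticket does not say exactly where the bytes were misinterpreted.

```python
import json

original = "ö"
utf8_bytes = original.encode("utf-8")      # two bytes: 0xC3 0xB6

# Assumed failure mode: every byte is treated as its own character
# (equivalent to decoding the UTF-8 bytes as Latin-1).
misread = utf8_bytes.decode("latin-1")     # two characters instead of one
dumped_wrong = json.dumps(misread)         # one \uXXXX escape per byte
loaded_wrong = json.loads(dumped_wrong)    # loads back as the misread text

# Correct path: the serializer receives real text, so a single escape
# is written and the original character comes back on load.
dumped_ok = json.dumps(original)
loaded_ok = json.loads(dumped_ok)

print(repr(loaded_wrong), repr(loaded_ok))
```

The escapes themselves are valid JSON in both cases; what differs is whether they describe the intended character or the individual bytes of its encoding.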
Attachments (1)
Change History (13)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
Has patch: | set |
---|---|
Needs tests: | set |
Version: | SVN → 0.96 |
I fixed this with the following simple patch:
--- Django-0.96/django/utils/simplejson/encoder.py 2007-01-31 00:34:15.000000000 +0200
+++ /usr/lib/python2.4/site-packages/django/utils/simplejson/encoder.py 2007-04-09 18:04:29.000000000 +0300
@@ -247,7 +247,7 @@ class JSONEncoder(object):
                 encoder = encode_basestring_ascii
             else:
                 encoder = encode_basestring
-            yield encoder(o)
+            yield encoder(o.decode('utf-8'))
         elif o is None:
             yield 'null'
         elif o is True:
comment:3 by , 18 years ago
I haven't tested the patch, but unfortunately there's a problem with it:
you're assuming that the bytestring data the user has is encoded in UTF-8,
and that's not always true.
(Another approach would be to use settings.DEFAULT_CHARSET,
but that one is still not 100% correct.)
But, to bring some good news as well, a Django branch has been created to switch
it completely to unicode. With that done, this problem wouldn't be there.
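The objection to hard-coding UTF-8 can be illustrated with a small sketch in Python 3. The helper `to_text` is hypothetical and is not a Django or simplejson function; it only shows that the same bytes need the right charset to decode correctly.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes with the given charset, pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


utf8_data = "ö".encode("utf-8")       # b'\xc3\xb6'
latin1_data = "ö".encode("latin-1")   # b'\xf6'

from_utf8 = to_text(utf8_data)                  # decodes fine as UTF-8
from_latin1 = to_text(latin1_data, "latin-1")   # needs the right charset

# Assuming UTF-8 for data that is really Latin-1 fails outright here.
try:
    to_text(latin1_data)
    wrong_guess_failed = False
except UnicodeDecodeError:
    wrong_guess_failed = True

print(from_utf8, from_latin1, wrong_guess_failed)
```

A configurable default charset narrows the problem but does not remove it, since individual byte strings may still come from a source that uses a different encoding.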
comment:4 by , 18 years ago
Owner: | changed from | to
---|---|
Triage Stage: | Unreviewed → Accepted |
This will be easiest to fix in the unicode branch. It's on the TODO list there. It's intended to be a short-lived sprinting branch, so I think it's best to leave this to be fixed there and then merged back.
The good news is that on that branch, your fix is absolutely the right idea, although we have some helper functions to make it easier.
Leaving the ticket open so that we remember to ensure it really is fixed.
comment:5 by , 18 years ago
Saik: use the patch given above. Report back if you still have problems.
comment:6 by , 18 years ago
Cc: | added |
---|
comment:7 by , 18 years ago
Summary: | (JSON)-serializing utf8 data fails → [unicode] (JSON)-serializing utf8 data fails |
---|
comment:8 by , 18 years ago
comment:9 by , 18 years ago
Keywords: | unicode-branch added |
---|---|
Summary: | [unicode] (JSON)-serializing utf8 data fails → (JSON)-serializing utf8 data fails |
This was fixed in the unicode branch in [5248] (without changing simplejson.py at all, since that already works well with bytestrings and unicode). I'll close this ticket when the branch is merged back into trunk.
comment:10 by , 18 years ago
Cc: | added |
---|
comment:11 by , 18 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I haven't checked the Django fixture code, but this problem is very similar to a problem with simplejson,
so that is probably the cause:
with the simplejson serializer.
like this example:
and of course this is wrong.
but:
is ok.
So in short, when working with simplejson and non-ASCII characters,
all strings that go into dumps have to be unicode strings (not bytestrings).
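The examples from this comment did not survive on the page. As a rough modern counterpart, and not a reconstruction of the original snippets, the following Python 3 sketch uses the standard library `json` module to show the same rule: text goes into `dumps`, raw bytes do not.

```python
import json

text = "ö"
raw = text.encode("utf-8")

# Python 3's json module refuses byte strings instead of guessing a charset.
try:
    json.dumps(raw)
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

# Text serializes either as an ASCII-only escape or as the literal character.
escaped = json.dumps(text)
literal = json.dumps(text, ensure_ascii=False)

# Both forms load back to the same text.
round_trips = json.loads(escaped) == json.loads(literal) == text

print(bytes_accepted, escaped, literal, round_trips)
```

Under Python 2, as used in this ticket, byte strings were accepted by `dumps`, which is why the mismatch could pass silently until the data was loaded again.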