#11742 closed (invalid)
simplejson loads not always unicode
Reported by: | kwek | Owned by: | nobody |
---|---|---|---|
Component: | Uncategorized | Version: | 1.1 |
Severity: | | Keywords: | |
Cc: | mjbroek@… | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In our project we import simplejson from Django, since it provides a nice wrapper that uses either the json module built into Python or a separately installed simplejson. After upgrading to Django 1.1, I noticed the following difference.
Here is the output from Python without simplejson installed. Note that loads() always returns unicode:
{{{
>>> from django.utils import simplejson
>>> simplejson.loads('"test"', encoding='utf-8')
u'test'
>>> simplejson.loads('"test"')
u'test'
}}}
Now see the output with simplejson installed. Note that this returns a plain string, not unicode as some would expect:
{{{
>>> from django.utils import simplejson
>>> simplejson.loads('"test"')
'test'
>>> simplejson.loads('"test"', encoding='utf-8')
'test'
}}}
And here is the output of the Python 2.6 json module (always unicode too):
{{{
>>> import json
>>> json.loads('"tuut"')
u'tuut'
}}}
So is this by design, or did this inconsistency sneak in with the new release? In the meantime we will just import json from Python directly (2.6), but I thought I'd mention it anyway.
Change History (3)
comment:1 by , 15 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 15 years ago
I now cast the input for loads() to unicode() and all is good (even with simplejson). Thanks!
comment:3 by , 15 years ago
Actually, this is what did the trick for me:
data = simplejson.loads(data.decode('utf-8'), object_hook=json_object_hook)
It didn't exactly sneak in: a deliberate decision was made as to which simplejson implementation to use (see r9707).
The fact that simplejson is inconsistent here is a simplejson issue, see: http://code.google.com/p/simplejson/issues/detail?id=40
The behavior you are describing is due to an optimization, and the response to people reporting it appears to be that if you consistently want unicode back you should consistently feed unicode in.
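The advice to "consistently feed unicode in" can be sketched as a small wrapper that decodes byte input before parsing. This is only an illustration under stated assumptions: `loads_text` is a hypothetical helper name, UTF-8 is assumed as the default encoding, and the standard-library `json` module stands in for `django.utils.simplejson`.

```python
import json


def loads_text(data, encoding="utf-8", **kwargs):
    """Parse JSON after making sure the input is text, not bytes.

    Hypothetical helper, not part of Django or simplejson. Byte input
    is decoded first (UTF-8 is an assumed default), so the parser always
    receives text. Extra keyword arguments such as object_hook are
    passed through unchanged.
    """
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return json.loads(data, **kwargs)


# Byte input and text input go through the same code path after decoding.
from_bytes = loads_text(b'"test"')
from_text = loads_text(u'"test"')
```

Because the decoding happens in one place, callers do not need to remember to call `.decode('utf-8')` at every `loads()` call site, which is essentially what the workaround in comment:3 does by hand.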