Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#11742 closed (invalid)

simplejson loads not always unicode

Reported by: kwek
Owned by: nobody
Component: Uncategorized
Version: 1.1
Severity:
Keywords:
Cc: mjbroek@…
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

In our project we import simplejson from Django, since it provides a convenient wrapper that uses either the json module built into Python or a separately installed simplejson. After upgrading to Django 1.1, I noticed the following difference.

See the output of Python without simplejson installed. Notice that loads() always returns unicode.

{{{
>>> from django.utils import simplejson
>>> simplejson.loads('"test"', encoding='utf-8')
u'test'
>>> simplejson.loads('"test"')
u'test'
}}}

Now see the output with simplejson installed. Notice that loads() returns a normal string here, not the unicode some would expect.
{{{
>>> from django.utils import simplejson
>>> simplejson.loads('"test"')
'test'
>>> simplejson.loads('"test"', encoding='utf-8')
'test'
}}}

And here is the output of the Python 2.6 json module (also always unicode):
{{{
>>> import json
>>> json.loads('"tuut"')
u'tuut'
}}}

So is this by design, or did this inconsistency sneak in with the new release? In the meantime we will just import json from Python (2.6) directly, but I thought I'd mention it anyway.

Attachments (0)

Change History (3)

comment:1 Changed 5 years ago by kmtracey

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

It didn't exactly sneak in: a deliberate decision was made as to which simplejson implementation to use; see r9707.

The fact that simplejson is inconsistent here is a simplejson issue, see: http://code.google.com/p/simplejson/issues/detail?id=40

The behavior you are describing is due to an optimization, and the response to people reporting it appears to be that if you consistently want unicode back you should consistently feed unicode in.
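
As a rough illustration of "feed unicode in", here is a minimal sketch of a wrapper that decodes byte input before parsing. The helper name `loads_text` is hypothetical, and the standard-library `json` module stands in for `django.utils.simplejson` as an assumption so the snippet is self-contained; it is not code from Django or simplejson.

```python
import json


def loads_text(data, encoding="utf-8"):
    """Parse JSON, decoding byte input to text first (hypothetical helper).

    Decoding up front means the parser always receives text, so the
    result does not depend on which JSON implementation is installed.
    """
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return json.loads(data)


# Byte input and text input now take the same path through the parser.
from_bytes = loads_text(b'"test"')
from_text = loads_text(u'"test"')
```

With this in place, callers no longer need to remember to decode at every call site.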

comment:2 Changed 5 years ago by kwek

Casting the input for loads() to unicode() fixed it (even with simplejson). Thanks!

comment:3 Changed 5 years ago by kwek

Actually, this is what did the trick for me:

data = simplejson.loads(data.decode('utf-8'), object_hook=json_object_hook)
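
The ticket does not show `json_object_hook`, so the following is only a sketch of how a decode-then-parse call with an `object_hook` behaves. The hook `mark_object` and the sample payload are hypothetical, and the standard-library `json` module is assumed in place of `django.utils.simplejson`.

```python
import json


def mark_object(obj):
    """Hypothetical hook: called once for every decoded JSON object (dict)."""
    obj["_seen"] = True
    return obj


# Sample byte payload (an assumption, not data from the ticket).
raw = b'{"name": "test", "inner": {"x": 1}}'

# Decode the bytes first, then parse; the hook runs on inner objects first.
data = json.loads(raw.decode("utf-8"), object_hook=mark_object)
```

Because the input is decoded before parsing, the string values handed to the hook are text regardless of the backend.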
