Code

Opened 2 years ago

Closed 23 months ago

Last modified 19 months ago

#18023 closed Cleanup/optimization (fixed)

Stop bundling simplejson and deprecate the module

Reported by: ogier Owned by: aaugustin
Component: Core (Serialization) Version: master
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consensus on the django-dev mailing list seemed to be to stop bundling simplejson, replace it with just a simplejson -> json fallback shim, and issue a PendingDeprecationWarning as of 1.5.

I had to leave the implementation of simplejson in django/utils/simplejson/__init__.py because though it makes much more sense to move to django/utils/simplejson.py like most other utils modules, doing so causes circular self-imports when we try to import simplejson from the system.

Attachments (4)

remove_simplejson.diff (48.3 KB) - added by ogier 2 years ago.
remove_simplejson_absolute.diff (48.7 KB) - added by Clueless 2 years ago.
Alternate patch, uses from __future__ import absolute_import so that we can have a simplejson.py
use_json_over_simplejson.diff (16.2 KB) - added by Clueless 2 years ago.
simplejson_docs.diff (6.1 KB) - added by Clueless 2 years ago.

Download all attachments as: .zip

Change History (17)

Changed 2 years ago by ogier

comment:1 Changed 2 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

If it doesn't eat too much time, could you also upload the alternate patch which uses simplejson when it is installed?

comment:2 Changed 2 years ago by ogier

Sorry I wasn't clear. This patch uses simplejson when it is installed. The only thing it does is stop django from redistributing simplejson, because when we drop 2.5 support in django 1.5 we can depend on there being a json module in the standard library.

comment:3 Changed 2 years ago by charettes

@ogier about the circular import thing, can't you just use from __future__ import absolute_import?

comment:4 Changed 2 years ago by ptone

  • Needs documentation set

A couple notes on documenting

Looks like docs will need updating in:

  • topics/class-based-views.txt
  • topics/serialization.txt

Should the warning text endorse the stdlib module, or just suggest you need to use a different json lib?

You will ultimately need to include the release note text in the patch, but I don't think that file has yet been created in SVN - so you can wait or take a stab at creating the initial file as part of this patch.

comment:5 Changed 2 years ago by Clueless

I ran some performance tests to see if the conclusions in http://bugs.python.org/issue6013 really matter. If we are deprecating simplejson we should recognize the performance impact it will have. All modules have their C extensions compiled.

CPython 2.6.7:

$ python -m timeit -s "from json import loads, dumps" "loads(dumps(range(32)))" 
1000 loops, best of 3: 583 usec per loop
$ python -m timeit -s "from json import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"
1000 loops, best of 3: 375 usec per loop

$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(range(32)))"
10000 loops, best of 3: 28.8 usec per loop
$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"
10000 loops, best of 3: 34.6 usec per loop

CPython 2.7.2:

$ python -m timeit -s "from json import loads, dumps" "loads(dumps(range(32)))"                         
10000 loops, best of 3: 25.1 usec per loop
$ python -m timeit -s "from json import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"      
10000 loops, best of 3: 42.5 usec per loop

$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(range(32)))"
10000 loops, best of 3: 28.5 usec per loop
$ python -m timeit -s "from simplejson import loads, dumps" "loads(dumps(dict(enumerate('abcdefghijklmno'))))"
10000 loops, best of 3: 35.2 usec per loop

Here's some data from a real application with a deep model:

CPython 2.6.7 without simplejson:

>>> import timeit
>>> t = timeit.Timer('deserialize("json", serialize("json", team))', 'from django.core.serializers import serialize, deserialize; from djmac.registration.models import Team; team = Team.objects.all()[0:5]')
>>> print t.timeit(1000)
1.6712679863

CPython 2.6.7 with simplejson:

>>> import timeit
>>> t = timeit.Timer('deserialize("json", serialize("json", team))', 'from django.core.serializers import serialize, deserialize; from djmac.registration.models import Team; team = Team.objects.all()[0:5]')
>>> print t.timeit(1000)
1.54401898384

So, it seems in python 2.6, simplejson is faster by at least an order of magnitude. But on the whole it doesn't matter all that much: actually serializing models is dominated by time spent elsewhere (probably the python serialization part of the process) meaning its only around 8% slower and besides, the difference goes away in 2.7.

So, while I was initially hesitant to deprecate the shim because of performance concerns I think I've come around. If someone depends critically on the speedup that simplejson provides, they can always elect not to use Django's serializers. I will go ahead and remove all references to simplejson in Django core and update the docs.

Changed 2 years ago by Clueless

Alternate patch, uses from __future__ import absolute_import so that we can have a simplejson.py

Changed 2 years ago by Clueless

Changed 2 years ago by Clueless

comment:6 Changed 2 years ago by aaugustin

  • Owner changed from nobody to aaugustin

comment:8 Changed 2 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

comment:9 follow-up: Changed 2 years ago by aaugustin

  • Has patch unset
  • Needs documentation unset
  • Resolution fixed deleted
  • Severity changed from Normal to Release blocker
  • Status changed from closed to reopened

This thread on the mailing list and this comment on the commit indicate that this change may be backwards incompatible.

Right now I don't know what the correct resolution is. In all cases it should be documented here.

Last edited 2 years ago by aaugustin (previous) (diff)

comment:10 in reply to: ↑ 9 Changed 2 years ago by ogier

Replying to aaugustin:

This thread on the mailing list and this comment on the commit indicate that this change may be backwards incompatible.

Right now I don't know what the correct resolution is. In all cases it should be documented here.

Here's what I believe is an accurate summary of the situation:

First issue: str vs. unicode

The simplejson API is documented as returning unicode strings for all JSON keys and string values. http://simplejson.readthedocs.org/en/latest/index.html#simplejson.JSONDecoder

In fact, the optional C library (installed by default when installing simplejson) makes an API-changing optimization: if the value is entirely ASCII characters, a str instance is returned instead of a unicode instance. https://github.com/simplejson/simplejson/blob/master/simplejson/_speedups.c#L527

The Python standard library json package was forked from simplejson at version 1.9, and included this optimization. However, it was considered a bug and fixed in python 2.7. http://bugs.python.org/issue11982

The net result is that it is possible that some code that runs on manually installed versions of simplejson or python <= 2.6 depends on this behavior. I think that Python core's approach is correct: we should consider this a bug, because it is behavior that directly contradicts the documentation. It's worth pointing out that any code that depends on this behavior is already broken on python 2.7.

Second issue: namedtuple_as_object

This is a systemic problem with any shim that is supposed to be transparent. In this case, namedtuple_as_object is a keyword argument to simplejson.JSONEncoder that was added in simplejson 2.2. The latest version of simplejson that was merged into the python standard library is 2.0.9 and Django 1.4 shipped with simplejson 2.0.7 bundled. This means that nowhere in stock Django are there any references to this keyword. This wouldn't be a huge problem, except that Django subclasses django.utils.simplejson.JSONEncoder as django.core.serializers.json.DjangoJSONEncoder, and naturally doesn't know about this keyword argument.

This is a deeper problem than just passing along unrecognized keyword arguments to the base class in __init__() to make sure that DjangoJSONEncoder never breaks. If we want to deprecate the django.utils.simplejson shim, then DjangoJSONEncoder must become a subclass of json.JSONEncoder. This has the unfortunate effect that calls to simplejson.dumps({'hi':'there'}, cls=DjangoJSONEncoder) will all fail (this is the simplejson on PyPI I'm talking about). Basically DjangoJSONEncoder will be incompatible with simplejson, which is a backwards incompatibility.

The problem is that even leaving in the shim and subclassing as we do isn't a good solution. It means that DjangoJSONEncoder needs to not only be backwards compatible through all supported Python versions, but they also must be forwards compatible with the latest simplejson releases, which have already been shown to diverge in backwards incompatible ways. (Adding an argument to a method on a base class breaks contravariance.)

If that paragraph seemed confusing, think of it this way: if DjangoJSONEncoder is a json.JSONEncoder then passing it to simplejson.dumps() is dangerous. If DjangoJSONEncoder is a simplejson.JSONEncoder then passing it to json.dumps() is dangerous.

comment:11 Changed 2 years ago by lukeplant

I agree - we should just consider simplejson as buggy on all counts, and document the incompatibilities in release notes.

comment:12 Changed 23 months ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from reopened to closed

In [610746f6cb89c13b364c27fc41ecc9592cbe1605]:

Fixed #18023 -- Documented simplejson issues.

Thanks Luke Plant for reporting the issue and Alex Ogier for thoroughly
investigating it.

comment:13 Changed 19 months ago by stavros

I would like to revisit this, as I (django-annoying) have been bitten by the bug above (the simplejson.dumps({'hi':'there'}, cls=DjangoJSONEncoder) one).

Right now, even just using django.utils.simplejson with DjangoJSONEncoder, as above, breaks. Changing the import statement in django.core.serializers to from django.utils import simplejson as json fixes this, as it imports from the same module as Django's provided JSON library, which is, at least, consistent.

I think this switch should be made, to stop modules using Django-provided serializers from breaking, at least.

EDIT: Actually, strike that. If you're deprecating the Django simplejson module, you wouldn't want to refer to it. I'll comment below with a better suggestion.

Last edited 19 months ago by stavros (previous) (diff)

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.