Opened 7 years ago

Closed 5 years ago

#29249 closed New feature (fixed)

Make serializers consistently unicode by default.

Reported by: hakib
Owned by: Hasan Ramezani
Component: Core (Management commands)
Version: dev
Severity: Normal
Keywords: dumpdata, unicode
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

It is currently not (easily) possible to use the dumpdata management command on models with unicode data.
The JSON serializer used by dumpdata does not accept the ensure_ascii argument that json.dumps takes.

Since ensure_ascii=True is the default, I suggest adding a --dont-ensure-ascii flag to the dumpdata management command so it will be easier to use dumpdata with unicode.

./manage.py dumpdata app.model --dont-ensure-ascii

I'm not sure what the implications are for other serializers such as YAML and XML.
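
For readers unfamiliar with the ensure_ascii behavior mentioned in the description, here is a minimal standard-library sketch. It uses plain json.dumps rather than Django's serializer, and the sample record is invented for illustration.

```python
import json

# Invented sample data containing non-ASCII text.
record = {"name": "שלום", "city": "München"}

# ensure_ascii=True is the json.dumps default: every non-ASCII
# character is written as a \uXXXX escape sequence.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are emitted as-is.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary; the difference is only in how readable the dumped text is.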

Change History (10)

comment:1 by Tim Graham, 7 years ago

Component: Utilities → Core (Management commands)
Summary: Add option to dumpdata with unicode data → Add option to dumpdata with unicode JSON

I believe this would only apply to the JSON serializer, and I'm not sure about adding a dumpdata option that's specific to a particular serializer. I think your best solution is to subclass the JSON serializer, register it as a custom format, and then use that format in dumpdata.
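
A rough sketch of the workaround suggested in this comment, not a tested recipe. It assumes the JSON serializer builds its json.dump keyword arguments in start_serialization() and keeps them in self.json_kwargs; the module path myapp/unicode_json.py and the format name json_unicode are made up for illustration.

```python
# myapp/unicode_json.py  (hypothetical module path)
# A serializer module is expected to expose both names, so the stock
# Deserializer is re-exported unchanged.
from django.core.serializers.json import Deserializer  # noqa: F401
from django.core.serializers.json import Serializer as JSONSerializer


class Serializer(JSONSerializer):
    """JSON serializer that leaves non-ASCII characters unescaped."""

    def start_serialization(self):
        # Assumption: the parent fills self.json_kwargs here and later
        # forwards those kwargs to json.dump().
        super().start_serialization()
        self.json_kwargs["ensure_ascii"] = False


# settings.py  (the format name "json_unicode" is an arbitrary choice)
# SERIALIZATION_MODULES = {"json_unicode": "myapp.unicode_json"}
```

With the setting in place, the custom format would be selected with ./manage.py dumpdata app.model --format json_unicode.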

comment:2 by hakib, 7 years ago

The JSON serializer has an ensure_ascii attribute and the YAML serializer has an allow_unicode attribute. I already submitted a PR implementing the flag in both serializers.

I haven't looked at the XML serializer yet but I'm sure it will be possible there as well.

As someone who works with unicode as the primary language for most apps (as I'm sure a lot of other developers do), it's a very useful feature to be able to dump fixtures directly from a local db in a readable format.

Last edited 7 years ago by Tim Graham

comment:3 by Tim Graham, 7 years ago

Has patch: set
Patch needs improvement: set
Triage Stage: Unreviewed → Accepted

Okay. My main concern is that an option called --allow_unicode may suggest to readers that all serializers prohibit unicode by default. That may not be true. Your patch also needs documentation.

comment:4 by Tim Graham, 7 years ago

Summary: Add option to dumpdata with unicode JSON → Add option to dumpdata to allow unicode JSON or YAML

comment:5 by Hasan Ramezani, 5 years ago

Owner: changed from nobody to Hasan Ramezani
Patch needs improvement: unset
Status: new → assigned
Last edited 5 years ago by Mariusz Felisiak

comment:6 by Mariusz Felisiak, 5 years ago

Has patch: unset
Summary: Add option to dumpdata to allow unicode JSON or YAML → Make serializers consistently unicode by default.
Version: 2.0 → master

The current behavior is inconsistent: the XML serializer uses Unicode by default, whereas the YAML and JSON serializers force ASCII. I think we should make this behavior consistent instead of adding a new serializer-specific option, i.e. pass allow_unicode=True to yaml.dump() and ensure_ascii=False to json.dump().
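
To illustrate the JSON half of this proposal, here is a small sketch using only the standard library. dump_rows is a hypothetical helper, not Django's serializer code, and the fixture row is invented; the YAML half would analogously pass allow_unicode=True to yaml.dump(), which needs PyYAML and is not shown.

```python
import io
import json


def dump_rows(rows, stream, ensure_ascii=False):
    """Write rows to stream as JSON (hypothetical helper for illustration)."""
    json.dump(rows, stream, ensure_ascii=ensure_ascii, indent=2)


# Invented fixture-style row with a non-ASCII value.
rows = [{"model": "app.city", "pk": 1, "fields": {"name": "Kraków"}}]

# Proposed default: characters are written unescaped.
unicode_out = io.StringIO()
dump_rows(rows, unicode_out)

# Previous behavior: non-ASCII characters become \uXXXX escapes.
ascii_out = io.StringIO()
dump_rows(rows, ascii_out, ensure_ascii=True)

# Both outputs decode to identical data; only the text form differs.
same_data = json.loads(unicode_out.getvalue()) == json.loads(ascii_out.getvalue())
print(same_data)
```

Since both forms load to the same data, switching the default changes how dumped fixtures look but not what they contain.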

comment:7 by Hasan Ramezani, 5 years ago

Has patch: set

comment:8 by Mariusz Felisiak, 5 years ago

Triage Stage: Accepted → Ready for checkin

comment:9 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

In 8970bb4:

Refs #29249 -- Added tests for serializing Unicode data with XML serializer.

comment:10 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

Resolution: fixed
Status: assigned → closed

In 68fc21b:

Fixed #29249 -- Made JSON and YAML serializers use Unicode by default.
