Opened 11 months ago
Closed 11 months ago
#35944 closed Cleanup/optimization (fixed)
Postgresql: ArrayField with Unicode characters gets serialized as string of "\u XXXX" characters
| Reported by: | Oleg Sverdlov | Owned by: | Oleg Sverdlov |
|---|---|---|---|
| Component: | Core (Serialization) | Version: | 5.1 |
| Severity: | Normal | Keywords: | ArrayField, postgresql, JSON |
| Cc: | Oleg Sverdlov | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
In ArrayField.value_to_string(self, obj) the value is encoded using json.dumps(values), which by default escapes non-ASCII characters as \uXXXX sequences.
For example, an ArrayField with 3 elements ["один", "два", "три"] (1,2,3 in Russian) will produce
["\u043e\u0434\u0438\u043d", "\u0434\u0432\u0430", "\u0442\u0440\u0438"]
While this is not a bug per se, it becomes a nuisance when viewing the result of the "dumpdata" management command:
The ArrayField fields will be encoded differently from other text fields.
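The difference can be reproduced with the standard library alone. This is a minimal sketch that calls json.dumps directly on the list from the example above; it does not involve Django, and the variable names are only illustrative.

```python
import json

values = ["один", "два", "три"]

# Default (ensure_ascii=True): non-ASCII characters become \uXXXX escapes.
escaped = json.dumps(values)

# ensure_ascii=False: the characters are written out unchanged.
readable = json.dumps(values, ensure_ascii=False)

print(escaped)
print(readable)

# Both strings decode to the same list, so only the presentation differs.
assert json.loads(escaped) == json.loads(readable) == values
```

Both forms are valid JSON and load back to identical data; the complaint is only about readability and consistency in the dump.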
Perhaps there should be an option to turn the ensure_ascii parameter on or off, as in json.dumps(values, ensure_ascii=option)?
The option can be enabled by default, as we do for the 'hstore' field, or perhaps enabled conditionally:
- in the field settings ArrayField(name='numbers', ascii_only=False)
- in settings.py ( ARRAY_FIELD_ENSURE_ASCII )
I will be glad to submit a patch.
Change History (6)
comment:1 by , 11 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Type: | Uncategorized → Cleanup/optimization |
comment:2 by , 11 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:3 by , 11 months ago
| Has patch: | set |
|---|
comment:5 by , 11 months ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
Given we made the decision to have JSON serialization default to ensure_ascii=False when dealing with Unicode in #29249 (68fc21b3784aa34c7ba5515ab02ef0c7b6ee856d), I think we should use the same approach here and use ensure_ascii=False for any usage of json.dumps in Field.value_to_string for fields that might include text, which includes ArrayField and HStoreField.
I don't think an additional field option to control this behavior, and certainly not a setting, is warranted here, as it should be possible to subclass either field class to override value_to_string, and ensure_ascii=False does constitute a more coherent default.
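The subclassing route mentioned in this comment might look roughly like the sketch below. To keep it runnable without Django or a database, StandInArrayField is a hypothetical stand-in that only mimics a field whose value_to_string() calls json.dumps with default arguments; ReadableArrayField and Row are likewise invented for illustration. With the real ArrayField the same override pattern should apply, but that is an assumption and is not tested here.

```python
import json


class StandInArrayField:
    """Hypothetical stand-in, not Django's ArrayField."""

    def __init__(self, attname):
        self.attname = attname

    def value_to_string(self, obj):
        # Mimics the behavior described in the ticket: default ensure_ascii=True.
        return json.dumps(getattr(obj, self.attname))


class ReadableArrayField(StandInArrayField):
    def value_to_string(self, obj):
        # Decode the parent's JSON and re-encode it without ASCII escaping.
        decoded = json.loads(super().value_to_string(obj))
        return json.dumps(decoded, ensure_ascii=False)


class Row:
    # Hypothetical model-like object holding the array value.
    numbers = ["один", "два", "три"]


print(StandInArrayField("numbers").value_to_string(Row()))
print(ReadableArrayField("numbers").value_to_string(Row()))
```

Re-encoding the parent's output avoids duplicating its per-element conversion logic, at the cost of one extra decode step.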