Ticket #4227 (closed: fixed)

Opened 1 year ago

Last modified 1 year ago

dumpdata/loaddata serializer ignores encoding settings

Reported by: Caspar Hasenclever <hasencle@informatik.uni-freiburg.de>
Assigned to: adrian
Milestone:
Component: Core framework
Version: SVN
Keywords: unicode-branch
Cc:
Triage Stage: Accepted
Has patch: 0
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 0

Description

When dumping data from a UTF-8-encoded database, get_string_value() in core/serializers/base.py casts CharField/TextField values with str(), which raises a decode error on non-ASCII characters. Neither the settings.DEFAULT_CHARSET value nor the 'encoding' option is taken into account.

Setup to reproduce:
PostgreSQL 8.1.8
Python 2.4.4
Django SVN revision 5152

Create a new project and app for testing (the names used in this example are 'myproject' and 'myapp').

cat myapp/models.py :

from django.db import models

class Thingy(models.Model):
    my_column = models.CharField(maxlength=255)

python manage.py --plain shell

>>> from myproject.myapp import models 
>>> thing = models.Thingy(my_column='äöü') # non-ASCII string entered as literal here, locale is set to UTF-8
>>> thing.save()
>>> thing.my_column
'\xc3\xa4\xc3\xb6\xc3\xbc' # UTF-8 string

python manage.py dumpdata --format=xml myapp

Unable to serialize database: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Dumping in JSON format works, but 'loaddata' then fails with a similar encoding error.

A partial fix is attached as a patch; with it, XML serialization now works for me.
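
For readers who want to see the failure in isolation, here is a minimal sketch in current Python 3. It is not the attached patch and not the change made on the unicode branch. Python 2 decoded byte strings as ASCII implicitly whenever they were combined with unicode; the sketch makes that decode explicit to produce the same codec error, then decodes with a configurable charset instead. The DEFAULT_CHARSET constant stands in for settings.DEFAULT_CHARSET, and to_text() is a hypothetical helper, not a Django function.

```python
# Stand-in for settings.DEFAULT_CHARSET (assumption for this sketch).
DEFAULT_CHARSET = "utf-8"

# The UTF-8 bytes shown for thing.my_column in the session above.
raw = b"\xc3\xa4\xc3\xb6\xc3\xbc"


def to_text(value, encoding=DEFAULT_CHARSET):
    """Hypothetical helper: decode byte strings with `encoding`, stringify the rest."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


# An ASCII decode of UTF-8 bytes fails on the first non-ASCII byte.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

print(ascii_error)
print(to_text(raw))
```

The point of the sketch is only that the charset has to be passed to the decode step; where that decision belongs in the serializers is for the actual fix to determine.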

Attachments

core_serializers.diff (1.5 kB) - added by Caspar Hasenclever <hasencle@informatik.uni-freiburg.de> on 05/05/07 17:53:47.
output of svn diff in trunk/django/core/serializers/

Change History

05/05/07 17:53:47 changed by Caspar Hasenclever <hasencle@informatik.uni-freiburg.de>

  • attachment core_serializers.diff added.

output of svn diff in trunk/django/core/serializers/

05/05/07 17:58:38 changed by Caspar Hasenclever <hasencle@informatik.uni-freiburg.de>

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

I forgot to mention that the DATABASE_ENGINE is 'postgresql_psycopg2'.

05/06/07 20:43:33 changed by anonymous

  • has_patch deleted.
  • summary changed from dumpdata/loaddata serializer ignores encoding settings to [unicode] dumpdata/loaddata serializer ignores encoding settings.
  • stage changed from Unreviewed to Accepted.

This is being fixed in a slightly different way in the unicode branch. We won't apply this patch to trunk, since all those problems are being fixed on the branch in a unified fashion. However, I'll leave the ticket open until the branch is merged to give us a double-check that we fix all reported problems.

05/15/07 11:14:56 changed by mtredinnick

(In [5248]) unicode: Made the serializers unicode-aware. Refs #3878, #4227.

05/15/07 11:17:20 changed by mtredinnick

  • keywords set to unicode-branch.
  • summary changed from [unicode] dumpdata/loaddata serializer ignores encoding settings to dumpdata/loaddata serializer ignores encoding settings.

This was fixed in the unicode branch in [5248]. I'll close this ticket when the branch is merged back into trunk.

05/21/07 20:38:31 changed by ross@rossp.org

This is basically irrelevant now, but FWIW this patch worked perfectly on rev 4992: it let me successfully export data from MySQL in XML format, whereas previously only JSON worked (and on the other end, where I'm loading the data into a Postgres database, JSON was giving invalid data types on boolean fields).

Thank you, Caspar!

07/04/07 07:11:05 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [5609]) Merged Unicode branch into trunk (r4952:5608). This should be fully backwards compatible for all practical purposes.

Fixed #2391, #2489, #2996, #3322, #3344, #3370, #3406, #3432, #3454, #3492, #3582, #3690, #3878, #3891, #3937, #4039, #4141, #4227, #4286, #4291, #4300, #4452, #4702

