When dumping data from a UTF-8-encoded database, get_string_value() in core/serializers/base.py casts CharField/TextField values with str(), which raises a decode error as soon as a value contains non-ASCII characters. Neither settings.DEFAULT_CHARSET nor the serializer's 'encoding' option is taken into account.
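A minimal sketch of the mismatch, written in modern Python where stored bytes and text are distinct types. It assumes the column holds UTF-8 bytes, as in the shell session that follows. `get_string_value_sketch` and the module-level `DEFAULT_CHARSET` are hypothetical stand-ins for illustration, not Django's actual code or API. The point is only that bytes must be decoded with the configured charset rather than with an implicit ASCII conversion.

```python
# Hypothetical stand-in for settings.DEFAULT_CHARSET (assumption for this sketch).
DEFAULT_CHARSET = "utf-8"


def get_string_value_sketch(value, encoding=DEFAULT_CHARSET):
    """Return field data as text, decoding bytes with an explicit charset.

    Hypothetical helper: it illustrates honoring DEFAULT_CHARSET instead of
    relying on an implicit ASCII conversion. It is not Django's API.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        # Decode with the configured charset rather than ASCII.
        return value.decode(encoding)
    return str(value)


raw = b"\xc3\xa4\xc3\xb6\xc3\xbc"  # the UTF-8 bytes stored for 'äöü'

# Implicit ASCII decoding reproduces the reported failure.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc)

print(get_string_value_sketch(raw))
```

The same bytes that fail under ASCII decode cleanly once the charset is passed explicitly, which is the behavior the report expects from the serializer.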
Setup to reproduce:
PostgreSQL 8.1.8
Python 2.4.4
Django SVN revision 5152
Create a new project and app for testing (this example uses 'myproject' and 'myapp').
cat myapp/models.py:
from django.db import models
class Thingy(models.Model):
    my_column = models.CharField(maxlength=255)
python manage.py --plain shell
>>> from myproject.myapp import models
>>> thing = models.Thingy(my_column='äöü') # non-ASCII string entered as literal here, locale is set to UTF-8
>>> thing.save()
>>> thing.my_column
'\xc3\xa4\xc3\xb6\xc3\xbc' # UTF-8 string
python manage.py dumpdata --format=xml myapp
Unable to serialize database: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
Dumping with the JSON format works, but loading that dump with 'loaddata' then fails with a similar encoding error.
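A sketch of why the dump and load steps must agree on a charset, assuming plain JSON via the standard library. `dump_records`, `load_records` and `ENCODING` are hypothetical names for illustration, not Django's serializer API, and the record layout only mimics a fixture entry for the Thingy model. Encoding once on dump and decoding with the same charset on load round-trips 'äöü'; decoding with ASCII on load fails the way the report describes.

```python
import json

ENCODING = "utf-8"  # assumption: stands in for settings.DEFAULT_CHARSET


def dump_records(records, encoding=ENCODING):
    """Hypothetical dump step: serialize to JSON text, then encode once."""
    return json.dumps(records, ensure_ascii=False).encode(encoding)


def load_records(payload, encoding=ENCODING):
    """Hypothetical load step: decode with the same charset, then parse."""
    return json.loads(payload.decode(encoding))


records = [{"model": "myapp.thingy", "pk": 1, "fields": {"my_column": "äöü"}}]
payload = dump_records(records)

# Same charset on both sides: the non-ASCII value survives the round trip.
assert load_records(payload) == records

try:
    load_records(payload, encoding="ascii")  # mismatched charset on load
except UnicodeDecodeError as exc:
    print("load failed:", exc)
```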
A partial fix is attached as a patch (with it, XML serialization now works for me).