#22399 closed Bug (fixed)
loaddata doesn't work correctly when importing utf-8 encoded files
Reported by: | bacilla | Owned by: | nobody |
---|---|---|---|
Component: | Core (Management commands) | Version: | 1.6 |
Severity: | Normal | Keywords: | loaddata utf-8 python3 |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Environment: Windows 7, Python 3.3, Django 1.6.2, PyYAML 3.11
When initializing the DB with a YAML fixture that contains Russian characters, like this:
- model: testapp.City
  fields:
    name: Санкт-Петербург
or Unicode escape sequences, like this:
- model: testapp.City
  fields:
    name: "\u040c\u00ae\u045e\u00ae\u0431\u0401\u040e\u0401\u0430\u0431\u0404"
garbage appears in the 'name' column.
It seems that this happens because the fixture file is not opened with UTF-8 encoding; see line 122 of 'django/core/management/commands/loaddata.py', where the open call is missing the parameter 'encoding="utf-8"'.
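The effect described above can be reproduced outside Django. This is a minimal sketch, assuming cp1251 as a stand-in for the Windows locale default that Python 3 falls back to when no encoding is given; the file name and variable names are hypothetical.

```python
import os
import tempfile

text = "name: Санкт-Петербург\n"

# Store the fixture as UTF-8 bytes, the way the YAML file sits on disk.
with tempfile.NamedTemporaryFile("wb", suffix=".yaml", delete=False) as tmp:
    tmp.write(text.encode("utf-8"))
    path = tmp.name

try:
    # Assumption: cp1251 plays the role of the locale default encoding.
    with open(path, "r", encoding="cp1251") as f:
        garbled = f.read()
    # Passing the encoding explicitly recovers the original text.
    with open(path, "r", encoding="utf-8") as f:
        correct = f.read()
finally:
    os.remove(path)

print(garbled == text)  # False: the bytes were decoded with the wrong codec
print(correct == text)  # True
```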
Related Python discussion:
https://mail.python.org/pipermail/python-ideas/2013-June/021230.html
Attachments (1)
Change History (11)
by , 11 years ago
Attachment: | testproj.tar.gz added |
---|
comment:1 by , 11 years ago
Keywords: | python3 added |
---|---|
Triage Stage: | Unreviewed → Accepted |
encoding="utf-8" is a Python 3 addition to the built-in open() function (and it only makes sense when reading the file in text mode).
I think that for best compatibility with the other open methods (gzip, zip, bzip), it would be easier to simply force opening the file in binary mode ('rb') and let the deserialization step take care of decoding the file as UTF-8. Could you test whether using fixture = open_method(fixture_file, 'rb') solves your issue?
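The binary-mode suggestion can be sketched as follows: every opener accepts 'rb', and decoding happens once, afterwards. The open_methods mapping mirrors the idea of compression_formats in loaddata.py, but read_fixture_bytes and the file names are hypothetical, and the explicit decode() only stands in for what the deserializer would do.

```python
import bz2
import gzip
import os
import shutil
import tempfile

payload = "- model: testapp.City\n  fields:\n    name: Санкт-Петербург\n".encode("utf-8")

# All three callables accept (filename, mode) and support binary modes.
open_methods = {None: open, "gz": gzip.GzipFile, "bz2": bz2.BZ2File}


def read_fixture_bytes(path, cmp_fmt=None):
    """Read raw bytes from a plain or compressed fixture (hypothetical helper)."""
    fixture = open_methods[cmp_fmt](path, "rb")
    try:
        return fixture.read()
    finally:
        fixture.close()


tmpdir = tempfile.mkdtemp()
try:
    decoded = {}
    for fmt, open_method in open_methods.items():
        path = os.path.join(tmpdir, "city.yaml" + ("." + fmt if fmt else ""))
        out = open_method(path, "wb")
        try:
            out.write(payload)
        finally:
            out.close()
        # Decoding is independent of how the bytes were stored.
        decoded[fmt] = read_fixture_bytes(path, fmt).decode("utf-8")
finally:
    shutil.rmtree(tmpdir)
```

Because no opener has to know about encodings, the same call works for plain, gzip and bzip2 files.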
comment:2 by , 11 years ago
Triage Stage: | Accepted → Unreviewed |
---|
This fixes the first case (characters in the YAML file), but doesn't fix the second (Unicode escape sequences).
comment:3 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:4 by , 11 years ago
As for the escape sequence, what result are you expecting? Looking at your proposed sequence, it really does decode to "Ќ®ў®бЁЎЁабЄ"... (\u040c = Ќ, \u00ae = ®, etc.)
comment:5 by , 11 years ago
Oh, you're right, it is my fault. The correct string is '\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a', and it works perfectly.
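The two escape sequences from this thread can be compared directly. The last line is only an observation, not something stated in the ticket: the original sequence is what results if the cp866 bytes of the intended word are decoded as cp1251, which would be consistent with the escapes having been generated from a Windows console.

```python
wrong = "\u040c\u00ae\u045e\u00ae\u0431\u0401\u040e\u0401\u0430\u0431\u0404"
right = "\u041d\u043e\u0432\u043e\u0441\u0438\u0431\u0438\u0440\u0441\u043a"

print(wrong == "Ќ®ў®бЁЎЁабЄ")  # True: the loader decoded the escapes faithfully
print(right == "Новосибирск")   # True: the corrected sequence spells the city name

# Assumption: a cp866 -> cp1251 mix-up when the escapes were produced.
print(right.encode("cp866").decode("cp1251") == wrong)  # True
```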
comment:6 by , 11 years ago
OK, then there is an obvious fix, always reading in binary mode:
diff --git a/django/core/management/commands/loaddata.py b/django/core/management/commands/loaddata.py
index 44583bd..44946fe 100644
--- a/django/core/management/commands/loaddata.py
+++ b/django/core/management/commands/loaddata.py
@@ -125,7 +125,7 @@ class Command(BaseCommand):
         for fixture_file, fixture_dir, fixture_name in self.find_fixtures(fixture_label):
             _, ser_fmt, cmp_fmt = self.parse_name(os.path.basename(fixture_file))
             open_method = self.compression_formats[cmp_fmt]
-            fixture = open_method(fixture_file, 'r')
+            fixture = open_method(fixture_file, 'rb')
             try:
                 self.fixture_count += 1
                 objects_in_fixture = 0
Or a more elaborate patch that tries to take advantage of reading in text mode on Python 3:
diff --git a/django/core/management/commands/loaddata.py b/django/core/management/commands/loaddata.py
index 44583bd..5938770 100644
--- a/django/core/management/commands/loaddata.py
+++ b/django/core/management/commands/loaddata.py
@@ -14,7 +14,7 @@ from django.core.management.base import BaseCommand, CommandError
 from django.core.management.color import no_style
 from django.db import (connections, router, transaction, DEFAULT_DB_ALIAS,
       IntegrityError, DatabaseError)
-from django.utils import lru_cache
+from django.utils import lru_cache, six
 from django.utils.encoding import force_text
 from django.utils.functional import cached_property
 from django.utils._os import upath
@@ -76,13 +76,14 @@ class Command(BaseCommand):
         self.models = set()
 
         self.serialization_formats = serializers.get_public_serializer_formats()
+        kwargs = {'encoding': 'utf-8'} if six.PY3 else {}
         self.compression_formats = {
-            None: open,
-            'gz': gzip.GzipFile,
-            'zip': SingleZipReader
+            None: (open, kwargs),
+            'gz': (gzip.GzipFile, kwargs),
+            'zip': (SingleZipReader, {}),
         }
         if has_bz2:
-            self.compression_formats['bz2'] = bz2.BZ2File
+            self.compression_formats['bz2'] = (bz2.BZ2File, kwargs)
 
         with connection.constraint_checks_disabled():
             for fixture_label in fixture_labels:
@@ -124,8 +125,8 @@ class Command(BaseCommand):
         """
         for fixture_file, fixture_dir, fixture_name in self.find_fixtures(fixture_label):
             _, ser_fmt, cmp_fmt = self.parse_name(os.path.basename(fixture_file))
-            open_method = self.compression_formats[cmp_fmt]
-            fixture = open_method(fixture_file, 'r')
+            open_method, kwargs = self.compression_formats[cmp_fmt]
+            fixture = open_method(fixture_file, 'rb', **kwargs)
             try:
                 self.fixture_count += 1
                 objects_in_fixture = 0
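For comparison, here is a standalone sketch of the text-mode idea on Python 3 only. It is not the patch above: it swaps in gzip.open and bz2.open (Python 3.3+), which accept a text mode together with an encoding, and the names text_openers and read_fixture_text are hypothetical. Note that the built-in open() raises ValueError when an encoding is combined with a binary mode, so a text-mode variant pairs the encoding with 'rt' rather than 'rb'.

```python
import bz2
import gzip
import os
import shutil
import tempfile

# Hypothetical mapping from compression suffix to a text-capable opener.
text_openers = {None: open, "gz": gzip.open, "bz2": bz2.open}


def read_fixture_text(path, cmp_fmt=None):
    """Return the fixture contents decoded as UTF-8 (hypothetical helper)."""
    with text_openers[cmp_fmt](path, "rt", encoding="utf-8") as fixture:
        return fixture.read()


fixture_text = "name: Новосибирск\n"
tmpdir = tempfile.mkdtemp()
try:
    results = {}
    for fmt, opener in text_openers.items():
        path = os.path.join(tmpdir, "city.yaml" + ("." + fmt if fmt else ""))
        with opener(path, "wt", encoding="utf-8") as out:
            out.write(fixture_text)
        results[fmt] = read_fixture_text(path, fmt)
finally:
    shutil.rmtree(tmpdir)
```

The trade-off against the binary-mode fix is that each opener must support text mode, which is why the one-line 'rb' change is the simpler option.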
comment:7 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
sample project