#19357 closed Bug (fixed)
Unicode errors when Django is installed under a non-ASCII path
Reported by: | kujiu | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Python 2 | Version: | dev |
Severity: | Release blocker | Keywords: | |
Cc: | ua_django_bugzilla@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
With Python 2.7, when using the unicode_literals import, all strings in the current files are interpreted as unicode. But, all strings coming from an other file that doesn't contain this import are str. An UnicodeDecodeError occurs when concatenating directly strings from a file with unicode_literals and an other without. But all works fine with Python 3.x.
If we start django with django installed on a path containing unicode character (like asian character), a UnicodeDecodeError occurs on file django/utils/translation/trans_real.py (line 112). If an app have a path with unicode character, the error occurs on the same file at line 155. Each time, the traceback stops in posixpath.py (in python lib), with the command "path += '/' + b". Django doesn't start, tests don't start, shell doesn't start.
Some files in tests have the same issue like :
- django/contrib/formtools/tests/init.py
- django/utils/translation/trans_real.py tests/regressiontests/fixtures_regress/tests.py
- tests/regressiontests/i18n/contenttypes/tests.py tests/regressiontests/i18n/patterns/tests.py
- tests/regressiontests/i18n/tests.py tests/regressiontests/servers/tests.py
- tests/regressiontests/staticfiles_tests/tests.py tests/regressiontests/templates/response.py
- tests/regressiontests/test_client_regress/tests.py
Example of traceback (with "runtests.py --settings=test_sqlite"):
Traceback (most recent call last): File "./runtests.py", line 327, in <module> options.failfast, args) File "./runtests.py", line 170, in django_tests failures = test_runner.run_tests(test_labels, extra_tests=extra_tests) File "/home/kujiu/仕事/labs/projects/django-trunk/django/test/simple.py", line 366, in run_tests suite = self.build_suite(test_labels, extra_tests) File "/home/kujiu/仕事/labs/projects/django-trunk/django/test/simple.py", line 260, in build_suite suite.addTest(build_suite(app)) File "/home/kujiu/仕事/labs/projects/django-trunk/django/test/simple.py", line 68, in build_suite test_module = get_tests(app_module) File "/home/kujiu/仕事/labs/projects/django-trunk/django/test/simple.py", line 25, in get_tests test_module = import_module('.'.join(prefix + [TEST_MODULE])) File "/home/kujiu/仕事/labs/projects/django-trunk/django/utils/importlib.py", line 35, in import_module __import__(name) File "/home/kujiu/仕事/labs/projects/django-trunk/django/contrib/flatpages/tests/__init__.py", line 2, in <mo dule> from django.contrib.flatpages.tests.forms import * File "/home/kujiu/仕事/labs/projects/django-trunk/django/contrib/flatpages/tests/forms.py", line 4, in <modul e> from django.contrib.flatpages.forms import FlatpageForm File "/home/kujiu/仕事/labs/projects/django-trunk/django/contrib/flatpages/forms.py", line 6, in <module> class FlatpageForm(forms.ModelForm): File "/home/kujiu/仕事/labs/projects/django-trunk/django/contrib/flatpages/forms.py", line 10, in FlatpageForm error_message = _("This value must contain only letters, numbers," File "/home/kujiu/仕事/labs/projects/django-trunk/django/forms/fields.py", line 441, in __init__ if error_message: File "/home/kujiu/仕事/labs/projects/django-trunk/django/utils/functional.py", line 117, in __wrapper__ res = func(*self.__args, **self.__kw) File "/home/kujiu/仕事/labs/projects/django-trunk/django/utils/translation/__init__.py", line 71, in ugettext return _trans.ugettext(message) File "/home/kujiu/仕事/labs/projects/django-trunk/django/utils/translation/trans_real.py", line 275, in ugett ext return do_translate(message, 'ugettext') File "/home/kujiu/仕事/labs/projects/django-trunk/django/utils/translation/trans_real.py", line 257, in do_tr anslate _default = translation(settings.LANGUAGE_CODE) File "/home/kujiu/仕事/labs/projects/django-trunk/django/utils/translation/trans_real.py", line 112, in trans lation globalpath = os.path.join(os.path.dirname(sys.modules[settings.__module__].__file__), 'locale') File "/usr/lib/python2.7/posixpath.py", line 80, in join path += '/' + b UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 12: ordinal not in range(128)
Attachments (3)
Change History (24)
comment:1 by , 12 years ago
Severity: | Normal → Release blocker |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 12 years ago
Owner: | changed from | to
---|
comment:3 by , 12 years ago
Indeed, it appears that this regression was introduced by the unicode literals patch.
The fix may be as simple as using str('locale')
on line 112 of django/utils/translation/trans_real.py
.
We should audit for other instances of this problem.
comment:4 by , 12 years ago
Summary: | Unicode errors with mixing files with unicode_literals and others → Unicode errors when Django is installed under a non-ASCII path |
---|
comment:5 by , 12 years ago
In fact, when the unicode_literals path was applied, filesystem paths operations weren't changed accordingly. Python uses native strings for filesystem paths.
In files that have unicode_literals turned on, every string involved in a path operation must be wrapped with str()
or force_str()
. I'm currently at 1000 lines of diff, and counting...
comment:6 by , 12 years ago
Wait, what? It ought to be OK to pass unicode on Python 2.X to Python os path functions. Yes, internally Python is going to have to turn unicode into a bytestring for passing into the system functions...but it should be able to do that, we should not have to do it before calling the os path functions. We've gone back and forth on this before, see for example #11030. Django code CANNOT be responsible for figuring out what the right encoding is for file system paths -- utf8 is going to be wrong on Asian windows systems, for example. Underlying Python should be configured correctly to figure that out. Or am I misunderstanding what you are having to do here....?
comment:7 by , 12 years ago
As discussed on IRC, I'm just wrapping ASCII string literals in str(...)
to obtain native strings where necessary, under Python 2 and 3, in modules that have unicode_literals turned on. I'm not guessing encodings.
by , 12 years ago
Attachment: | 19357-wip.patch added |
---|
comment:8 by , 12 years ago
With the patch I just attached, the tests suite result is: FAILED (failures=5, errors=85, skipped=181, expected failures=5)
when the checkout is under a non-ASCII path (ie. mv django djángò
).
comment:9 by , 12 years ago
I forgot to mention that when we eventually deprecate support for Python 2, it should be easy to remove most superfluous str()
calls with a custom 2to3 fixer that rewrites str(<STRING LITERAL>)
to <STRING LITERAL>
.
by , 12 years ago
Attachment: | 19357-wip-2.patch added |
---|
comment:11 by , 12 years ago
The patch I just attached requires applying the pull request for #19398 first. Clocking at over 1500 lines, it improves the test results under Python 2.6 + SQLite to FAILED (failures=2, errors=9, skipped=187, expected failures=7)
. Remaining failures are in the forms
, templates
, urlpatterns_reverse
, and file_storage
tests.
If we go ahead with this amount of changes in the 1.5 branch, it would be more reasonable to make a second beta :(
I didn't remove the unicode_literals
import, even in modules where many strings are wrapped in str(...)
, because I don't have a good rule to chose which modules should have unicode_literals
and which shouldn't.
comment:12 by , 12 years ago
As discussed on IRC, the attached patch tries to convert all __file__
occurrences to unicode. As for me, I think this is less invasive, but testing required!
comment:13 by , 12 years ago
Here's another random example (file /tmp/café.txt
exists on disk):
Django 1.4.2
>>> from django.http import HttpRequest >>> from django.views.static import serve >>> serve(HttpRequest(), 'café.txt', '/tmp') <django.http.HttpResponse object at 0x104ebf310>
master
>>> from django.http import HttpRequest >>> from django.views.static import serve >>> serve(HttpRequest(), 'café.txt', '/tmp') Traceback (most recent call last): File "<console>", line 1, in <module> File "/Users/myk/Documents/dev/django/django/views/static.py", line 38, in serve path = path.lstrip('/') UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
(This happens because unicode_literals is enabled => '/'
is unicode => path.lstrip('/')
attempts to convert path
to unicode.)
We both spent hours working on this, and none of our patches fixes that regression!
From the users' point of view, it's the amount of breakage between 1.4 and 1.5 that matters, more than the amount of changes we have to do between now and 1.5. The real question is — how do we minimize the number of such regressions in modules where unicode_literals is enabled?
I don't have an answer. My instincts tell me that switching all filesystem path handling to unicode is risky — all the more since it happened by accident rather than for a good reason, and since no one anticipated the consequences — but that's nowhere near a scientific argument!
comment:14 by , 12 years ago
Cc: | added |
---|
comment:15 by , 12 years ago
Yes, we will probably have to add some more force_text
calls in several places.
We choosed at some point to go the unicode route (and this was more or less forced by the option to be able to run Django with both Python 2 and Python 3). Now I think we should pursue this option. Frankly, I don't think one path is more risky that another, it's a real "design decision".
comment:16 by , 12 years ago
This mailing-list discussion makes it clear that we should use Claude's technique.
Claude, if you'd like me to proof read your patch, could you create a pull request?
comment:18 by , 12 years ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
I left a few comments on the pull request, but nothing fundamental.
comment:19 by , 12 years ago
Also, I'm still getting one encoding-related error with the patch on master:
====================================================================== ERROR: test_warnings_capture (regressiontests.logging_tests.tests.WarningLoggerTests) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/myk/Documents/dev/djángo/django/test/utils.py", line 213, in inner return test_func(*args, **kwargs) File "/Users/myk/Documents/dev/djángo/tests/regressiontests/logging_tests/tests.py", line 157, in test_warnings_capture self.assertTrue('Foo Deprecated' in output.getvalue()) UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 27: ordinal not in range(128) ----------------------------------------------------------------------
comment:20 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
This seems like a potential important regression. The OP verified that it worked fine in 1.4 but this UnicodeDecodeError occurs in master.