Opened 10 years ago
Closed 10 years ago
#24500 closed Bug (fixed)
Django runtests - 3 tests fail on windows due to encoding troubles
Reported by: | pascal chambon | Owned by: | nobody |
---|---|---|---|
Component: | Internationalization | Version: | 1.8rc1 |
Severity: | Release blocker | Keywords: | |
Cc: | Claude Paroz | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
On my Python 2.7.3 install (32-bit), today's checkout of Django's master doesn't pass all tests: three i18n tests fail with a UnicodeDecodeError (I'm on Windows 7 64-bit).
I'm aware it might be due to locales (https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/unit-tests/#many-test-failures-with-unicodeencodeerror), but as far as I know Windows doesn't have such packages.
My win32 "xgettext --version" reports "xgettext (GNU gettext-tools) 0.17".
I've traced the bug down to the xgettext invocation, where the test seems to expect UTF-8 output, whereas my win32 xgettext naturally outputs ANSI (French locale) text.
UnicodeDecodeError('utf8', "xgettext (GNU gettext-tools) 0.17\nCopyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.\nLicence GPLv3+ : GNU GPL version 3 ou ult\xe9rieure <http://gnu.org/licenses/gpl.html>\nCeci est un logiciel libre : vous pouvez le modifier et le redistribuer.\nIl n'y a PAS DE GARANTIE, dans la mesure de ce que permet la loi.\n\xc9crit par Ulrich Drepper.\n", 141, 142, 'invalid continuation byte')
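The offsets in that error can be checked directly. The following Python 3 sketch (not code from this ticket) rebuilds the first three lines of the French banner as raw cp1252 bytes, copied from the error message, and shows that UTF-8 decoding fails exactly at byte 141, the cp1252 "é" of "ultérieure", while cp1252 decoding succeeds.

```python
# First three lines of the French `xgettext -V` banner as raw cp1252 bytes,
# copied from the UnicodeDecodeError message above.
XGETTEXT_V_FR = (
    b"xgettext (GNU gettext-tools) 0.17\n"
    b"Copyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.\n"
    b"Licence GPLv3+ : GNU GPL version 3 ou ult\xe9rieure "
    b"<http://gnu.org/licenses/gpl.html>\n"
)


def first_utf8_error(data):
    """Return (start, end, reason) for the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.end, exc.reason
    return None


print(first_utf8_error(XGETTEXT_V_FR))  # (141, 142, 'invalid continuation byte')
# Decoding with the Windows ANSI code page works and yields the accented word.
print("ultérieure" in XGETTEXT_V_FR.decode("cp1252"))  # True
```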
Are these tests supposed to pass on Windows as well as on *nix systems? Are there specific requirements regarding xgettext or third-party packages?
examining files with the extensions: .js
ignoring file code.sample in .
ignoring file not_utf8.sample in .
ignoring file __init__.py in .
ignoring file ignored.html in .\ignore_dir
ignoring file media_ignored.html in .\media_root
ignoring file static_ignored.html in .\static
ignoring file comments.thtml in .\templates
ignoring file empty.html in .\templates
ignoring file plural.djtpl in .\templates
ignoring file template_with_error.tpl in .\templates
ignoring file test.html in .\templates
ignoring file xxx_ignored.html in .\templates
ignoring file ignored.html in .\templates\subdir
processing file javascript.js in .
UnicodeDecodeError: skipped file javascript.js in .
processing file javascript.js in .\someapp\static
UnicodeDecodeError: skipped file javascript.js in .\someapp\static
processing file javascript_ignored.js in .\static
UnicodeDecodeError: skipped file javascript_ignored.js in .\static
processing locale de
======================================================================
FAIL: test_default_root_settings (i18n.test_extraction.JavascriptExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Websites\django\django\test\utils.py", line 180, in inner
    return test_func(*args, **kwargs)
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 454, in test_default_root_settings
    _, po_contents = self._run_makemessages(domain='djangojs')
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 67, in _run_makemessages
    self.assertTrue(os.path.exists(self.PO_FILE))
AssertionError: False is not true
======================================================================
FAIL: test_javascript_literals (i18n.test_extraction.JavascriptExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 422, in test_javascript_literals
    _, po_contents = self._run_makemessages(domain='djangojs')
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 67, in _run_makemessages
    self.assertTrue(os.path.exists(self.PO_FILE))
AssertionError: False is not true
======================================================================
FAIL: test_media_static_dirs_ignored (i18n.test_extraction.JavascriptExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Websites\django\django\test\utils.py", line 180, in inner
    return test_func(*args, **kwargs)
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 445, in test_media_static_dirs_ignored
    _, po_contents = self._run_makemessages(domain='djangojs')
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 67, in _run_makemessages
    self.assertTrue(os.path.exists(self.PO_FILE))
AssertionError: False is not true
Attachments (1)
Change History (24)
comment:1 by , 10 years ago
Description: | modified (diff) |
---|
comment:2 by , 10 years ago
comment:3 by , 10 years ago
Weird: I updated to Python 2.7.9 and tried both cmd.exe and Git Bash, but I still get the same problem.
I kept investigating: the problem occurs when Django tries to look up the xgettext version.
In gettext_popen_wrapper(), Django expects the output of django.core.management.utils.popen_wrapper to be UTF-8 bytes on Python 2:
if six.PY2:
    stdout = stdout.decode('utf-8')
However, from what I see in the original popen_wrapper(), there is no reason for stdout to be unicode on Python 2: only stderr gets converted, and Popen (AFAIK) outputs bytes (cp1252-encoded, in this case).
import os
import sys
from subprocess import PIPE, Popen

from django.core.management.base import CommandError
from django.utils import six
from django.utils.encoding import DEFAULT_LOCALE_ENCODING, force_text


def popen_wrapper(args, os_err_exc_type=CommandError):
    """
    Friendly wrapper around Popen.

    Returns stdout output, stderr output and OS status code.
    """
    print "USING ENCODING >>>>>", DEFAULT_LOCALE_ENCODING  # outputs "cp1252"
    try:
        p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
                  close_fds=os.name != 'nt', universal_newlines=True)
    except OSError as e:
        strerror = force_text(e.strerror, DEFAULT_LOCALE_ENCODING,
                              strings_only=True)
        six.reraise(os_err_exc_type, os_err_exc_type('Error executing %s: %s' %
                    (args[0], strerror)), sys.exc_info()[2])
    output, errors = p.communicate()
    return (
        output,
        force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True),
        p.returncode
    )
This is a mystery to me... Could somebody check the intermediate value of this "xgettext -V" stdout, as well as the system encodings (locale.getdefaultlocale() and so on), on a machine where these tests don't fail?
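For anyone willing to compare, a small Python 3 script along these lines could collect the relevant values. It is a sketch, not part of Django: the helper names are made up, it uses locale.getpreferredencoding(False) rather than locale.getdefaultlocale() (deprecated since Python 3.11), and it keeps the "xgettext -V" output as raw bytes so nothing is decoded implicitly.

```python
import locale
import shutil
import subprocess
import sys


def encoding_report():
    """Collect encoding settings that influence how subprocess output is decoded."""
    return {
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
    }


def raw_xgettext_version():
    """Return the undecoded bytes of `xgettext -V`, or None if xgettext isn't on PATH."""
    exe = shutil.which("xgettext")
    if exe is None:
        return None
    # No text/universal_newlines flag, so stdout stays as raw bytes.
    proc = subprocess.run([exe, "-V"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.stdout


if __name__ == "__main__":
    print(encoding_report())
    # repr() of bytes is pure ASCII, so printing never fails on the console encoding.
    print(repr(raw_xgettext_version()))
```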
comment:4 by , 10 years ago
>>> locale.getdefaultlocale()
('en_US', 'cp1252')

$ xgettext -V
xgettext.exe (GNU gettext-tools) 0.17
Copyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Ulrich Drepper.
comment:5 by , 10 years ago
Thanks
I guess that explains it: your "xgettext -V" output is all English, hence ASCII-compatible, so whether Django treats it as UTF-8, a Windows code page, or ASCII makes no difference and no error occurs. But for non-ASCII output like my French "xgettext -V", decoding breaks because the Popen output is cp1252 with accented letters, not UTF-8.
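A quick way to see this, as a Python 3 sketch using sample strings taken from the two banners quoted in this ticket: ASCII-only bytes decode to the same text under UTF-8, cp1252 and ASCII, while the cp1252-encoded French line only decodes under cp1252. The helper name is illustrative.

```python
CODECS = ("utf-8", "cp1252", "ascii")


def decode_all(data, encodings=CODECS):
    """Decode `data` with each codec; failures map to "<reason>" instead of raising."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError as exc:
            results[enc] = "<%s>" % exc.reason
    return results


english = b"License GPLv3+: GNU GPL version 3 or later"
french = "Licence GPLv3+ : GNU GPL version 3 ou ultérieure".encode("cp1252")

# ASCII-only output: all three codecs agree, so a wrong guess goes unnoticed.
print(len(set(decode_all(english).values())))  # 1
# Accented cp1252 output: UTF-8 and ASCII both fail, only cp1252 succeeds.
print(decode_all(french)["utf-8"])  # <invalid continuation byte>
print(decode_all(french)["ascii"])  # <ordinal not in range(128)>
```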
So we should replace "stdout = stdout.decode('utf-8')" with a decode that uses DEFAULT_LOCALE_ENCODING. I don't know which is best: normalize everything at the popen_wrapper level (which might break a lot of code that relies on raw DEFAULT_LOCALE_ENCODING bytes), or handle things properly in the new gettext_popen_wrapper()? The latter, I guess.
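The second option could look roughly like this in Python 3. run_and_decode is a hypothetical helper, not Django's gettext_popen_wrapper(): it keeps stdout as bytes and decodes it with an encoding chosen by the caller (UTF-8 by default), instead of relying on whatever default the platform applies.

```python
import subprocess
import sys


def run_and_decode(args, stdout_encoding="utf-8"):
    """Run `args`, decode stdout explicitly, and return (text, stderr_bytes, returncode).

    Hypothetical helper: stdout is read as bytes and decoded with the encoding the
    caller expects for that particular tool, instead of a platform default.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()  # bytes, since no text mode was requested
    # Decode, then turn Windows "\r\n" line endings into "\n".
    text = out.decode(stdout_encoding).replace("\r\n", "\n")
    return text, err, proc.returncode


# Usage: a child process that writes cp1252 bytes, like the French xgettext banner.
CHILD = "import sys; sys.stdout.buffer.write('ult\\u00e9rieure\\n'.encode('cp1252'))"
text, _, status = run_and_decode([sys.executable, "-c", CHILD], stdout_encoding="cp1252")
print(status, text == "ult\u00e9rieure\n")  # 0 True
```

With the default stdout_encoding="utf-8", the same child output raises UnicodeDecodeError, which is exactly the failure reported here.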
Note that Django 1.7 didn't have this code, so no tests broke there.
Once we know the way to go, I can draft a patch to fix this non-ASCII encoding bug.
comment:6 by , 10 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Accepted |
Version: | 1.7 → master |
Claude, could you advise on the best implementation?
by , 10 years ago
Attachment: | 0001-Fix-gettext-tools-output-encoding-troubles.patch added |
---|
comment:8 by , 10 years ago
Hmm, I tried several ways of catching decoding exceptions to provide a fallback, but they broke other tests elsewhere...
In the end, I just special-cased the "xgettext -V" call, which doesn't output UTF-8, and left the rest of the encoding/decoding as is (cf. attached patch). Tested on Windows 7 with the latest Python 2 and Python 3 interpreters.
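A sketch of the special-casing idea, in Python 3 with a hypothetical helper name (the actual change is the attached patch, not this code): only the version banner is decoded with the locale encoding, and the version numbers are then parsed from its first line.

```python
import re


def parse_gettext_version(raw, locale_encoding):
    """Decode a raw `xgettext --version` banner and return its version tuple.

    Hypothetical helper: the banner is decoded with the locale encoding
    (e.g. "cp1252" on a French Windows), unlike regular gettext output,
    which would keep being decoded as UTF-8.
    """
    first_line = raw.decode(locale_encoding).splitlines()[0]
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        raise ValueError("Unable to get gettext version from %r" % first_line)
    return tuple(int(part) for part in match.groups() if part is not None)


banner = (
    b"xgettext (GNU gettext-tools) 0.17\n"
    b"Licence GPLv3+ : GNU GPL version 3 ou ult\xe9rieure\n"
)
print(parse_gettext_version(banner, "cp1252"))  # (0, 17)
```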
In the patch I also added some encoding details to an error message (to help with debugging, since encoding troubles keep recurring), but maybe that's too much information; I'm not sure.
I'm amazed, though, that xgettext, which is supposed to deal with internationalization, provides no parameter to control its output encoding, says nothing about it in its docs (AFAIK), and uses different stdout encodings for different commands...
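As an illustration of the kind of detail such a message could carry (a hypothetical Python 3 helper, not the wording used in the patch), a UnicodeDecodeError already exposes the codec name, the failing byte range, the reason and the original bytes:

```python
def describe_decode_error(exc, source):
    """Turn a UnicodeDecodeError into a message naming codec, byte range and context."""
    context = exc.object[max(exc.start - 8, 0):exc.end + 8]
    return "Could not decode output of %s as %s: %s at bytes %d-%d (near %r)" % (
        source, exc.encoding, exc.reason, exc.start, exc.end, context)


try:
    b"version 3 ou ult\xe9rieure".decode("utf-8")
except UnicodeDecodeError as exc:
    print(describe_decode_error(exc, "xgettext -V"))
```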
comment:9 by , 10 years ago
What about the try/except approach, something like:
try:
    stdout = stdout.decode('utf-8')
except UnicodeDecodeError:
    stdout = stdout.decode(preferred_encoding)
comment:10 by , 10 years ago
Yes, I tried it, but it broke other assumptions (especially the tests dedicated to verifying real file-encoding problems).
Furthermore, I'm afraid it might lead to new cases of mojibake, since an ANSI (e.g. latin1) string might mistakenly be treated as UTF-8, so that unexpected Unicode characters get formed by grouping together two or more single-byte characters. Most of the time we get "invalid continuation byte" in such cases, but with bad luck...
I guess our best bet is to rely on empirical evidence and assume that the xgettext tools always output UTF-8, except when called for side tasks like displaying version info. As long as the tests pass on both *nix and Windows, with Python 2 and Python 3, on non-English machines, I'm fairly confident this solution is robust (test coverage of these i18n encoding issues is pretty strong).
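The mojibake risk can be demonstrated with the try/except approach from comment:9. In this Python 3 sketch the fallback defaults to locale.getpreferredencoding(False), which is an assumption about what preferred_encoding stands for. The cp1252 text "Ã©" happens to be the bytes 0xC3 0xA9, which is valid UTF-8 for "é", so it is silently misread instead of raising.

```python
import locale


def decode_with_fallback(data, fallback=None):
    """Try UTF-8 first, then fall back to another codec (the comment:9 approach)."""
    if fallback is None:
        # Assumption: "preferred_encoding" means the locale's preferred encoding.
        fallback = locale.getpreferredencoding(False)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# Usual case: the bytes are invalid UTF-8, so the fallback yields the right text.
print(decode_with_fallback("\u00e9".encode("cp1252"), "cp1252") == "\u00e9")  # True
# Bad luck: cp1252 "Ã©" (bytes C3 A9) is valid UTF-8, so it silently becomes "é".
print(decode_with_fallback("\u00c3\u00a9".encode("cp1252"), "cp1252") == "\u00e9")  # True
```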
comment:11 by , 10 years ago
Has patch: | set |
---|
comment:13 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
PR: tests seem fine on CI and on my Windows setup. Claude, does the code look fine?
comment:17 by , 10 years ago
Has patch: | unset |
---|---|
Resolution: | fixed |
Severity: | Normal → Release blocker |
Status: | closed → new |
Triage Stage: | Ready for checkin → Accepted |
Version: | master → 1.8rc1 |
Now those tests are failing on my system :-/ I'll try to debug this issue.
comment:18 by , 10 years ago
Has patch: | set |
---|
PR https://github.com/django/django/pull/4425
locale.getpreferredencoding(False) is not usable. For example, it returns ANSI_X3.4-1968 on my system!
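For context, ANSI_X3.4-1968 is simply a name for ASCII (it is what glibc typically reports under the C/POSIX locale), so decoding UTF-8 gettext output with it fails on the first non-ASCII byte. A minimal Python 3 check, with an illustrative helper name:

```python
import codecs


def decode_or_reason(data, encoding):
    """Decode `data`, returning "<reason>" instead of raising on failure."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "<%s>" % exc.reason


print(codecs.lookup("ANSI_X3.4-1968").name)  # ascii
utf8_banner = "GNU GPL version 3 ou ult\u00e9rieure".encode("utf-8")
print(decode_or_reason(utf8_banner, "ANSI_X3.4-1968"))  # <ordinal not in range(128)>
```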
The patch needs to be tested on Windows now.
comment:19 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
The patch works on my Windows machine. Then again, I didn't have the problem before this ticket either.
comment:20 by , 10 years ago
You didn't have the problem before because you have an English locale, so the gettext output doesn't include any non-ASCII characters for you.
comment:23 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The tests pass for me on both Python 2.7.9 and Python 3.4.2, on Windows Vista 32-bit with xgettext 0.17, in a Git Bash shell.