Opened 10 years ago

Closed 10 years ago

#24500 closed Bug (fixed)

Django runtests - 3 tests fail on windows due to encoding troubles

Reported by: pascal chambon Owned by: nobody
Component: Internationalization Version: 1.8rc1
Severity: Release blocker Keywords:
Cc: Claude Paroz Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by pascal chambon)

On a my python2.7.3 install (32 bits), a today's checkout of django's master doesn't pass all tests - three i18n tests fails due to UnicodeDecodeError (I'm on windows 7 64 bits).

I'm aware it might be due to locales (https://docs.djangoproject.com/en/dev/internals/contributing/writing-code/unit-tests/#many-test-failures-with-unicodeencodeerror), but windows doesn't have such packages that I'm aware of.

My win32 "xgettext --version" is "xgettext (GNU gettext-tools) 0.17"

I've followed the bug down to the xgettext invocation, where it seems the test expects utf8 output, whereas my xgettext-win32 version of course outputs ANSI (french locale) text.

UnicodeDecodeError('utf8', "xgettext (GNU gettext-tools) 0.17\nCopyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.\nLicence GPLv3+ : GNU GPL version 3 ou ult\xe9rieure <http://gnu.org/licenses/gpl.html>\nCeci est un logiciel libre : vous pouvez le modifier et le redistribuer.\nIl n'y a PAS DE GARANTIE, dans la mesure de ce que permet la loi.\n\xc9crit par Ulrich
Drepper.\n", 141, 142, 'invalid continuation byte')

Are these tests supposed to pass on windows as well as on *nix systems ? Are there specific requirements regarding xgettext or third-party packages ?

examining files with the extensions: .js
ignoring file code.sample in .
ignoring file not_utf8.sample in .
ignoring file __init__.py in .
ignoring file ignored.html in .\ignore_dir
ignoring file media_ignored.html in .\media_root
ignoring file static_ignored.html in .\static
ignoring file comments.thtml in .\templates
ignoring file empty.html in .\templates
ignoring file plural.djtpl in .\templates
ignoring file template_with_error.tpl in .\templates
ignoring file test.html in .\templates
ignoring file xxx_ignored.html in .\templates
ignoring file ignored.html in .\templates\subdir
processing file javascript.js in .
UnicodeDecodeError: skipped file javascript.js in .
processing file javascript.js in .\someapp\static
UnicodeDecodeError: skipped file javascript.js in .\someapp\static
processing file javascript_ignored.js in .\static
UnicodeDecodeError: skipped file javascript_ignored.js in .\static
processing locale de


======================================================================
FAIL: test_default_root_settings (i18n.test_extraction.JavascriptExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Websites\django\django\test\utils.py", line 180, in inner
    return test_func(*args, **kwargs)
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 454, in test_default_root_settings
    _, po_contents = self._run_makemessages(domain='djangojs')
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 67, in _run_makemessages
    self.assertTrue(os.path.exists(self.PO_FILE))
AssertionError: False is not true

======================================================================
FAIL: test_javascript_literals (i18n.test_extraction.JavascriptExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 422, in test_javascript_literals
    _, po_contents = self._run_makemessages(domain='djangojs')
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 67, in _run_makemessages
    self.assertTrue(os.path.exists(self.PO_FILE))
AssertionError: False is not true

======================================================================
FAIL: test_media_static_dirs_ignored (i18n.test_extraction.JavascriptExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "P:\Websites\django\django\test\utils.py", line 180, in inner
    return test_func(*args, **kwargs)
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 445, in test_media_static_dirs_ignored
    _, po_contents = self._run_makemessages(domain='djangojs')
  File "P:\Websites\django\tests\i18n\test_extraction.py", line 67, in _run_makemessages
    self.assertTrue(os.path.exists(self.PO_FILE))
AssertionError: False is not true

Attachments (1)

0001-Fix-gettext-tools-output-encoding-troubles.patch (3.3 KB ) - added by pascal chambon 10 years ago.

Download all attachments as: .zip

Change History (24)

comment:1 by pascal chambon, 10 years ago

Description: modified (diff)

comment:2 by Tim Graham, 10 years ago

They pass for me on Python 2.7.9 or Python 3.4.2 on Windows Vista 32-bit, xgettext 0.17, Git bash shell.

comment:3 by pascal chambon, 10 years ago

Weird, I updated to python 2.7.9 and tried with cmd.exe or git bash, still the same problem.

I continued investigating, problems occur when django tries to lookup xgettext version.

In gettext_popen_wrapper(), the output of django.core.management.utils.popen_wrapper is expected by django to be utf8 bytes, on python2 :

if six.PY2:
        stdout = stdout.decode('utf-8')

However, from what I see in the initial popen_wrapper(), there are no reasons for stdout to be unicode on python2, only stderr gets converted, and Popen (AFAIK) outputs bytes (in cp1252 encoding, in that case).

def  popen_wrapper(args, os_err_exc_type=CommandError):
    """
    Friendly wrapper around Popen.

    Returns stdout output, stderr output and OS status code.
    """
    print "USING ENCODING >>>>>", DEFAULT_LOCALE_ENCODING   # outputs "cp1252"
    try:
        p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
                close_fds=os.name != 'nt', universal_newlines=True)
    except OSError as e:
        strerror = force_text(e.strerror, DEFAULT_LOCALE_ENCODING,
                              strings_only=True)
        six.reraise(os_err_exc_type, os_err_exc_type('Error executing %s: %s' %
                    (args[0], strerror)), sys.exc_info()[2])
    output, errors = p.communicate()
    return (
        output,
        force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True),
        p.returncode
    )

This is a mystery to me... could anybody check the intermediate values of this "xgettext -V" stdout, as well as the system encodings (locale.getdefaultlocale() and stuffs), on a machine which doesn't fail these tests ?

comment:4 by Tim Graham, 10 years ago

>>> locale.getdefaultlocale()
('en_US', 'cp1252')

$ xgettext -V
xgettext.exe (GNU gettext-tools) 0.17
Copyright (C) 1995-1998, 2000-2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Written by Ulrich Drepper.

comment:5 by pascal chambon, 10 years ago

Thanks

I guess that's why -> your "xgettext -V" output is all-english, so ascii-compatible, hence that it be treated as utf8, a windows codepage, or ascii, is the same for django, and no error occurs. But for non-ascii outputs like my french "xgettext -V", decoding breaks because the output of Popen is cp1252 with accentuated letters, not utf8.

So we should replace "stdout = stdout.decode('utf-8')" by a proper operation with DEFAULT_LOCALE_ENCODING. I don't know what's best - normalize everything at popen_wrapper level (but it might break lots of code that rely on raw DEFAULT_LOCALE_ENCODING bytes), or handle stuffs properly in the new gettext_popen_wrapper() ? The latter i guess ?

Note that django1.7 didn't have these things, so no tests broke.

When was know the way to go, I can draft a patch to solve that non-ascii-encoding buglet.

comment:6 by Tim Graham, 10 years ago

Cc: Claude Paroz added
Triage Stage: UnreviewedAccepted
Version: 1.7master

Claude, could you advise on the best implementation?

comment:7 by Claude Paroz, 10 years ago

Yes, let's fix this at gettext_popen_wrapper level for now.

comment:8 by pascal chambon, 10 years ago

Hum I tried different ways of catching decoding exceptions to provide a fallback, but it broke more tests elsewhere....

In the end, I just special-cased the "xgettest -V" call, which doesn't respect utf-8 encoding, and let the rest encode/decode as is (CF attached patch). Tested on windows 7, on latest python2 and python3 interpreters.

In the patch I also added some encoding details to an error message (to help, since encoding troubles are recurring), but maybe it's too much information, I don't know.

I'm amazed, though, that xgettext, which is supposed to deal with internationalization, doesn't provide parameters to controls its output encoding, doesn't say anything about it in its docs (afaik), and has different stdout encodings in its different commands...

comment:9 by Claude Paroz, 10 years ago

What about the try/except approach, something like:

try:
    stdout = stdout.decode('utf-8')
except UnicodeDecodeError:
    stdout = stdout.decode(preferred_encoding)

comment:10 by pascal chambon, 10 years ago

Yes I tried it, but it broke other assumptions (especially tests dedicated to verify real file encoding troubles).

Furthermore, I'm afraid it might lead to new cases of mojibake, since an ANSI (ex. latin1) string might, by error, be treated as an utf8 string (and thus unexpected unicode characters be formed by grouping together 2 our more single-byte characters). Most of the time we get "invalid continuation byte" in such cases, but if one has bad luck....

I guess our best chance is to bet on empirical evidence, and assume that xgettext tools always return utf-8 strings, unless when called for side tasks like displaying version info. As long as tests pass on both *nix and windows, python2 and python3, on non-english computers, i'm rather confident about the robustness of this solution (test coverage is pretty strong on these i18n encoding troubles).

comment:11 by Claude Paroz, 10 years ago

Has patch: set

comment:12 by Tim Graham, 10 years ago

Are you able to create a GitHub pull request?

comment:13 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

PR; tests seem fine on CI and my Windows setup. Claude does the code look fine?

comment:14 by Claude Paroz, 10 years ago

Apart from a minor note on the PR, it's fine for me.

comment:15 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: newclosed

In 53c2cf1e:

Fixed #24500 -- Fixed makemessages encoding problems retrieving gettext version.

comment:16 by Tim Graham <timograham@…>, 10 years ago

In dc2cff5:

[1.8.x] Fixed #24500 -- Fixed makemessages encoding problems retrieving gettext version.

Backport of 53c2cf1e7b3f1c9a7794fcc5c4669145f351b2fa from master

comment:17 by Claude Paroz, 10 years ago

Has patch: unset
Resolution: fixed
Severity: NormalRelease blocker
Status: closednew
Triage Stage: Ready for checkinAccepted
Version: master1.8rc1

Now those tests are failing on my system :-/ I'll try to debug this issue.

comment:18 by Claude Paroz, 10 years ago

Has patch: set

PR https://github.com/django/django/pull/4425
locale.getpreferredencoding(False) is not usable. For example, it returns ANSI_X3.4-1968 on my system!
The patch needs to be tested on Windows now.

comment:19 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

Patch works on my Windows. On the other hand, I didn't have a problem before this ticket.

comment:20 by Claude Paroz, 10 years ago

You didn't have a problem before because you have an English locale, and the gettext response doesn't include any non-ascii character for you.

comment:21 by Claude Paroz <claude@…>, 10 years ago

In 3f4e778:

Refs #24500 -- Avoided locale.getpreferredencoding in makemessages

Fixes a regression introduced in 53c2cf1e.

comment:22 by Claude Paroz <claude@…>, 10 years ago

In 72de42cd:

[1.8.x] Refs #24500 -- Avoided locale.getpreferredencoding in makemessages

Fixes a regression introduced in 53c2cf1e.
Backport of 3f4e77840 from master.

comment:23 by Claude Paroz, 10 years ago

Resolution: fixed
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top