Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#25677 closed Bug (fixed)

compilemessages throws an exception and does not report msgformat errors correctly

Reported by: Gavin Wahl
Owned by: Ramiro Morales
Component: Internationalization
Version: dev
Severity: Normal
Keywords: 1.10 windows
Cc: Ramiro Morales
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I have a django.po with errors. When I run compilemessages, instead of getting a sensible error, I get a UnicodeDecodeError.

Traceback (most recent call last):
  File "/usr/lib/python3.4/pdb.py", line 1661, in main
    pdb._runscript(mainpyfile)
  File "/usr/lib/python3.4/pdb.py", line 1542, in _runscript
    self.run(statement)
  File "/usr/lib/python3.4/bdb.py", line 431, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "manage.py", line 2, in <module>
    import os
  File "django/core/management/__init__.py", line 350, in execute_from_command_line
    utility.execute()
  File "django/core/management/__init__.py", line 342, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "django/core/management/commands/compilemessages.py", line 98, in handle
    self.compile_messages(locations)
  File "django/core/management/commands/compilemessages.py", line 122, in compile_messages
    output, errors, status = popen_wrapper(args)
  File "django/core/management/utils.py", line 27, in popen_wrapper
    output, errors = p.communicate()
  File "/usr/lib/python3.4/subprocess.py", line 962, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.4/subprocess.py", line 1664, in _communicate
    self.stderr.encoding)
  File "/usr/lib/python3.4/subprocess.py", line 888, in _translate_newlines
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 179: invalid continuation byte

I should get the error from msgfmt:

$ msgfmt --check-format -o locale/es/LC_MESSAGES/django.mo locale/es/LC_MESSAGES/django.po
locale/es/LC_MESSAGES/django.po:112: 'msgstr' is not a valid Python brace format string, unlike 'msgid'. Reason: In the directive number 0, '�' cannot start a field name.
msgfmt: found 1 fatal error

The problem is that the output of msgfmt is _not_ a UTF-8 string. It's raw bytes, and here it contains a byte sequence that is not valid UTF-8, so any attempt to decode it strictly as UTF-8 will fail.

The exact output of msgfmt is b"locale/es/LC_MESSAGES/django.po:112: 'msgstr' is not a valid Python brace format string, unlike 'msgid'. Reason: In the directive number 0, '\xc5' cannot start a field name.\nmsgfmt: found 1 fatal error\n".
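As a minimal sketch of the failure described above, the bytes quoted in the report can be decoded directly. The helper name `strict_decode_fails` is hypothetical and only serves the illustration; the point is that strict UTF-8 decoding raises, while a lenient error handler keeps the message readable.

```python
# The stderr bytes quoted in the report (output of msgfmt).
raw = (
    b"locale/es/LC_MESSAGES/django.po:112: 'msgstr' is not a valid Python "
    b"brace format string, unlike 'msgid'. Reason: In the directive number 0, "
    b"'\xc5' cannot start a field name.\nmsgfmt: found 1 fatal error\n"
)


def strict_decode_fails(data):
    """Return True if strict UTF-8 decoding raises UnicodeDecodeError."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# Lenient decoding: the stray 0xc5 byte becomes U+FFFD instead of raising.
readable = raw.decode("utf-8", errors="replace")
```

With `errors="replace"` the offending byte shows up as the replacement character, which matches what the terminal prints when msgfmt is run by hand.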

Attachments (1)

django.po (863 bytes) - added by Gavin Wahl 8 years ago.

Change History (25)

comment:1 by Claude Paroz, 8 years ago

Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug
Version: 1.8 → master

Could you please provide the faulty msgid/msgstr that caused the error?

by Gavin Wahl, 8 years ago

Attachment: django.po added

comment:2 by Claude Paroz, 8 years ago

It's interesting to look at the traceback on Python 3:

...
  File "/home/claude/virtualenvs/djangogit34/lib/python3.4/site-packages/django/core/management/commands/compilemessages.py", line 97, in handle
    self.compile_messages(locations)
  File "/home/claude/virtualenvs/djangogit34/lib/python3.4/site-packages/django/core/management/commands/compilemessages.py", line 121, in compile_messages
    output, errors, status = popen_wrapper(args)
  File "/home/claude/virtualenvs/djangogit34/lib/python3.4/site-packages/django/core/management/utils.py", line 27, in popen_wrapper
    output, errors = p.communicate()
  File "/usr/lib/python3.4/subprocess.py", line 960, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib/python3.4/subprocess.py", line 1662, in _communicate
    self.stderr.encoding)
  File "/usr/lib/python3.4/subprocess.py", line 888, in _translate_newlines
    data = data.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc5 in position 214: invalid continuation byte

We see here that the decoding (and the error) happens in the standard library.
Should we blame gettext for outputting bytes that cannot be decoded as UTF-8? Should we blame Python for not falling back to some human-readable representation of the offending byte? Can Django do something to prevent that?

comment:3 by Gavin Wahl, 8 years ago

The error is raised in the standard library because Popen was passed universal_newlines=True, which makes it decode the output to unicode. Without that flag, Popen works in bytes.
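A rough sketch of the bytes-mode approach mentioned here, under stated assumptions: `run_and_decode` is a hypothetical helper, not Django's `popen_wrapper`, and this is not necessarily what the committed fix does. It leaves `universal_newlines` unset so that `communicate()` returns bytes, then decodes with `errors="replace"`. The child process is a stand-in that writes a non-UTF-8 byte to stderr, playing the role of msgfmt.

```python
import subprocess
import sys


def run_and_decode(args, encoding="utf-8"):
    """Run a command in bytes mode and decode its output leniently.

    Hypothetical helper for illustration only.
    """
    # universal_newlines is not set, so communicate() returns raw bytes.
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, errors = p.communicate()
    return (
        output.decode(encoding, errors="replace"),
        errors.decode(encoding, errors="replace"),
        p.returncode,
    )


# Stand-in child process: emits an invalid UTF-8 byte on stderr and exits 1.
child_code = r'''import sys; sys.stderr.buffer.write(b"bad '\xc5' byte\n"); sys.exit(1)'''
out, err, status = run_and_decode([sys.executable, "-c", child_code])
```

Decoding after the call, rather than inside Popen, means the caller chooses the error handler, so a stray byte degrades to a replacement character instead of aborting the command.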

comment:4 by Claude Paroz, 8 years ago

Has patch: set

comment:5 by Jef Geskens, 8 years ago

Owner: changed from nobody to Jef Geskens
Status: new → assigned

comment:6 by Jef Geskens, 8 years ago

Owner: Jef Geskens removed
Status: assigned → new

comment:7 by Tim Graham, 8 years ago

Triage Stage: Accepted → Ready for checkin

comment:8 by Claude Paroz <claude@…>, 8 years ago

Owner: set to Claude Paroz <claude@…>
Resolution: fixed
Status: new → closed

In fa08d27f:

Fixed #25677 -- Prevented decoding errors in/after Popen calls

Thanks Gavin Wahl for the report and Tim Graham for the review.

comment:9 by Tim Graham <timograham@…>, 8 years ago

In db8763fb:

Refs #25677 -- Fixed Python 2 i18n test failure on non-ASCII path.

comment:10 by Tim Graham, 8 years ago

Has patch: unset
Resolution: fixed
Status: closed → new
Triage Stage: Ready for checkin → Accepted

Oops, I haven't installed gettext on the Windows CI machine so the tests aren't running there. Here's a sample error from my local machine which does have gettext installed:

======================================================================
ERROR: test_no_wrap_enabled (i18n.test_extraction.NoWrapExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\Users\Tim\code\django\tests\i18n\test_extraction.py", line 597, in test_no_wrap_enabled
  File "c:\users\tim\code\django\django\core\management\__init__.py", line 117, in call_command
    return command.execute(*args, **defaults)
  File "c:\users\tim\code\django\django\core\management\base.py", line 341, in execute
    output = self.handle(*args, **options)
  File "c:\users\tim\code\django\django\core\management\commands\makemessages.py", line 307, in handle
    self.write_po_file(potfile, locale)
  File "c:\users\tim\code\django\django\core\management\commands\makemessages.py", line 539, in write_po_file
    "errors happened while running msgmerge\n%s" % errors)
django.core.management.base.CommandError: errors happened while running msgmerge

c:\Users\Tim\code\django\tests\i18n\commands\locale\de\LC_MESSAGES\django.po:2:47: syntax error
msgmerge: found 1 fatal error

comment:11 by Tim Graham, 8 years ago

Claude, any ideas? I can add gettext on the Windows CI machine if you want to try a pull request with some solution (of course, we'd see the test failures for other contributors too in the meantime).

comment:12 by Claude Paroz, 8 years ago

I have no Windows system to test with, and don't like blind debugging. If Windows users care for Django, let them step in.

comment:13 by Tim Graham, 8 years ago

Cc: Ramiro Morales added

Ramiro, maybe you could help with this issue at your convenience (before Django 1.10)?

comment:14 by Tim Graham, 8 years ago

Claude, the new test is also failing on Ubuntu 12.04 with Python 3 (possibly due to msgfmt 0.18.1 there vs 0.18.3 on 14.04?). I've also seen this on my Windows setup (again, Python 3 only), which uses msgfmt 0.17.

Traceback (most recent call last):
  File "/mnt/jenkinsdata/workspace/django-master/database/sqlite3/python/python3.4/tests/i18n/test_compilation.py", line 177, in test_msgfmt_error_including_non_ascii
    call_command('compilemessages', locale=['ko'], verbosity=0)
  File "/usr/lib/python3.4/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/mnt/jenkinsdata/workspace/django-master/database/sqlite3/python/python3.4/django/test/testcases.py", line 586, in _assert_raises_message_cm
    yield cm
AssertionError: CommandError not raised

comment:15 by Claude Paroz, 8 years ago

Are you able to install a more recent gettext on Windows to see if it's gettext-related?

comment:16 by Claude Paroz, 8 years ago

I was able to reproduce that on Linux with gettext 0.18.1. I guess that python-brace-format is a "recent" gettext addition. A solution might be to simply skip test_msgfmt_error_including_non_ascii with older gettext versions. Not sure if that fixes all failures.
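One way the version-based skip could look, as a sketch only: the helper names, the test class name, and the `(0, 18, 3)` threshold are assumptions for illustration (the threshold is taken from the versions mentioned in comment:14, not from gettext's changelog), and this is not necessarily how the skip was implemented in Django. The parser expects the first line of `msgfmt --version`, for example `msgfmt (GNU gettext-tools) 0.18.1`.

```python
import re
import unittest

# Assumed threshold, based only on the versions reported in this ticket.
MIN_BRACE_FORMAT_VERSION = (0, 18, 3)


def parse_gettext_version(version_output):
    """Extract a version tuple from the first line of `msgfmt --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        raise ValueError("no version number found in %r" % first_line)
    return tuple(int(part) for part in match.groups() if part is not None)


def brace_format_supported(version_output):
    """Return True if the reported gettext version meets the assumed threshold."""
    return parse_gettext_version(version_output) >= MIN_BRACE_FORMAT_VERSION


# Stand-in for the real `msgfmt --version` output on an older system.
SAMPLE_VERSION_OUTPUT = "msgfmt (GNU gettext-tools) 0.18.1"


class CompilationErrorTests(unittest.TestCase):  # hypothetical test class
    @unittest.skipUnless(
        brace_format_supported(SAMPLE_VERSION_OUTPUT),
        "gettext too old to check python-brace-format",
    )
    def test_msgfmt_error_including_non_ascii(self):
        pass
```

In a real test module the version string would come from running `msgfmt --version` once at import time rather than from a hard-coded sample.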

comment:17 by Tim Graham, 8 years ago

Skipping the test for older versions of gettext seems to work. This doesn't solve the other issues on Windows.

comment:18 by Tim Graham <timograham@…>, 8 years ago

In 93be2f7:

Refs #25677 -- Skipped an i18n test on older gettext versions.

comment:19 by Tim Graham, 8 years ago

Keywords: 1.10 added

comment:20 by Claude Paroz, 8 years ago

Keywords: windows added

comment:21 by Claude Paroz, 8 years ago

Owner: changed from Claude Paroz <claude@…> to Claude Paroz
Status: new → assigned

comment:22 by Claude Paroz, 8 years ago

Owner: Claude Paroz removed
Status: assigned → new

comment:23 by Ramiro Morales, 8 years ago

Owner: set to Ramiro Morales
Status: new → assigned

comment:24 by Ramiro Morales, 8 years ago

Resolution: fixed
Status: assigned → closed

I've opened #26645 to track the errors from comment:10. I suspect they aren't related to the original issue reported here, which was fixed by Claude in fa08d27fb714534670b431fde0cd04a17d637585.

Thus, I'm re-closing this one as fixed.
