Makemessages can corrupt existing .po files on Windows
|Reported by:||danielmenzel||Owned by:||nobody|
|Severity:||Release blocker||Keywords:||makemessages utf8 unicode|
|Has patch:||no||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Seen on German Windows 7 SP1 with 64-bit Python 3.4.1 and gettext 0.18.1.
When you have an existing .po file with translations, e.g.
msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-03 10:44+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "Size"
msgstr "Größe"
and then run
manage.py makemessages --no-location --no-wrap -l de
to update the .po file, you get a corrupted .po file:
msgid ""
msgstr ""
"Project-Id-Version: \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2014-03-03 10:44+0100\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: \n"
"Language-Team: \n"
"Language: de\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n != 1);\n"

msgid "Size"
msgstr "GrÃƒÂ¶ÃƒÅ¸e"
Setting environment variables such as LANG, LANGUAGE, LC_ALL, or LC_MESSAGES has no effect on the outcome. Calling chcp 65001 in the Windows console does not fix the problem either.
The behavior seems to have been introduced as a side effect of this commit https://github.com/django/django/commit/dbb48d2bb99a5f660cf2d85f137b8d87fc12d99f:
All file accesses in makemessages.py were changed to explicitly use utf-8, but the stdout of the gettext binaries (msgmerge etc.) is still decoded with the Windows encoding cp1252, because popen_wrapper sets universal_newlines=True, which in turn uses the encoding returned by locale.getpreferredencoding().
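The mechanism can be reproduced without gettext. The following is a minimal sketch, assuming the external tool emits UTF-8 bytes and the reader decodes them as cp1252; the helper name misdecode is hypothetical and exists only for this illustration.

```python
def misdecode(text, wrong_encoding="cp1252"):
    """Simulate reading UTF-8 output with the wrong (ANSI) encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


original = "Größe"

# One pass through the wrong decoder: each non-ASCII character
# becomes two cp1252 characters.
once = misdecode(original)   # 'GrÃ¶ÃŸe'

# A second pass corrupts the already corrupted text again.
twice = misdecode(once)      # 'GrÃƒÂ¶ÃƒÅ¸e'

# Undoing the wrong step (encode with cp1252, decode as UTF-8)
# recovers the original text for this input.
repaired = once.encode("cp1252").decode("utf-8")
```

Note that two passes through the wrong decoder yield exactly the msgstr shown in the corrupted file above.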
As a test I patched popen_wrapper to interpret output of external processes with utf-8 encoding instead of locale.getpreferredencoding():
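import locale
import os
import sys
from subprocess import PIPE, Popen

import django.core.management.base
from django.utils import six
from django.utils.encoding import DEFAULT_LOCALE_ENCODING, force_text


def popen_wrapper_utf8(args, os_err_exc_type=django.core.management.base.CommandError):
    """Monkey-patch for django.core.management.utils.popen_wrapper"""
    try:
        p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
                  close_fds=os.name != 'nt', universal_newlines=True)
    except OSError as e:
        # six.reraise expects the traceback object, not the whole exc_info tuple
        six.reraise(os_err_exc_type,
                    os_err_exc_type('Error executing %s: %s' % (args, e.strerror)),
                    sys.exc_info()[2])
    output, errors = p.communicate()
    # Additional utf-8 decoding: undo the locale decoding, then decode as utf-8
    output = output.encode(locale.getpreferredencoding(False)).decode('utf-8')
    return (
        output,
        force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True),
        p.returncode
    )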
This has the desired effect and prevents the corruption of the .po files.
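An alternative to re-encoding text that has already been decoded is to read the child's output as bytes and decode it as UTF-8 directly. The following is only a sketch under that assumption: run_utf8 is a hypothetical helper, not part of Django, and the child process is a stand-in Python one-liner rather than msgmerge.

```python
import os
import sys
from subprocess import PIPE, Popen


def run_utf8(args):
    """Run a command and decode its stdout as UTF-8 (hypothetical helper)."""
    # Without universal_newlines the pipes deliver raw bytes, so
    # locale.getpreferredencoding() is never consulted.
    p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
              close_fds=os.name != 'nt')
    out, err = p.communicate()
    # Normalize Windows line endings by hand, since universal newline
    # translation is no longer applied.
    text = out.decode('utf-8').replace('\r\n', '\n')
    return text, err.decode('utf-8', errors='replace'), p.returncode


# Stand-in for a gettext binary: a child that writes UTF-8 bytes to stdout.
child = [sys.executable, '-c',
         "import sys; sys.stdout.buffer.write('Gr\\u00f6\\u00dfe\\n'.encode('utf-8'))"]
output, errors, status = run_utf8(child)
```

This avoids the encode/decode round trip, which can fail when the wrongly decoded text contains bytes that cp1252 leaves undefined.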
I also tried changing the value returned by locale.getpreferredencoding() to "utf-8", but that seems impossible on Windows, as Python uses the win32 API GetACP(), which according to MSDN http://msdn.microsoft.com/en-us/library/windows/desktop/dd318070(v=vs.85).aspx only returns ANSI codepages and thus will never return "utf-8".
Change History (9)
comment:1 Changed 7 months ago by claudep
- Needs documentation unset
- Needs tests unset
- Patch needs improvement unset
- Severity changed from Normal to Release blocker
- Triage Stage changed from Unreviewed to Accepted
comment:2 Changed 7 months ago by andrewgodwin
comment:5 Changed 2 months ago by timgraham
- Has patch set
- Triage Stage changed from Accepted to Ready for checkin
comment:6 Changed 2 months ago by Ramiro Morales <ramiro@…>
- Resolution set to fixed
- Status changed from new to closed
comment:7 Changed 2 months ago by timgraham
- Has patch unset
- Resolution fixed deleted
- Status changed from closed to new
- Triage Stage changed from Ready for checkin to Accepted