id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
23271	Makemessages can corrupt existing .po files on Windows	Daniel Menzel	nobody	"== Description ==

Seen on German Windows 7 SP1 with 64-bit Python 3.4.1 and gettext 0.18.1.

When you have '''an existing .po file''' with translations, e.g.
{{{
msgid """"
msgstr """"
""Project-Id-Version: \n""
""Report-Msgid-Bugs-To: \n""
""POT-Creation-Date: 2014-03-03 10:44+0100\n""
""PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n""
""Last-Translator: \n""
""Language-Team: \n""
""Language: de\n""
""MIME-Version: 1.0\n""
""Content-Type: text/plain; charset=UTF-8\n""
""Content-Transfer-Encoding: 8bit\n""
""Plural-Forms: nplurals=2; plural=(n != 1);\n""

msgid ""Size""
msgstr ""Größe""
}}}
and then run 
{{{
manage.py makemessages --no-location --no-wrap -l de
}}}
to update the .po file, you get '''a corrupted .po file''':
{{{
msgid """"
msgstr """"
""Project-Id-Version: \n""
""Report-Msgid-Bugs-To: \n""
""POT-Creation-Date: 2014-03-03 10:44+0100\n""
""PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n""
""Last-Translator: \n""
""Language-Team: \n""
""Language: de\n""
""MIME-Version: 1.0\n""
""Content-Type: text/plain; charset=UTF-8\n""
""Content-Transfer-Encoding: 8bit\n""
""Plural-Forms: nplurals=2; plural=(n != 1);\n""

msgid ""Size""
msgstr ""GrÃƒÂ¶ÃƒÅ¸e""
}}}

== Investigation ==

Setting environment variables such as LANG, LANGUAGE, LC_ALL or LC_MESSAGES has no effect on the outcome. Running ''chcp 65001'' in the Windows console does not fix the problem either.

However, the described behavior seems to have been introduced as a side effect of this commit [https://github.com/django/django/commit/dbb48d2bb99a5f660cf2d85f137b8d87fc12d99f]:
All file accesses in ''makemessages.py'' were changed to explicitly use utf-8, but the stdout of the gettext binaries (''msgmerge'' etc.) is still decoded with the Windows encoding cp1252, because ''popen_wrapper'' sets ''universal_newlines=True'', which in turn makes Python decode the output with the encoding returned by ''locale.getpreferredencoding()''.
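
The mis-decoding can be reproduced without gettext. The following is only an illustration, assuming the preferred encoding is cp1252 as on the system above; the variable names are made up for this sketch. It also suggests that the string in the corrupted file is what two such rounds would produce, which would fit a .po file that has been updated more than once.

```python
# Illustration only: reproduce the mis-decoding without running gettext.
# Assumption: the locale encoding is cp1252, as on the German Windows system above.
original = 'Gr\u00f6\u00dfe'              # 'Größe'
utf8_bytes = original.encode('utf-8')     # the bytes msgmerge writes to stdout

# The pipe is decoded with the ANSI code page instead of UTF-8.
misread = utf8_bytes.decode('cp1252')

# If that text is written out as UTF-8 and misread again, the damage doubles.
misread_twice = misread.encode('utf-8').decode('cp1252')

# The round trip used in the test patch: re-encode with the wrong codec,
# then decode the recovered bytes as UTF-8.
repaired = misread.encode('cp1252').decode('utf-8')

print(ascii(misread))
print(ascii(misread_twice))
print(repaired == original)
```

The outputs are printed with ''ascii()'' so that the console encoding does not interfere with the demonstration.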

As a test I patched ''popen_wrapper'' to interpret output of external processes with utf-8 encoding instead of ''locale.getpreferredencoding()'':
{{{
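import locale
import os
import sys
from subprocess import PIPE, Popen

import django.core.management.base
from django.utils import six
from django.utils.encoding import DEFAULT_LOCALE_ENCODING, force_text


def popen_wrapper_utf8(args, os_err_exc_type=django.core.management.base.CommandError):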
    """"""Monkey-patch for django.core.management.utils.popen_wrapper""""""
    try:
        p = Popen(args, shell=False, stdout=PIPE, stderr=PIPE,
                  close_fds=os.name != 'nt', universal_newlines=True)
    except OSError as e:
        six.reraise(os_err_exc_type, os_err_exc_type('Error executing %s: %s' %
                                                     (args[0], e.strerror)), sys.exc_info()[2])
    output, errors = p.communicate()

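    # Added for this test: undo the locale-based decoding done by
    # universal_newlines=True and decode the original bytes as UTF-8 instead.
    output = output.encode(locale.getpreferredencoding(False)).decode('utf-8')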

    return (
        output,
        force_text(errors, DEFAULT_LOCALE_ENCODING, strings_only=True),
        p.returncode
    )
}}}
This has the desired effect and prevents the corruption of the .po files.
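
The encode/decode round trip above relies on every byte surviving the cp1252 detour. A possibly more direct variant would be to read the process output as bytes and decode it explicitly. The sketch below is not Django code: ''run_decoding_utf8'' is a hypothetical name, and it assumes that the gettext tools always emit UTF-8 on stdout, which holds for the .po file shown above but is not verified in general.

```python
import subprocess


def run_decoding_utf8(args):
    '''Hypothetical helper: run args and decode stdout as UTF-8.

    Returns a (stdout, stderr, returncode) tuple, like the patched wrapper above.
    '''
    # No universal_newlines here, so communicate() returns raw bytes.
    p = subprocess.Popen(args, shell=False,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    # Decode explicitly and normalise Windows line endings by hand,
    # since universal_newlines no longer does it for us.
    out = out.decode('utf-8').replace('\r\n', '\n')
    # Assumption: error messages may not be valid UTF-8, so be lenient here.
    err = err.decode('utf-8', errors='replace').replace('\r\n', '\n')
    return out, err, p.returncode
```

Because the bytes are never decoded with the locale encoding, the result does not depend on what ''locale.getpreferredencoding()'' returns.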

I also tried changing the value returned by ''locale.getpreferredencoding()'' to ""utf-8"", but that seems impossible on Windows, as Python uses the win32 API ''GetACP()'', which according to MSDN [http://msdn.microsoft.com/en-us/library/windows/desktop/dd318070(v=vs.85).aspx] only returns ANSI codepages and thus will never return ""utf-8"".
"	Bug	closed	Internationalization	dev	Release blocker	fixed	makemessages utf8 unicode		Accepted	0	0	0	0	0	0
