#29452 closed Bug (fixed)
makemessages command doesn't set .pot file charset properly
Description
When running python manage.py makemessages, I'm getting the following error:
CommandError: errors happened while running msguniq C:\dev\xxx\locale\django.pot:2738: C:\dev\xxx\locale\django.pot: input is not valid in "ASCII" encoding
This is because some of my translatable strings contain non-ASCII characters. I've checked the code in makemessages.py and found the culprit:

    for line in pot_lines:
        if not found and not header_read:
            found = True
            line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            header_read = True
        lines.append(line)

Since found is set to True on the first iteration, charset is never updated as it's usually on line 17.
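To make the failure mode concrete, here is a minimal standalone sketch. It is not the actual makemessages.py code path, and the guarded variant is only one possible correction, not necessarily the patch that was merged. The function names and the shortened sample header are made up for illustration; the first function mirrors the loop quoted above, the second only sets found once the placeholder has actually been seen.

```python
def fix_charset_buggy(pot_lines):
    """Mirror of the quoted loop: `found` flips on the very first line."""
    lines, found, header_read = [], False, False
    for line in pot_lines:
        if not found and not header_read:
            found = True
            line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            header_read = True
        lines.append(line)
    return lines


def fix_charset_guarded(pot_lines):
    """Hypothetical correction: mark `found` only when the placeholder is seen."""
    lines, found, header_read = [], False, False
    for line in pot_lines:
        if not found and not header_read:
            if 'charset=CHARSET' in line:
                found = True
                line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            # First blank line ends the header; stop looking after that.
            header_read = True
        lines.append(line)
    return lines


# Shortened, illustrative POT header; the charset line is not the first line.
SAMPLE = [
    '# SOME DESCRIPTIVE TITLE.',
    'msgid ""',
    'msgstr ""',
    r'"Content-Type: text/plain; charset=CHARSET\n"',
    '',
    'msgid "GIS"',
    'msgstr ""',
]

buggy = fix_charset_buggy(SAMPLE)
guarded = fix_charset_guarded(SAMPLE)
print(buggy[3])    # placeholder is left untouched
print(guarded[3])  # placeholder is replaced with UTF-8
```

With the buggy loop the replace call only ever runs against the first line of the file, which is a comment, so the placeholder survives and msguniq falls back to treating the input as ASCII.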
Change History (13)
comment:1, 7 years ago
Owner: changed
Status: new → assigned
comment:2, 7 years ago
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug
comment:3, 7 years ago
Has patch: set
comment:6, 7 years ago
I'm the one who introduced this bug in 6ab0d1358fc78077064aab88a4fb0a47ca116391. Mea culpa.
I can contribute a test case (also attached):

    diff --git a/tests/i18n/test_extraction.py b/tests/i18n/test_extraction.py
    index d9ce3b4..a0d16b9 100644
    --- a/tests/i18n/test_extraction.py
    +++ b/tests/i18n/test_extraction.py
    @@ -128,6 +128,8 @@ class ExtractorTests(POFileAssertionMixin, RunInTmpDirMixin, SimpleTestCase):
     
     class BasicExtractorTests(ExtractorTests):
     
    +    POT_FILE = 'locale/django.pot'
    +
         @override_settings(USE_I18N=False)
         def test_use_i18n_false(self):
             """
    @@ -394,6 +396,14 @@ class BasicExtractorTests(ExtractorTests):
                 po_contents = fp.read()
                 self.assertMsgStr("Größe", po_contents)
     
    +    def test_pot_charset_header_is_utf8(self):
    +        self.assertFalse(os.path.exists(self.POT_FILE))
    +        management.call_command('makemessages', locale=[LOCALE], verbosity=0, keep_pot=True)
    +        self.assertTrue(os.path.exists(self.POT_FILE))
    +        with open(self.POT_FILE, 'r', encoding='utf-8') as fp:
    +            contents = fp.read()
    +        self.assertIn(r'; charset=UTF-8\n"', contents)
    +
     
     class JavascriptExtractorTests(ExtractorTests):
     

Problem is I can't reproduce the error condition.
In the added test case:
- There is a translatable literal with non-ASCII characters (in a template file)
- The intermediate POT file is created (and preserved for examination)
- When the POT file is created, the header "Content-Type: text/plain; charset=?????\n" is verified, and it already has the UTF-8 value for the charset.
Am I missing something? How come the created POT file has a "Content-Type: text/plain; charset=CHARSET\n" header?
- Is it the fact that the OP is running on Windows?
- Does this happen when extracting literals from .py files? JavaScript?
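For anyone trying to reproduce this on their own machine, a quick way to see which charset a preserved POT file declares is to pull it out of the Content-Type header. The following is only a rough sketch: declared_charset is a hypothetical helper, not something in Django or gettext, and the header text is a trimmed-down example.

```python
import re

# Captures the charset value from a PO/POT "Content-Type" header line.
# The value ends at whitespace, a backslash (the literal "\n"), or a quote.
_CHARSET_RE = re.compile(r'"Content-Type:[^"]*charset=([^\s\\"]+)')


def declared_charset(pot_text):
    """Return the charset named in the POT header, or None if there is none."""
    match = _CHARSET_RE.search(pot_text)
    return match.group(1) if match else None


# Trimmed example header; "\\n" is the literal backslash-n found in POT files.
header = (
    'msgid ""\n'
    'msgstr ""\n'
    '"MIME-Version: 1.0\\n"\n'
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
)

print(declared_charset(header))
print(declared_charset(header.replace('CHARSET', 'UTF-8')))
```

Running this against the file kept by keep_pot=True on each platform would show whether the placeholder is still there before msguniq is invoked.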
comment:7, 7 years ago
I think you could just unit test the write_pot_file method with some content like:

    # SOME DESCRIPTIVE TITLE.
    # Copyright (C) YEAR THE PACKAGE\'S COPYRIGHT HOLDER
    # This file is distributed under the same license as the PACKAGE package.
    # FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
    #
    #, fuzzy
    msgid ""
    msgstr ""
    "Project-Id-Version: PACKAGE VERSION\n"
    "Report-Msgid-Bugs-To: \n"
    "POT-Creation-Date: 2018-06-07 17:21+0200\n"
    "PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
    "Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
    "Language-Team: LANGUAGE <LL@li.org>\n"
    "Language: \n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=CHARSET\n"
    "Content-Transfer-Encoding: 8bit\n"

    #: django/contrib/gis/apps.py:8
    msgid "GIS"
    msgstr ""

comment:8, 7 years ago
Thanks Claude, that was actually my idea. Will do.
Ramiro: thanks for the tip.
comment:11, 7 years ago
As this was a regression (read comment:6), I'd be willing to backport this to the 2.1 branch. Any opposition?
PR: https://github.com/django/django/pull/9997