Opened 3 months ago

Closed 2 months ago

Last modified 2 months ago

#29452 closed Bug (fixed)

makemessages command doesn't set .pot file charset properly

Reported by: Bartosz Grabski
Owned by: Bartosz Grabski
Component: Internationalization
Version: 1.11
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description

When running python manage.py makemessages, I'm getting the following error:

CommandError: errors happened while running msguniq
C:\dev\xxx\locale\django.pot:2738: C:\dev\xxx\locale\django.pot: input is not valid in "ASCII" encoding

This is because some of my translatable strings contain non-ASCII characters. I've checked the code in makemessages.py and found the culprit:

        for line in pot_lines:
            if not found and not header_read:
                found = True
                line = line.replace('charset=CHARSET', 'charset=UTF-8')
            if not line and not found:
                header_read = True
            lines.append(line)

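Since found is set to True on the first iteration, the replace() call only ever runs on the first line of the file. The charset header usually sits around line 17 of the .pot file, so it is never updated.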
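
To make the failure mode concrete, here is a minimal standalone sketch. The function names (buggy_fix_header, patched_fix_header) and the shortened sample header are illustrative assumptions, not code from makemessages.py, and the patched variant is only one plausible way to fix the loop, not necessarily the change that was committed.

```python
def buggy_fix_header(pot_lines):
    """Loop as quoted in the report: `found` flips to True on the first line."""
    lines = []
    found, header_read = False, False
    for line in pot_lines:
        if not found and not header_read:
            found = True
            line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            header_read = True
        lines.append(line)
    return lines


def patched_fix_header(pot_lines):
    """One possible fix (assumption): set `found` only once the placeholder is seen."""
    lines = []
    found, header_read = False, False
    for line in pot_lines:
        if not found and not header_read:
            if 'charset=CHARSET' in line:
                found = True
                line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            # A blank line before the placeholder was found ends the header.
            header_read = True
        lines.append(line)
    return lines


# Shortened, hypothetical header; the placeholder is not on the first line.
SAMPLE_HEADER = [
    '# SOME DESCRIPTIVE TITLE.',
    'msgid ""',
    'msgstr ""',
    r'"MIME-Version: 1.0\n"',
    r'"Content-Type: text/plain; charset=CHARSET\n"',
    '',
    'msgid "Größe"',
    'msgstr ""',
]

buggy = buggy_fix_header(SAMPLE_HEADER)
patched = patched_fix_header(SAMPLE_HEADER)
```

With this input, the buggy loop leaves the charset=CHARSET placeholder untouched, while the patched loop rewrites it to charset=UTF-8.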

Attachments (1)

29452-test.diff (1.4 KB) - added by Ramiro Morales 2 months ago.
Test case

Change History (13)

comment:1 Changed 3 months ago by Bartosz Grabski

Owner: changed from nobody to Bartosz Grabski
Status: new → assigned

comment:2 Changed 3 months ago by Markus Holtermann

Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug

comment:3 Changed 3 months ago by Bartosz Grabski

Has patch: set

comment:4 Changed 3 months ago by Ingo Klöcker

Patch needs improvement: set

I would add a test for this.

comment:5 Changed 3 months ago by Bartosz Grabski

Will do.

comment:6 Changed 2 months ago by Ramiro Morales

I'm the one who introduced this bug in 6ab0d1358fc78077064aab88a4fb0a47ca116391. Mea culpa.

I can contribute a test case (also attached):

  • tests/i18n/test_extraction.py

    diff --git a/tests/i18n/test_extraction.py b/tests/i18n/test_extraction.py
    index d9ce3b4..a0d16b9 100644
    --- a/tests/i18n/test_extraction.py
    +++ b/tests/i18n/test_extraction.py
    @@ -128,6 +128,8 @@ class ExtractorTests(POFileAssertionMixin, RunInTmpDirMixin, SimpleTestCase):
     
     class BasicExtractorTests(ExtractorTests):
     
    +    POT_FILE = 'locale/django.pot'
    +
         @override_settings(USE_I18N=False)
         def test_use_i18n_false(self):
             """
    @@ -394,6 +396,14 @@ class BasicExtractorTests(ExtractorTests):
                 po_contents = fp.read()
                 self.assertMsgStr("Größe", po_contents)
     
    +    def test_pot_charset_header_is_utf8(self):
    +        self.assertFalse(os.path.exists(self.POT_FILE))
    +        management.call_command('makemessages', locale=[LOCALE], verbosity=0, keep_pot=True)
    +        self.assertTrue(os.path.exists(self.POT_FILE))
    +        with open(self.POT_FILE, 'r', encoding='utf-8') as fp:
    +            contents = fp.read()
    +            self.assertIn(r'; charset=UTF-8\n"', contents)
    +
     
     class JavascriptExtractorTests(ExtractorTests):
     

The problem is that I can't reproduce the error condition.

In the added test case:

  • There is a translatable literal with non-ASCII characters (in a template file)
  • The intermediate POT file is created (and preserved for examination)
  • Once the POT file is created, its "Content-Type: text/plain; charset=?????\n" header is checked, and the charset already has the UTF-8 value.

Am I missing something? How come the created POT file ends up with a "Content-Type: text/plain; charset=CHARSET\n" header?

  • Is it the fact that the OP is running on Windows?
  • Does this happen when extracting literals from .py files? JavaScript?

Last edited 2 months ago by Ramiro Morales

Changed 2 months ago by Ramiro Morales

Attachment: 29452-test.diff added

Test case

comment:7 Changed 2 months ago by Claude Paroz

I think you could just unit test the write_pot_file method with some content like:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-06-07 17:21+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: django/contrib/gis/apps.py:8
msgid "GIS"
msgstr ""
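
A test along those lines could look roughly like the sketch below. To keep it runnable without a Django checkout, it exercises write_pot_file_sketch, a simplified hypothetical stand-in that only rewrites the placeholder and writes the file; it is not Django's write_pot_file, and the test class name, shortened content and temp-file layout are assumptions too.

```python
import os
import tempfile
import unittest


def write_pot_file_sketch(potfile, msgs):
    """Hypothetical stand-in (NOT Django's write_pot_file): replace the
    charset placeholder and write the result as UTF-8."""
    lines = [
        line.replace('charset=CHARSET', 'charset=UTF-8')
        for line in msgs.splitlines()
    ]
    with open(potfile, 'w', encoding='utf-8', newline='\n') as fp:
        fp.write('\n'.join(lines))


# Shortened version of the header suggested in the comment above.
POT_CONTENT = '\n'.join([
    '# SOME DESCRIPTIVE TITLE.',
    '#, fuzzy',
    'msgid ""',
    'msgstr ""',
    r'"MIME-Version: 1.0\n"',
    r'"Content-Type: text/plain; charset=CHARSET\n"',
    r'"Content-Transfer-Encoding: 8bit\n"',
    '',
    '#: django/contrib/gis/apps.py:8',
    'msgid "GIS"',
    'msgstr ""',
])


class WritePotFileTests(unittest.TestCase):
    def test_charset_placeholder_is_replaced(self):
        with tempfile.TemporaryDirectory() as tmp:
            potfile = os.path.join(tmp, 'django.pot')
            write_pot_file_sketch(potfile, POT_CONTENT)
            with open(potfile, encoding='utf-8') as fp:
                contents = fp.read()
        # The header must declare UTF-8 and no placeholder may remain.
        self.assertIn('Content-Type: text/plain; charset=UTF-8', contents)
        self.assertNotIn('charset=CHARSET', contents)
```

To test the real code path, the stand-in would be swapped for the function in django.core.management.commands.makemessages, assuming it accepts a target path and the extracted messages in the same way.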

comment:8 Changed 2 months ago by Bartosz Grabski

Thanks Claude, that was actually my idea. Will do.
Ramiro: thanks for the tip.

comment:9 Changed 2 months ago by Bartosz Grabski

comment:10 Changed 2 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assigned → closed

In 2bc01475:

Fixed #29452 -- Fixed makemessages setting charset of .pot files.

comment:11 Changed 2 months ago by Claude Paroz

As this was a regression (read comment:6), I'd be willing to backport this to the 2.1 branch. Any opposition?

comment:12 Changed 2 months ago by Tim Graham <timograham@…>

In c7d59825:

[2.1.x] Fixed #29452 -- Fixed makemessages setting charset of .pot files.

Backport of 2bc014750adb093131f77e4c20bc17ba64b75cac from master
