Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#29452 closed Bug (fixed)

makemessages command doesn't set .pot file charset properly

Reported by: Bartosz Grabski
Owned by: Bartosz Grabski
Component: Internationalization
Version: 1.11
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: yes
Easy pickings: no
UI/UX: no

Description

When running python manage.py makemessages, I'm getting the following error:

CommandError: errors happened while running msguniq
C:\dev\xxx\locale\django.pot:2738: C:\dev\xxx\locale\django.pot: input is not valid in "ASCII" encoding

This is because some of my translatable strings contain non-ASCII characters. I've checked the code in makemessages.py and found the culprit:

        for line in pot_lines:
            if not found and not header_read:
                found = True
                line = line.replace('charset=CHARSET', 'charset=UTF-8')
            if not line and not found:
                header_read = True
            lines.append(line)

Since found is set to True on the first iteration, the replacement is only ever attempted on the first line of the file. The charset header usually sits on line 17, so it is never updated.
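
A minimal sketch of the behaviour described above, assuming the loop can be isolated from the command. The function names `rewrite_header_buggy` and `rewrite_header_guarded` and the shortened sample header are hypothetical, and the guarded variant is only one possible way to correct the flag handling, not necessarily the patch that was committed:

```python
def rewrite_header_buggy(pot_lines):
    """Mirror of the quoted loop: `found` flips to True on the first line."""
    lines = []
    found, header_read = False, False
    for line in pot_lines:
        if not found and not header_read:
            found = True
            line = line.replace('charset=CHARSET', 'charset=UTF-8')
        if not line and not found:
            header_read = True
        lines.append(line)
    return lines


def rewrite_header_guarded(pot_lines):
    """Possible fix (assumption): set `found` only once the placeholder is seen."""
    lines = []
    found, header_read = False, False
    for line in pot_lines:
        if not found and not header_read:
            if 'charset=CHARSET' in line:
                found = True
                line = line.replace('charset=CHARSET', 'charset=UTF-8')
        # The first blank line ends the header; stop looking after that.
        if not line and not found:
            header_read = True
        lines.append(line)
    return lines


# Shortened, hypothetical header: the charset placeholder is not on line 1.
sample = [
    '# SOME DESCRIPTIVE TITLE.',
    'msgid ""',
    'msgstr ""',
    '"Content-Type: text/plain; charset=CHARSET\\n"',
    '',
    'msgid "GIS"',
    'msgstr ""',
]
buggy = rewrite_header_buggy(sample)
guarded = rewrite_header_guarded(sample)
```

With this sample, the buggy loop leaves `charset=CHARSET` untouched because the placeholder is not on the first line, while the guarded loop rewrites it to `charset=UTF-8` and leaves every other line as it was.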

Attachments (1)

29452-test.diff (1.4 KB) - added by Ramiro Morales 6 years ago.
Test case


Change History (13)

comment:1 by Bartosz Grabski, 6 years ago

Owner: changed from nobody to Bartosz Grabski
Status: new → assigned

comment:2 by Markus Holtermann, 6 years ago

Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug

comment:4 by Ingo Klöcker, 6 years ago

Patch needs improvement: set

I would add a test for this.

comment:5 by Bartosz Grabski, 6 years ago

Will do.

comment:6 by Ramiro Morales, 6 years ago

I'm the one who introduced this bug in 6ab0d1358fc78077064aab88a4fb0a47ca116391. Mea culpa.

I can contribute a test case (also attached):

  • tests/i18n/commands/templates/test.html

    diff --git a/tests/i18n/commands/templates/test.html b/tests/i18n/commands/templates/test.html
    index cac034e..3868dc1 100644
    --- a/tests/i18n/commands/templates/test.html
    +++ b/tests/i18n/commands/templates/test.html
    @@ -105,3 +105,5 @@ Plural for a `trans` and `blocktrans` collision case
     {% endblocktrans %}
     
     {% trans "Non-breaking space :" %}
    +
    +{% trans "Nón-ÁSCÍÏ text" %}
  • tests/i18n/test_extraction.py

    diff --git a/tests/i18n/test_extraction.py b/tests/i18n/test_extraction.py
    index d9ce3b4..e7557fc 100644
    --- a/tests/i18n/test_extraction.py
    +++ b/tests/i18n/test_extraction.py
    @@ -394,6 +394,14 @@ class BasicExtractorTests(ExtractorTests):
                 po_contents = fp.read()
                 self.assertMsgStr("Größe", po_contents)
     
    +    def test_pot_charset_header_is_utf8(self):
    +        self.assertFalse(os.path.exists(self.POT_FILE))
    +        management.call_command('makemessages', locale=[LOCALE], verbosity=0, keep_pot=True)
    +        self.assertTrue(os.path.exists(self.POT_FILE))
    +        with open(self.POT_FILE, 'r', encoding='utf-8') as fp:
    +            contents = fp.read()
    +            self.assertIn(r'; charset=UTF-8\n"', contents)
    +
     
     class JavascriptExtractorTests(ExtractorTests):
     

Problem is I can't reproduce the error condition.

In the added test case:

  • There is a translatable literal with non-ASCII characters (in a template file)
  • The intermediate POT file is created (and preserved for examination)
  • When the POT file is created, the header "Content-Type: text/plain; charset=?????\n" is verified and it already has the UTF-8 value for the charset.

Am I missing something? How come the created POT file has a "Content-Type: text/plain; charset=CHARSET\n" header?

  • Is it the fact that the OP is running on Windows?
  • Does this happen when extracting literals from .py files? JavaScript?
Version 0, edited 6 years ago by Ramiro Morales

by Ramiro Morales, 6 years ago

Attachment: 29452-test.diff added

Test case

comment:7 by Claude Paroz, 6 years ago

I think you could just unit test the write_pot_file method with some content like:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE\'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-06-07 17:21+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: django/contrib/gis/apps.py:8
msgid "GIS"
msgstr ""
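
A rough sketch of how such a unit test might look, assuming the function under test takes a target path and the raw message text. `write_pot_file_sketch` is a simplified, hypothetical stand-in defined here so the example runs without Django; it is not Django's implementation, and `read_back_pot` and the shortened template are likewise illustrative:

```python
import os
import tempfile


def write_pot_file_sketch(potfile, msgs):
    # Hypothetical stand-in, NOT Django's code: rewrite the charset
    # placeholder only in the header (the text before the first blank line).
    header, sep, body = msgs.partition('\n\n')
    header = header.replace('charset=CHARSET', 'charset=UTF-8')
    with open(potfile, 'w', encoding='utf-8', newline='\n') as fp:
        fp.write(header + sep + body)


# Shortened version of the template header quoted above.
POT_TEMPLATE = (
    '# SOME DESCRIPTIVE TITLE.\n'
    'msgid ""\n'
    'msgstr ""\n'
    '"MIME-Version: 1.0\\n"\n'
    '"Content-Type: text/plain; charset=CHARSET\\n"\n'
    '"Content-Transfer-Encoding: 8bit\\n"\n'
    '\n'
    '#: django/contrib/gis/apps.py:8\n'
    'msgid "GIS"\n'
    'msgstr ""\n'
)


def read_back_pot(msgs):
    """Write the messages to a temporary .pot file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        potfile = os.path.join(tmp_dir, 'django.pot')
        write_pot_file_sketch(potfile, msgs)
        with open(potfile, encoding='utf-8') as fp:
            return fp.read()


contents = read_back_pot(POT_TEMPLATE)
# The header placeholder should have been replaced with UTF-8.
assert 'charset=UTF-8' in contents
assert 'charset=CHARSET' not in contents
```

Feeding the function a header that still contains the `charset=CHARSET` placeholder exercises the rewrite directly, without relying on what xgettext happens to emit on a given platform.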

comment:8 by Bartosz Grabski, 6 years ago

Thanks Claude, that was actually my idea. Will do.
Ramiro: thanks for the tip.

comment:9 by Bartosz Grabski, 6 years ago

comment:10 by Tim Graham <timograham@…>, 6 years ago

Resolution: fixed
Status: assigned → closed

In 2bc01475:

Fixed #29452 -- Fixed makemessages setting charset of .pot files.

comment:11 by Claude Paroz, 6 years ago

As this was a regression (read comment:6), I'd be willing to backport this to the 2.1 branch. Any opposition?

comment:12 by Tim Graham <timograham@…>, 6 years ago

In c7d59825:

[2.1.x] Fixed #29452 -- Fixed makemessages setting charset of .pot files.

Backport of 2bc014750adb093131f77e4c20bc17ba64b75cac from master
