Opened 9 years ago

Closed 8 years ago

#3063 closed defect (fixed)

[patch] i18n broken for msgids with extended characters

Reported by: akaihola
Owned by: hugo
Component: Internationalization
Version:
Severity: normal
Keywords: i18n
Cc:
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

Message ids with non-us-ascii characters in a UTF-8 encoded .po file are not recognized. Django returns the original string even though a translation is provided.

The reason is that Python's GNUTranslations.gettext() expects the message id as a unicode string, but gettext() in trans_real.py calls it with UTF-8 encoded message ids.
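The mismatch can be illustrated with a small sketch. This is not Django's or the gettext module's code: the `catalog` dict, the `lookup` helper and the message strings are hypothetical stand-ins, and the snippet is written for Python 3, where `bytes` and `str` play the roles that byte strings and unicode strings played at the time of this ticket.

```python
# Hypothetical stand-in for a catalog loaded from a UTF-8 encoded .po file:
# the keys are decoded (unicode) message ids.
catalog = {"Hyvää päivää": "Bonne journée"}


def lookup(msgid):
    """Return the translation, or the msgid unchanged when none is found."""
    return catalog.get(msgid, msgid)


# A UTF-8 encoded message id, as described for the call in trans_real.py.
encoded_msgid = "Hyvää päivää".encode("utf-8")

# The encoded id never matches a unicode key, so the original comes back.
miss = lookup(encoded_msgid)

# Decoding the id before the lookup finds the translation.
hit = lookup(encoded_msgid.decode("utf-8"))
```

With these assumptions, `miss` is the untranslated byte string and `hit` is the translation, which matches the symptom reported above.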

Even if this is fixed by decoding the message id before calling t.gettext(), a second problem remains for missing messages. Python's GNUTranslations.gettext() returns UTF-8 encoded strings, but when a message is not found it returns the original unicode message id unchanged, and the template engine chokes when it tries to join unicode and normal strings.

The Python gettext documentation recommends using ugettext() instead of gettext() so everything is handled in unicode.
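The second problem, and why an all-unicode lookup avoids it, can be sketched as follows. Everything here is an assumption made for illustration: `gettext_like`, `ugettext_like`, `render` and the `catalog` dict are hypothetical names that only mimic the behaviour described in this ticket, and the snippet targets Python 3, where mixing `bytes` and `str` in a join raises `TypeError` (the exact error seen on the Python version of the ticket's era may differ).

```python
# Hypothetical catalog with unicode keys and values.
catalog = {"Hyvää päivää": "Bonne journée"}


def gettext_like(msgid):
    """Mimic the reported behaviour: encoded on a hit, unicode on a miss."""
    if msgid in catalog:
        return catalog[msgid].encode("utf-8")
    return msgid


def ugettext_like(msgid):
    """Mimic the unicode-only approach: unicode in, unicode out."""
    return catalog.get(msgid, msgid)


def render(translate, msgids):
    """Join translated fragments, roughly as a template renderer would."""
    return "".join(translate(m) for m in msgids)


# The second message id has no translation in the catalog.
msgids = ["Hyvää päivää", "Näkemiin"]

try:
    render(gettext_like, msgids)
    mixed_join_ok = True
except TypeError:
    # Encoded and unicode fragments cannot be joined together.
    mixed_join_ok = False

# With unicode throughout, hits and misses join cleanly.
rendered = render(ugettext_like, msgids)
```

The design point is that the return type should not depend on whether a translation was found; keeping every fragment in unicode until output removes the mixed join.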

I know it's exceptional to use non-us-ascii message ids, but my strong opinion is that since we're beginning to live in a unicode world, it shouldn't be a problem anymore. And I'm sure many people are in a situation where an existing single-language non-English site is being i18n'ed and the most practical approach is simply to wrap the existing non-English messages in {% trans %} tags and _() calls. A multi-lingual site might even use only languages with non-us-ascii character sets.

I'll post a patch which fixes both problems described above.

Attachments (4)

ugettext.diff (1.7 KB) - added by akaihola 9 years ago.
[patch] Fixes translation of non-us-ascii message ids
tests_i18n_non-us-ascii_msgids.diff (2.1 KB) - added by akaihola 9 years ago.
Additions to the template test suite for non-us-ascii UTF-8 message ids
tests_i18n_non-us-ascii_msgids.2.diff (2.3 KB) - added by akaihola 9 years ago.
Oops, diff root path was wrong. Corrected here.
ugettext.2.diff (1.6 KB) - added by akaihola 9 years ago.
fixed the diff paths


Change History (9)

Changed 9 years ago by akaihola

[patch] Fixes translation of non-us-ascii message ids

Changed 9 years ago by akaihola

Additions to the template test suite for non-us-ascii UTF-8 message ids

Changed 9 years ago by akaihola

Oops, diff root path was wrong. Corrected here.

comment:1 Changed 9 years ago by akaihola

  • Component changed from Internationalization to Translations
  • Summary changed from i18n broken for msgids with extended characters to [patch] i18n broken for msgids with extended characters

comment:2 Changed 9 years ago by akaihola

  • Component changed from Translations to Internationalization

Oops, I suppose the "Translations" component refers to the actual .po files. Changed back to i18n. I hope hugo still spots this.

Changed 9 years ago by akaihola

fixed the diff paths

comment:3 Changed 9 years ago by Simon G. <dev@…>

  • Keywords i18n added
  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 8 years ago by mtredinnick

  • Triage Stage changed from Accepted to Design decision needed

This is a really expensive change, performance-wise: every string now passes through a call to encode() prior to display. Since the alternative is to do a more or less once-off translation of the source templates to UTF-8, I'm not convinced the performance hit is justified. I'm normally a big advocate for supporting more than just UTF-8 encodings because of the benefits to East Asian locales, but in this case even I'm not sure. Moving back to "design decision needed" for a bit more thinking.

comment:5 Changed 8 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

All the problems mentioned here have been fixed in the Unicode changes ([5609]) and with the addition of proper support for non-ASCII msgids in [5708].
