Opened 17 years ago

Closed 17 years ago

#3063 closed defect (fixed)

[patch] i18n broken for msgids with extended characters

Reported by: Antti Kaihola
Owned by: hugo
Component: Internationalization
Version:
Severity: normal
Keywords: i18n
Cc:
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Message ids with non-us-ascii characters in a UTF-8 encoded .po file are not recognized. Django returns the original string even though a translation is provided.

The reason is that Python's GNUTranslations.gettext() expects the message id as a unicode string, but gettext() in trans_real.py calls it with UTF-8 encoded message ids.
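
As a rough illustration of this lookup mismatch, the sketch below uses Python 3's standard gettext module rather than the Python 2 era code this ticket concerns, so it is an analogy and not the original failure. build_mo is a hypothetical helper that writes a minimal in-memory .mo catalog, and the msgid "Päivä" is invented sample data. The catalog is keyed by text, so a UTF-8 encoded msgid misses and is returned unchanged.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Hypothetical helper: serialize {msgid: msgstr} into minimal GNU .mo bytes."""
    # The entry with an empty msgid carries the catalog metadata (charset).
    header = "Content-Type: text/plain; charset=UTF-8\n"
    entries = {"": header, **messages}
    # The .mo format expects entries sorted by their encoded msgid.
    pairs = sorted((k.encode("utf-8"), v.encode("utf-8")) for k, v in entries.items())
    count = len(pairs)

    ids_start = 28 + 16 * count  # 7-word header plus two (length, offset) tables
    ids_blob = b"".join(k + b"\0" for k, _ in pairs)
    strs_blob = b"".join(v + b"\0" for _, v in pairs)
    strs_start = ids_start + len(ids_blob)

    id_table, str_table = [], []
    id_off, str_off = ids_start, strs_start
    for k, v in pairs:
        id_table += [len(k), id_off]
        str_table += [len(v), str_off]
        id_off += len(k) + 1   # +1 for the NUL terminator
        str_off += len(v) + 1

    # magic, version, count, msgid table offset, msgstr table offset, hash size/offset
    out = struct.pack("<7I", 0x950412DE, 0, count, 28, 28 + 8 * count, 0, 0)
    out += struct.pack("<%dI" % (4 * count), *id_table, *str_table)
    return out + ids_blob + strs_blob


catalog = gettext.GNUTranslations(io.BytesIO(build_mo({"Päivä": "Day"})))

msgid = "Päivä"
hit = catalog.gettext(msgid)                   # text msgid: translation found
miss = catalog.gettext(msgid.encode("utf-8"))  # encoded msgid: silently not found
```

Here hit holds the translation, while miss is the encoded msgid handed back as-is, which matches the symptom reported above: the original string is shown even though a translation exists.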

Even if this is fixed by decoding the message id before calling t.gettext(), there is another problem with missing messages: Python's GNUTranslations.gettext() returns UTF-8 encoded strings, but when a message is not found, it just returns the original unicode message id, and the template engine chokes when trying to join mixed unicode and normal strings.

The Python gettext documentation recommends using ugettext() instead of gettext() so that everything is handled in unicode.
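
To make the mixed-type problem concrete, here is a small Python 3 sketch. Python 3 rejects a join of bytes and text with a TypeError, whereas the Python 2 code in this ticket would have failed differently, so treat this as an analogy. The render function and the sample strings are hypothetical and are not taken from Django.

```python
def render(fragments):
    """Hypothetical stand-in for a template engine joining rendered fragments."""
    return "".join(fragments)


translated = "Day".encode("utf-8")  # stand-in for an encoded translation
untranslated = "Yö"                 # stand-in for a msgid returned unchanged as text

try:
    render([translated, untranslated])
    mixed_ok = True
except TypeError:
    # Joining bytes with text is not allowed.
    mixed_ok = False

# Decoding at the boundary, so every fragment is text, avoids the failure.
uniform = render([translated.decode("utf-8"), untranslated])
```

Keeping every string as text from lookup to rendering is the point of the ugettext() recommendation: there is then only one string type to join.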

I know non-us-ascii message ids are unusual, but I feel strongly that, now that we're beginning to live in a unicode world, they shouldn't be a problem anymore. I'm sure many people face the situation where an existing single-language non-English site is being i18n'ed, and the most practical approach is simply to wrap the existing non-English messages in {% trans %} tags and _() calls. A multilingual site might even use only languages with non-us-ascii character sets.

I'll post a patch which fixes both problems described above.

Attachments (4)

ugettext.diff (1.7 KB) - added by Antti Kaihola 17 years ago.
[patch] Fixes translation of non-us-ascii message ids
tests_i18n_non-us-ascii_msgids.diff (2.1 KB) - added by Antti Kaihola 17 years ago.
Additions to the template test suite for non-us-ascii UTF-8 message ids
tests_i18n_non-us-ascii_msgids.2.diff (2.3 KB) - added by Antti Kaihola 17 years ago.
Oops, diff root path was wrong. Corrected here.
ugettext.2.diff (1.6 KB) - added by Antti Kaihola 17 years ago.
fixed the diff paths


Change History (9)

by Antti Kaihola, 17 years ago

Attachment: ugettext.diff added

[patch] Fixes translation of non-us-ascii message ids

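by Antti Kaihola, 17 years ago

Attachment: tests_i18n_non-us-ascii_msgids.diff added

Additions to the template test suite for non-us-ascii UTF-8 message ids

by Antti Kaihola, 17 years ago

Attachment: tests_i18n_non-us-ascii_msgids.2.diff added

Oops, diff root path was wrong. Corrected here.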

comment:1 by Antti Kaihola, 17 years ago

Component: Internationalization → Translations
Summary: i18n broken for msgids with extended characters → [patch] i18n broken for msgids with extended characters

comment:2 by Antti Kaihola, 17 years ago

Component: Translations → Internationalization

Oops, I suppose the "Translations" component refers to the actual .po files. Changed back to i18n. I hope hugo still spots this.

by Antti Kaihola, 17 years ago

Attachment: ugettext.2.diff added

fixed the diff paths

comment:3 by Simon G. <dev@…>, 17 years ago

Keywords: i18n added
Triage Stage: Unreviewed → Accepted

comment:4 by Malcolm Tredinnick, 17 years ago

Triage Stage: Accepted → Design decision needed

This is a really expensive change, performance-wise: every string now passes through a call to encode() prior to display. Since the alternative is to do a more or less once-off translation of the source templates to UTF-8, I'm not convinced the performance hit is justified. I'm normally a big advocate for supporting more than just UTF-8 encodings because of the benefits to East Asian locales, but in this case even I'm not sure. Moving back to "design decision needed" for a bit more thinking.

comment:5 by Malcolm Tredinnick, 17 years ago

Resolution: fixed
Status: new → closed

All the problems mentioned here have been fixed in the Unicode changes ([5609]) and with the addition of proper support for non-ASCII msgids in [5708].
