#3063 closed defect (fixed)

[patch] i18n broken for msgids with extended characters

Reported by:	Antti Kaihola	Owned by:	hugo
Component:	Internationalization	Version:
Severity:	normal	Keywords:	i18n
Cc:		Triage Stage:	Design decision needed
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

Message ids with non-us-ascii characters in a UTF-8 encoded .po file are not recognized. Django returns the original string even though a translation is provided.

The reason is that Python's GNUTranslations.gettext() expects the message id as a unicode string, but gettext() in trans_real.py calls it with UTF-8 encoded message ids.
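The mismatch can be reproduced in a minimal sketch. It is written for Python 3, where `bytes` plays the role that the plain `str` type played in the Python 2 code this ticket is about, so it illustrates the mechanism rather than the exact code path in trans_real.py. `build_mo` is a hypothetical helper that builds a small UTF-8 catalog in memory, and the Finnish message is made up for illustration.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize a {msgid: msgstr} dict into GNU .mo bytes (hypothetical helper)."""
    # The empty msgid carries the metadata header that declares the charset.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n"}
    entries.update(messages)
    ids = b""
    strs = b""
    spans = []
    for key in sorted(entries):
        raw_id = key.encode("utf-8")
        raw_str = entries[key].encode("utf-8")
        spans.append((len(raw_id), len(ids), len(raw_str), len(strs)))
        ids += raw_id + b"\0"
        strs += raw_str + b"\0"
    count = len(spans)
    id_table_at = 7 * 4                      # header is seven 32-bit fields
    str_table_at = id_table_at + 8 * count   # each table entry is (length, offset)
    ids_at = str_table_at + 8 * count
    strs_at = ids_at + len(ids)
    header = struct.pack(
        "<7I", 0x950412DE, 0, count, id_table_at, str_table_at, 0, 0
    )
    id_table = b"".join(
        struct.pack("<2I", length, ids_at + offset)
        for length, offset, _, _ in spans
    )
    str_table = b"".join(
        struct.pack("<2I", length, strs_at + offset)
        for _, _, length, offset in spans
    )
    return header + id_table + str_table + ids + strs


msgid = "Hyvää päivää"
catalog = gettext.GNUTranslations(io.BytesIO(build_mo({msgid: "Good day"})))

# The catalog is keyed by decoded text, so a text msgid is found...
hit = catalog.gettext(msgid)
# ...while the UTF-8 encoded form of the same msgid is not, and comes back unchanged.
miss = catalog.gettext(msgid.encode("utf-8"))

print(repr(hit))
print(repr(miss))
```

The lookup with the encoded msgid falls through to the "not found" path, which matches the symptom in the description: the original string is returned even though a translation exists.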

Even if this is fixed by decoding the message id before calling t.gettext(), a second problem remains for missing messages. Python's GNUTranslations.gettext() returns UTF-8 encoded strings for messages it finds, but when a message is not found it returns the original unicode message id unchanged. The template engine then chokes when it tries to join the mixed unicode and normal strings.
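The mixed-type failure can be sketched as follows, again in Python 3 terms with `bytes` standing in for Python 2's plain `str`. The two lookup functions and the dict catalog are hypothetical stand-ins, not Django or gettext code: `lookup_mixed` mimics the behaviour described above, and `lookup_text` mimics a ugettext()-style call that stays in unicode for both hits and misses.

```python
def lookup_mixed(catalog, msgid):
    """Return encoded bytes on a hit, the untouched text msgid on a miss."""
    translated = catalog.get(msgid)
    if translated is None:
        return msgid
    return translated.encode("utf-8")


def lookup_text(catalog, msgid):
    """Return text on both a hit and a miss (ugettext-style)."""
    return catalog.get(msgid, msgid)


# Hypothetical catalog: one msgid has a translation, the other does not.
catalog = {"Hyvää päivää": "Good day"}
msgids = ("Hyvää päivää", "Näkemiin")

mixed = [lookup_mixed(catalog, m) for m in msgids]
try:
    "".join(mixed)
    join_failed = False
except TypeError:
    # Python 3 refuses to join bytes with text.
    join_failed = True

uniform = ", ".join(lookup_text(catalog, m) for m in msgids)
print(join_failed, uniform)
```

Python 3 rejects the mixed join with a TypeError. Under Python 2 the same mix was implicitly decoded as ASCII, so the failure surfaced as a UnicodeDecodeError once a byte string contained non-ASCII characters. Keeping every value as text avoids both.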

The Python gettext documentation recommends using ugettext() instead of gettext() so that everything is handled in unicode.

I know it's exceptional to use non-us-ascii message ids, but my strong opinion is that since we're beginning to live in a unicode world, it shouldn't be a problem anymore. And I'm sure many people have a situation where an existing single-language non-English site is being i18n'ed, and the most practical approach is simply to wrap the existing non-English messages in {% trans %} tags and _() calls. A multi-lingual site might even use only languages with non-us-ascii character sets.

I'll post a patch which fixes both problems described above.

Attachments (4)

ugettext.diff (1.7 KB ) - added by Antti Kaihola 18 years ago.: [patch] Fixes translation of non-us-ascii message ids
tests_i18n_non-us-ascii_msgids.diff (2.1 KB ) - added by Antti Kaihola 18 years ago.: Additions to the template test suite for non-us-ascii UTF-8 message ids
tests_i18n_non-us-ascii_msgids.2.diff (2.3 KB ) - added by Antti Kaihola 18 years ago.: Oops, diff root path was wrong. Corrected here.
ugettext.2.diff (1.6 KB ) - added by Antti Kaihola 18 years ago.: fixed the diff paths

Change History (9)

by Antti Kaihola, 18 years ago

Attachment:	ugettext.diff added

[patch] Fixes translation of non-us-ascii message ids

by Antti Kaihola, 18 years ago

Attachment:	tests_i18n_non-us-ascii_msgids.diff added

Additions to the template test suite for non-us-ascii UTF-8 message ids

by Antti Kaihola, 18 years ago

Attachment:	tests_i18n_non-us-ascii_msgids.2.diff added

Oops, diff root path was wrong. Corrected here.

comment:1 by Antti Kaihola, 18 years ago

Component:	Internationalization → Translations
Summary:	i18n broken for msgids with extended characters → [patch] i18n broken for msgids with extended characters

comment:2 by Antti Kaihola, 18 years ago

Component:	Translations → Internationalization

Oops, I suppose the "Translations" component refers to the actual .po files. Changed back to i18n. I hope hugo still spots this.

by Antti Kaihola, 18 years ago

Attachment:	ugettext.2.diff added

fixed the diff paths

comment:3 by Simon G. <dev@…>, 18 years ago

Keywords:	i18n added
Triage Stage:	Unreviewed → Accepted

comment:4 by Malcolm Tredinnick, 18 years ago

Triage Stage:	Accepted → Design decision needed

This is a really expensive change, performance-wise: every string now passes through a call to encode() prior to display. Since the alternative is to do a more or less once-off translation of the source templates to UTF-8, I'm not convinced the performance hit is justified. I'm normally a big advocate for supporting more than just UTF-8 encodings because of the benefits to East Asian locales, but in this case even I'm not sure. Moving back to "design decision needed" for a bit more thinking.

comment:5 by Malcolm Tredinnick, 18 years ago

Resolution:	→ fixed
Status:	new → closed

All the problems mentioned here have been fixed in the Unicode changes ([5609]) and with the addition of proper support for non-ASCII msgids in [5708].
