Opened 16 years ago

Closed 16 years ago

Last modified 13 years ago

#8626 closed Uncategorized (wontfix)

Translations from "en_US" locale being used even though request.LANGUAGE_CODE is "en"

Reported by: francisoreilly Owned by: nobody
Component: Internationalization Version: dev
Severity: Normal Keywords: locale language en-us en-US
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I've got a situation where even though the template has request.LANGUAGE_CODE=="en", the "en_US" translations are being rendered instead of the "en" translations. Furthermore, if request.LANGUAGE_CODE=="en-gb", the "en_GB" translation is being pulled back, correctly. In summary:

  • if LANGUAGE_CODE=="en" -> pulls back "en_US" translations (incorrect)
  • if LANGUAGE_CODE=="en-gb" -> pulls back "en_GB" translations (expected result)
  • if LANGUAGE_CODE=="en-us" -> pulls back "en_US" translations (expected result)

To demonstrate the problem I put together and attached a tar.gz of a simple project directory:

  1. A homepage that has a dropdown control for selecting/setting the user's chosen language; the choices are "en", "en-gb" and "en-us". The form sets request.LANGUAGE_CODE via the set-language view (django.conf.urls.i18n). urls.py is set up to activate the set-language view when the user clicks Submit. The homepage itself lives at /index.html/
  2. Three corresponding locale translations, i.e. "en", "en_GB" and "en_US", in the locale subdir. I've localized the text "Homepage" with a different text string for each of the three locales, in their django.po files.
  3. A settings.py which specifies the three LANGUAGES and a LANGUAGE_CODE of "en". It also pulls in the LocaleMiddleware, as is necessary for locale translations.
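For readers without the attachment, the settings described in the list above might look roughly like the sketch below. This is an assumption-laden reconstruction, not the contents of the attached settings.py: the display names in LANGUAGES and the exact middleware list are illustrative, and MIDDLEWARE_CLASSES is the setting name from that era of Django (later releases use MIDDLEWARE).

```python
# settings.py (sketch only; the real file is in locale_test.tar.gz)

# Default language for the project.
LANGUAGE_CODE = "en"

# The three languages offered in the dropdown. The display names are
# placeholders chosen for illustration.
LANGUAGES = (
    ("en", "English"),
    ("en-gb", "British English"),
    ("en-us", "American English"),
)

# LocaleMiddleware picks the active language per request. It relies on the
# session, so it is listed after SessionMiddleware.
MIDDLEWARE_CLASSES = (
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.locale.LocaleMiddleware",
    "django.middleware.common.CommonMiddleware",
)

USE_I18N = True
```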

I think the other settings/files included are not relevant to the problem (e.g. the sqlite_db database file); they're only included to form a runnable project.

I've been able to show this behaviour in 1.0-beta_2-SVN-8643 - simply go to the homepage at /index.html/ and choose the different language values and submit. The page refreshes to show the current value of request.LANGUAGE_CODE and also which translation has been pulled back.

Attachments (4)

locale_test.tar.gz (6.3 KB ) - added by francisoreilly 16 years ago.
tar.gz of the project demonstrating the problem
locale_test.tar.2.gz (5.8 KB ) - added by oggy 16 years ago.
trans_real_r8739.diff (560 bytes ) - added by oggy 16 years ago.
8626.diff (802 bytes ) - added by arien 16 years ago.
Patch the locale.locale_alias dict for 'en'.


Change History (16)

by francisoreilly, 16 years ago

Attachment: locale_test.tar.gz added

tar.gz of the project demonstrating the problem

comment:1 by francisoreilly, 16 years ago

milestone: 1.0

comment:2 by francisoreilly, 16 years ago

Keywords: locale language en-us en-US added

comment:3 by Malcolm Tredinnick, 16 years ago

I can confirm this is happening (using the project provided). After a lot of poking around, the problem seems to be happening as a result of something inside django.utils.translation.trans_real.translation(). The Django development server makes a call to activate("en_US") as part of setting itself up initially, and then the correct locale is set for each request. Somehow, as part of that initial setup, the en_US translation is being cached for en.

I'll pick this up again in the morning if nobody solves it in the interim.

comment:4 by Jacob, 16 years ago

Triage Stage: Unreviewed → Accepted

comment:5 by oggy, 16 years ago

This is very weird. From what I can see, the fault isn't on Django's part. I have the exact same setup working perfectly (after fixing #7163) for providing Serbian language in two scripts (sr and sr_LATN).

Unless I'm missing something, Python gettext (or gettext in general, don't know anything about the implementation) will choose the en_US translation over the en translation even if they both exist. But it only happens for the en locale. I'm attaching a "diagnostic" diff for trans_real and a tgz of your app with two Serbian locales added so you can see for yourself.

Phew, for once not having English as my primary language seems to make things easier for me ;)

by oggy, 16 years ago

Attachment: locale_test.tar.2.gz added

by oggy, 16 years ago

Attachment: trans_real_r8739.diff added

comment:6 by arien, 16 years ago

Okay, that was fun... This is a problem in Python's locale module, where (basically) en is mapped to en_US in the locale.locale_alias dict. This in turn causes the wrong .mo file to be loaded.

In detail: gettext.translation (which is used in django.utils.translation.trans_real.translation) calls gettext.find to locate the correct .mo file to load. gettext.find in turn calls the (misnamed) gettext._expand_lang, which returns a list of possible locale names for the given locale. Finally, gettext._expand_lang calls locale.normalize for the normalized name of a locale, and locale.normalize uses the locale.locale_alias dict. And it just so happens that en is mapped to en_US.ISO8859-1...

>>> import gettext
>>> gettext.find('django', 'locale', ['en'], all=1)
['locale/en_US/LC_MESSAGES/django.mo', 'locale/en/LC_MESSAGES/django.mo']
>>> gettext._expand_lang('en')
['en_US.ISO8859-1', 'en_US', 'en.ISO8859-1', 'en']
>>>
>>> # Monkey patch our way out of trouble.
>>> import locale
>>> locale.locale_alias['en'] = 'en.ISO8859-1'
>>> gettext.find('django', 'locale', ['en'], all=1)
['locale/en/LC_MESSAGES/django.mo']
>>> gettext._expand_lang('en')
['en.ISO8859-1', 'en']

(The output is from a session in the project directory.)

I'll attach a patch.
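The lookup order in the session above can be modelled without touching the real locale table. The sketch below is a simplified stand-in, not the actual gettext._expand_lang code: the function name candidate_locales and the two small alias dicts are made up for illustration, and modifiers such as @euro are ignored.

```python
def candidate_locales(lang, alias):
    """Simplified model of how a language code expands into lookup candidates.

    `alias` plays the role of locale.locale_alias; the most specific
    candidate comes first, which is why en_US wins over en.
    """
    normalized = alias.get(lang, lang)
    base, _, codeset = normalized.partition(".")
    language, _, territory = base.partition("_")

    ordered = (
        f"{language}_{territory}.{codeset}" if territory and codeset else None,
        f"{language}_{territory}" if territory else None,
        f"{language}.{codeset}" if codeset else None,
        language,
    )
    candidates = []
    for name in ordered:
        if name and name not in candidates:
            candidates.append(name)
    return candidates


# Hypothetical one-entry alias tables mirroring the session above.
stock_alias = {"en": "en_US.ISO8859-1"}
patched_alias = {"en": "en.ISO8859-1"}

print(candidate_locales("en", stock_alias))
# ['en_US.ISO8859-1', 'en_US', 'en.ISO8859-1', 'en']
print(candidate_locales("en", patched_alias))
# ['en.ISO8859-1', 'en']
```

With the stock alias the en_US directory is tried before en, so an en_US catalogue shadows the en one whenever both exist; with the patched alias only the en candidates remain.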

by arien, 16 years ago

Attachment: 8626.diff added

Patch the locale.locale_alias dict for 'en'.

comment:7 by arien, 16 years ago

Has patch: set
Patch needs improvement: set

I don't see a nicer way to fix this, but if anybody has a bright idea...

comment:8 by Ramiro Morales, 16 years ago

Wow guys, that was smart, and digging at a very low level.

Following a suggestion from Malcolm, I had tried deepcopying the Python translation objects in translation._fetch, even unconditionally (that is, without checking whether the new one shared a lang spec prefix with a previously cached one), and littering trans_real.py with debugging statements, all without success and without a clue as to the real cause of the problem.

comment:9 by Malcolm Tredinnick, 16 years ago

Resolution: wontfix
Status: new → closed

This is excellent work and it's nice to understand what's going on. However, at the end of the day, I suspect this is probably not a bug. Firstly, the locale.locale_alias dictionary maps every single "language only" locale specifier to a language + country version. It's unfortunate that English as spoken by the English isn't the canonical mapping and the US version is used instead, but a choice had to be made. The fact is that the designator "en" is ambiguous. It needs to be mapped to "English as spoken in XYZ" for some value of XYZ.

With, say, Norwegian, the intuitively right thing is what actually happens because Norwegian as spoken by the people of Norway is relatively obvious (the English case was obvious, too, dagnabbit! But they chose poorly! Yes, I know there are historical reasons why this is done in POSIX systems; doesn't make it right :-( )

The same problem as noted here will occur if somebody creates differing "es" and "es_ES" translations. They will always get back the "es_ES" version. Again, because the initial designator has to have the ambiguity resolved somehow.
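To see which country a bare language code resolves to on a given Python installation, something like the sketch below can be used. The helper name default_full_locales is made up for illustration; locale.normalize is the standard-library call mentioned earlier in this ticket, and its results depend on the alias table shipped with the Python version in use.

```python
import locale


def default_full_locales(language_codes):
    """Map each bare language code to what locale.normalize() returns for it."""
    return {code: locale.normalize(code) for code in language_codes}


# Print the resolved locale for a few language-only codes.
for code, full in default_full_locales(["en", "es", "no"]).items():
    print(code, "->", full)
```

Any code whose resolved value contains a country part will prefer that country's catalogue over the language-only one when both exist.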

Conclusion: this is a wontfix situation, because it's arguably not a bug and the behaviour is consistent. If we patch English, why not start patching all the other locales (en_uk -> en_GB, but english_uk -> en_EN ... huh?!)? A documentation patch (in a separate ticket so that it can be handled in isolation) might be appropriate to explain this. I realise the whole i18n situation can get pretty convoluted when you're trying to do the right thing.

comment:10 by arien, 16 years ago

To make this work, you can use en-en as the LANGUAGE_CODE instead of en.

comment:11 by d_leblond@…, 13 years ago

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

I ran into this problem today.

With the following imported:

from django.utils.translation import activate

you can do:

activate(request.LANGUAGE_CODE)

This will properly set the language when request.LANGUAGE_CODE is 'en' (instead of the improper en-us default).

comment:12 by Jacob, 13 years ago

milestone: 1.0

Milestone 1.0 deleted
