Opened 6 years ago

Closed 6 years ago

Last modified 3 years ago

#8626 closed Uncategorized (wontfix)

Translations from "en_US" locale being used even though request.LANGUAGE_CODE is "en"

Reported by: francisoreilly Owned by: nobody
Component: Internationalization Version: master
Severity: Normal Keywords: locale language en-us en-US
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I've got a situation where even though the template has request.LANGUAGE_CODE=="en", the "en_US" translations are being rendered instead of the "en" translations. Furthermore, if request.LANGUAGE_CODE=="en-gb", the "en_GB" translation is being pulled back, correctly. In summary:

  • if LANGUAGE_CODE=="en" -> pulls back "en_US" translations (incorrect)
  • if LANGUAGE_CODE=="en-gb" -> pulls back "en_GB" translations (expected result)
  • if LANGUAGE_CODE=="en-us" -> pulls back "en_US" translations (expected result)

To demonstrate the problem I put together and attached a tar.gz of a simple project directory:

  1. A homepage with a dropdown control for selecting the user's language; the choices are "en", "en-gb" and "en-us". The form sets request.LANGUAGE_CODE via the set-language view (django.conf.urls.i18n). urls.py is set up to activate the set-language view when the user clicks Submit. The homepage itself lives at /index.html/
  2. Three corresponding locale translations, i.e. "en", "en_GB" and "en_US", in the locale subdir. I've localized the text "Homepage" with a different string for each of the three locales (in django.po).
  3. A settings.py which specifies the three LANGUAGES and a LANGUAGE_CODE of "en". It also pulls in the LocaleMiddleware, as is necessary for locale translations.

I think the other settings/files included are not relevant to the problem (e.g. the sqlite_db database file, etc), they're only included to form a runnable project.

I've been able to show this behaviour in 1.0-beta_2-SVN-8643 - simply go to the homepage at /index.html/ and choose the different language values and submit. The page refreshes to show the current value of request.LANGUAGE_CODE and also which translation has been pulled back.

Attachments (4)

locale_test.tar.gz (6.3 KB) - added by francisoreilly 6 years ago.
tar.gz of the project demonstrating the problem
locale_test.tar.2.gz (5.8 KB) - added by oggy 6 years ago.
trans_real_r8739.diff (560 bytes) - added by oggy 6 years ago.
8626.diff (802 bytes) - added by arien 6 years ago.
Patch the locale.locale_alias dict for 'en'.

Change History (16)

Changed 6 years ago by francisoreilly

tar.gz of the project demonstrating the problem

comment:1 Changed 6 years ago by francisoreilly

  • milestone set to 1.0
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 6 years ago by francisoreilly

  • Keywords locale language en-us en-US added

comment:3 Changed 6 years ago by mtredinnick

I can confirm this is happening (using the project provided). After a lot of poking around, the problem seems to be the result of something inside django.utils.translation.trans_real.translation(). The Django development server makes a call to activate("en_US") as part of setting itself up initially, and then the correct locale is set for each request. Somehow, as part of that initial setup, the en_US translation is being cached for en.

I'll pick this up again in the morning if nobody solves it in the interim.

comment:4 Changed 6 years ago by jacob

  • Triage Stage changed from Unreviewed to Accepted

comment:5 Changed 6 years ago by oggy

This is very weird. From what I can see, the fault isn't on Django's part. I have the exact same setup working perfectly (after fixing #7163) for providing Serbian language in two scripts (sr and sr_LATN).

Unless I'm missing something, Python gettext (or gettext in general, don't know anything about the implementation) will choose the en_US translation over the en translation even if they both exist. But it only happens for the en locale. I'm attaching a "diagnostic" diff for trans_real and a tgz of your app with two Serbian locales added so you can see for yourself.

Phew, for once not having English as my primary language seems to make things easier for me ;)

Changed 6 years ago by oggy

Changed 6 years ago by oggy

comment:6 Changed 6 years ago by arien

Okay, that was fun... This is a problem in Python's locale module, where (basically) en is mapped to en_US in the locale.locale_alias dict. This in turn causes the wrong .mo file to be loaded.

In detail: gettext.translation (which is used in django.utils.translation.trans_real.translation) calls gettext.find to locate the correct .mo file to load. gettext.find in turn calls the (misnamed) gettext._expand_lang, which returns a list of possible locale names for the given locale. Finally, gettext._expand_lang calls locale.normalize for the normalized name of a locale, and locale.normalize uses the locale.locale_alias dict. And it just so happens that en is mapped to en_US.ISO8859-1...

>>> import gettext
>>> gettext.find('django', 'locale', ['en'], all=1)
['locale/en_US/LC_MESSAGES/django.mo', 'locale/en/LC_MESSAGES/django.mo']
>>> gettext._expand_lang('en')
['en_US.ISO8859-1', 'en_US', 'en.ISO8859-1', 'en']
>>>
>>> # Monkey patch our way out of trouble.
>>> import locale
>>> locale.locale_alias['en'] = 'en.ISO8859-1'
>>> gettext.find('django', 'locale', ['en'], all=1)
['locale/en/LC_MESSAGES/django.mo']
>>> gettext._expand_lang('en')
['en.ISO8859-1', 'en']

(The output is from a session in the project directory.)
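
For anyone without the attached project, the lookup order can be reproduced with a throwaway locale directory. This is only a minimal sketch: the helper name find_catalogs and the placeholder django.mo files are made up for illustration, and it relies on gettext.find only checking whether a file exists at each candidate path rather than parsing it.

```python
import gettext
import os
import tempfile


def find_catalogs(language, available=("en", "en_US", "en_GB")):
    """Return the locale directory names gettext.find matches, in lookup order."""
    with tempfile.TemporaryDirectory() as localedir:
        for name in available:
            msgdir = os.path.join(localedir, name, "LC_MESSAGES")
            os.makedirs(msgdir)
            # Empty placeholder: gettext.find tests for existence only.
            open(os.path.join(msgdir, "django.mo"), "wb").close()
        paths = gettext.find("django", localedir, [language], all=True)
        # Keep just the first path component below localedir (the locale name).
        return [os.path.relpath(p, localedir).split(os.sep)[0] for p in paths]


en_order = find_catalogs("en")
print(en_order)
```

If the alias table behaves as described above, en_US should be listed ahead of en, which is why the en_US catalog wins when both exist.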

I'll attach a patch.

Changed 6 years ago by arien

Patch the locale.locale_alias dict for 'en'.

comment:7 Changed 6 years ago by arien

  • Has patch set
  • Patch needs improvement set

I don't see a nicer way to fix this, but if anybody has a bright idea...

comment:8 Changed 6 years ago by ramiro

Wow guys, that was smart and digging at very low level.

Following a suggestion from Malcolm, I had been trying to deepcopy the Python translation objects in translation._fetch, even unconditionally (that is, without checking whether the new one shared a language prefix with a previously cached one), and littering the trans_real.py file with debugging statements, all without success and without a clue as to the real cause of the problem.

comment:9 Changed 6 years ago by mtredinnick

  • Resolution set to wontfix
  • Status changed from new to closed

This is excellent work and it's nice to understand what's going on. However, at the end of the day, I suspect this is probably not a bug. Firstly, the locale.locale_alias dictionary maps every single "language only" locale specifier to a language + country version. It's unfortunate that English as spoken by the English isn't the canonical mapping and the US version is used instead, but a choice had to be made. The fact is that the designator "en" is ambiguous. It needs to be mapped to "English as spoken in XYZ" for some value of XYZ.

With, say, Norwegian, the intuitively right thing is what actually happens because Norwegian as spoken by the people of Norway is relatively obvious (the English case was obvious, too, dagnabbit! But they chose poorly! Yes, I know there are historical reasons why this is done in POSIX systems; doesn't make it right :-( )

The same problem as noted here will occur if somebody creates differing "es" and "es_ES" translations. They will always get back the "es_ES" version. Again, because the initial designator has to have the ambiguity resolved somehow.
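
A quick way to see which territory a bare language code resolves to is to ask locale.normalize directly. The sketch below is illustrative only (the helper name default_territory is made up here), and the exact aliases depend on the table shipped with the Python version in use.

```python
import locale


def default_territory(language):
    """Return the language_TERRITORY part that locale.normalize picks for a code."""
    normalized = locale.normalize(language)
    # Drop the codeset suffix, e.g. ".ISO8859-1", if one is present.
    return normalized.split(".")[0]


for code in ("en", "es", "no"):
    print(code, "->", default_territory(code))

resolved = {code: default_territory(code) for code in ("en", "es")}
```

Whatever territory is printed for a bare code is the catalog that gettext will try first for it.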

Conclusion: this is a wontfix situation, because it's arguably not a bug and the behaviour is consistent. If we patch English, why not start patching all the other locales (en_uk -> en_GB, but english_uk -> en_EN ... huh?!)? A documentation patch (in a separate ticket so that it can be handled in isolation) might be appropriate to explain this. I realise the whole i18n situation can get pretty convoluted when you're trying to do the right thing.

comment:10 Changed 6 years ago by arien

To make this work, you can use en-en as the LANGUAGE_CODE instead of en.

comment:11 Changed 3 years ago by d_leblond@…

  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

I ran into this problem today.

With the following import:

from django.utils.translation import activate

you can call:

activate(request.LANGUAGE_CODE)

This properly sets the language when request.LANGUAGE_CODE is 'en' (instead of the improper en-us default).

comment:12 Changed 3 years ago by jacob

  • milestone 1.0 deleted

Milestone 1.0 deleted
