Opened 12 years ago
Closed 11 years ago
#18419 closed Bug (fixed)
Language code is not correct for Chinese
Reported by: | Owned by: | Bouke Haarsma | |
---|---|---|---|
Component: | Internationalization | Version: | dev |
Severity: | Normal | Keywords: | i18n, chinese, zh |
Cc: | kitsunde@…, Bouke Haarsma, Baptiste Mispelon | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Currently Django uses zh_TW for Traditional Chinese and zh_CN for Simplified Chinese. This works, but it is not quite correct.
This is because Traditional Chinese is used not only in Taiwan (tw) but also in Hong Kong (hk), and Simplified Chinese is used not only in China (cn) but also in Singapore (sg) and Malaysia. A newer standard uses "zh_Hant" for Traditional Chinese and "zh_Hans" for Simplified Chinese, which removes the implication that the content is only for people living in China or Taiwan.
Change History (18)
comment:1 by , 12 years ago
Easy pickings: | set |
---|
follow-up: 3 comment:2 by , 12 years ago
comment:3 by , 12 years ago
Replying to ramiro:
Can you point us to an URL to [a document describing] such new standard?
Apple has updated their developer guide to reflect the change. You may want to visit the guide at https://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPInternational/Articles/LanguageDesignations.html and search "hant" or "hans".
comment:4 by , 12 years ago
After digging into Django's source code, I found that adopting the new standard may cause some issues. The first is the conversion between locale and language code. A locale has the format ll_CC, so, for example, the "zh_TW" locale can be converted to the "zh-tw" language code. Under the new standard, however, the language code for Traditional Chinese is "zh-Hant", which cannot be converted to a locale in ll_CC format because "Hant" is not a country.
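To illustrate the conversion problem, here is a minimal sketch of a converter that accepts both country and script subtags. The function name `to_locale_name` and the rule "two letters means country, anything longer means script" are assumptions made for this example; this is not Django's implementation.

```python
def to_locale_name(language_code):
    """Convert a language code such as 'zh-tw' or 'zh-hant' to a locale name.

    Hypothetical helper for illustration only.
    """
    lang, sep, rest = language_code.partition("-")
    if not sep:
        # No subtag at all, e.g. 'en'.
        return lang.lower()
    if len(rest) == 2:
        # Two-letter subtag: assume it is a country code (ll_CC).
        return lang.lower() + "_" + rest.upper()
    # Longer subtag: assume it is a script such as Hant or Hans (ll_Ssss).
    return lang.lower() + "_" + rest.capitalize()


print(to_locale_name("zh-tw"))
print(to_locale_name("zh-hant"))
```

A converter that only upper-cases the second part would turn "zh-hant" into "zh_HANT", which is why the script case needs separate handling.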
The second issue is that country-specific codes should fall back to the new standard codes when they are not available. For example, most browsers currently send only "zh-tw", "zh-cn", "zh-sg", or "zh-hk" in the "HTTP_ACCEPT_LANGUAGE" field. In that situation, Django should convert "zh-tw" and "zh-hk" to "zh-Hant", and "zh-cn" and "zh-sg" to "zh-Hans". This should be done only if the requested code ("zh-tw", "zh-hk", "zh-cn", "zh-sg") is not listed in the django.conf.LANGUAGES setting.
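The fallback rule described above could look roughly like the following sketch. The names `SCRIPT_FALLBACKS` and `resolve_language` and the `default` parameter are hypothetical; the mapping contains only the four codes mentioned in this comment.

```python
# Assumed mapping from country-specific codes to script-based codes,
# taken from the examples in this comment.
SCRIPT_FALLBACKS = {
    "zh-tw": "zh-hant",
    "zh-hk": "zh-hant",
    "zh-cn": "zh-hans",
    "zh-sg": "zh-hans",
}


def resolve_language(requested, supported, default="en"):
    """Pick a supported language code for a requested one (illustrative)."""
    code = requested.lower()
    supported_codes = {s.lower() for s in supported}
    if code in supported_codes:
        # The site explicitly supports the requested code: use it unchanged.
        return code
    fallback = SCRIPT_FALLBACKS.get(code)
    if fallback in supported_codes:
        # Otherwise fall back to the script-based code, if the site has it.
        return fallback
    return default


print(resolve_language("zh-hk", ["en", "zh-hant"]))
print(resolve_language("zh-tw", ["zh-tw", "zh-hant"]))
```

Checking the requested code first means a site that still ships a "zh-tw" translation keeps serving it, and only sites without one get "zh-hant".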
The last part is that Django should merge these translations. For example, if a browser requests the "zh-tw" language, Django should merge both the "zh-tw" and "zh-Hant" translation files.
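One way to picture the merge is with plain dictionaries standing in for translation catalogs. This is only a sketch: `merge_catalogs` is a hypothetical helper, the message values are placeholders rather than real translations, and letting the country-specific entries take priority is an assumption about the intended behavior.

```python
def merge_catalogs(generic, specific):
    """Return a catalog where specific entries override generic ones."""
    merged = dict(generic)   # start from the script-based catalog (zh-Hant)
    merged.update(specific)  # country-specific entries (zh-tw) win on conflict
    return merged


# Placeholder catalogs; the values are not real translations.
zh_hant = {"greeting": "<zh-Hant greeting>", "farewell": "<zh-Hant farewell>"}
zh_tw = {"greeting": "<zh-tw greeting>"}

catalog = merge_catalogs(zh_hant, zh_tw)
print(catalog["greeting"])
print(catalog["farewell"])
```

With this ordering, a string missing from the "zh-tw" file is still served from "zh-Hant" instead of falling through to the default language.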
Chinese is my mother tongue. Please feel free to ask me questions if anything is still unclear.
comment:5 by , 12 years ago
There is another good reason why Django should adopt "zh-Hant" for Traditional Chinese and "zh-Hans" for Simplified Chinese. Imagine a site visitor from Hong Kong. The "HTTP_ACCEPT_LANGUAGE" field sent by their browser will probably include only zh-hk and not zh-tw. In that situation, even though both zh-hk and zh-tw denote Traditional Chinese, the Hong Kong user won't see the Traditional Chinese content, because Django sees that zh-hk doesn't match zh-tw and falls back to the default language.
Of course, some site administrators will still want to target users in Hong Kong or Taiwan specifically. In that case, they can still provide both zh-tw and zh-hk translations. But most of the time, a Chinese site admin doesn't care where the user comes from at all. Just as with en-us and en-gb, the site admin may want to provide only one English version. So if Django supports zh-hant, users see Traditional Chinese as they wish whether they come from Hong Kong or Taiwan. The same applies to zh-hans: users see Simplified Chinese as they wish whether they come from China or Singapore.
comment:6 by , 12 years ago
zh-Hant and zh-Hans are registered with IANA: http://www.iana.org/assignments/lang-tags/zh-Hant and http://www.iana.org/assignments/lang-tags/zh-Hans.
comment:7 by , 12 years ago
Easy pickings: | unset |
---|---|
Triage Stage: | Unreviewed → Accepted |
I'm going to accept this ticket, but we need a good plan to address backwards compatibility before this can move forward.
You could start a discussion on the django-i18n mailing list to get more input.
comment:8 by , 12 years ago
Cc: | added |
---|
comment:9 by , 12 years ago
We can do it the same way we did with the two Norwegian translations: have both translations (zh-Hant/-Hans and zh-TW/-CN) in core and deprecate the latter after two releases.
comment:10 by , 11 years ago
Cc: | added |
---|---|
Has patch: | set |
Owner: | changed from | to
Status: | new → assigned |
I've created a pull request for this ticket: https://github.com/django/django/pull/1868, based on the strategy proposed by jezdez. So for Django 1.7 and 1.8, duplicate Chinese translations are present, with the old ones deprecated. However, while researching I also found that some browsers send the old (deprecated) language codes, including the latest Firefox. So I've included a check so that visitors with those browsers also get the correct display language, despite incorrect Accept-Language headers.
comment:11 by , 11 years ago
The pull request looks good, but although you sent it a few hours ago it doesn't apply properly. Did you work from an up-to-date copy of master?
If you bring it up to date, you can mark the ticket as RFC.
comment:12 by , 11 years ago
Fixed in c0a2388a1c4ead1afaec98e4ebc953a772ca3849. (Commit referenced #18149 by mistake.)
comment:14 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:15 by , 11 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
After discussing on IRC, we found that (1) some unit tests failed after the patch and (2) the upgrade path is somewhat bumpy. The problem occurs when LANGUAGES isn't overridden and therefore contains both old and new language codes. If a browser requests the old language code (as the most recent versions of IE and Firefox might), Django upgrades it to the new language code. If some customization has been done for the old language codes, Django then uses the new language code and ignores those translations. To mitigate this, get_supported_language_variant also needs to check whether the old language code is still in the list of supported languages.
comment:16 by , 11 years ago
Cc: | added |
---|---|
Needs tests: | set |
Version: | 1.4 → master |
The PR does fix the broken tests.
I think we should also add a test for the situation that broke the two tests.
From what I understand, that means when zh-cn is in settings.LANGUAGES.
comment:18 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |