Opened 12 years ago

Closed 10 years ago

#18419 closed Bug (fixed)

Language code is not correct for Chinese

Reported by: Olli Wang <olliwang@…>
Owned by: Bouke Haarsma
Component: Internationalization
Version: dev
Severity: Normal
Keywords: i18n, chinese, zh
Cc: kitsunde@…, Bouke Haarsma, Baptiste Mispelon
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Django currently uses zh_TW for Traditional Chinese and zh_CN for Simplified Chinese. This works, but it is not the correct way to identify the two scripts.

This is because Traditional Chinese is used not only in Taiwan (tw) but also in Hong Kong (hk), and Simplified Chinese is used not only in China (cn) but also in Singapore (sg) and Malaysia. A newer standard uses "zh_Hant" for Traditional Chinese and "zh_Hans" for Simplified Chinese, which avoids implying that the content is only for people living in China or Taiwan.

Change History (18)

comment:1 by Olli Wang <olliwang@…>, 12 years ago

Easy pickings: set

comment:2 by Ramiro Morales, 12 years ago

Can you point us to a URL for [a document describing] this new standard?

in reply to:  2 comment:3 by Olli Wang <olliwang@…>, 12 years ago

Replying to ramiro:

Can you point us to a URL for [a document describing] this new standard?

Apple has updated its developer guide to reflect the change. You may want to visit the guide at https://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPInternational/Articles/LanguageDesignations.html and search for "hant" or "hans".

comment:4 by Olli Wang <olliwang@…>, 12 years ago

After digging into Django's source code, I found that adopting the new standard may cause some issues. The first is the conversion between locale names and language codes. A locale name is in the format ll_CC, so, for example, the "zh_TW" locale can be converted to the "zh-tw" language code. Under the new standard, however, the language code for Traditional Chinese is "zh-Hant", which cannot be converted to a locale in ll_CC format because "Hant" is not a country.
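The conversion problem can be illustrated with a minimal Python sketch. Both helper functions are hypothetical and are not Django's own code: the first assumes that everything after the hyphen is a country code, and the second adds an assumed rule that a four-letter alphabetic subtag is a script and should be title-cased.

```python
def to_locale_ll_cc(language):
    """Convert a language code such as 'zh-tw' to an ll_CC locale name.

    Hypothetical helper: assumes the part after the hyphen is a country code.
    """
    lang, _, country = language.partition("-")
    if not country:
        return lang.lower()
    return lang.lower() + "_" + country.upper()


def to_locale_script_aware(language):
    """Like to_locale_ll_cc, but title-cases four-letter script subtags.

    Hypothetical helper: the four-letter rule is an assumption of this sketch.
    """
    lang, _, rest = language.partition("-")
    if not rest:
        return lang.lower()
    if len(rest) == 4 and rest.isalpha():
        # Script subtags such as "Hant" and "Hans" are written in title case.
        return lang.lower() + "_" + rest.title()
    return lang.lower() + "_" + rest.upper()


print(to_locale_ll_cc("zh-tw"))           # zh_TW
print(to_locale_ll_cc("zh-hant"))         # zh_HANT (script treated as a country)
print(to_locale_script_aware("zh-hant"))  # zh_Hant
```

The naive version treats "hant" as if it were a country, which is the mismatch described above.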

The second issue is that country-specific codes should fall back to the new standard when they are not available. For example, most browsers currently send only "zh-tw", "zh-cn", "zh-sg", or "zh-hk" in the "HTTP_ACCEPT_LANGUAGE" field. In that situation, Django should convert "zh-tw" and "zh-hk" to "zh-Hant", and "zh-cn" and "zh-sg" to "zh-Hans". This should be done only if the requested code ("zh-tw", "zh-hk", "zh-cn", "zh-sg") is not set in the django.conf.LANGUAGES setting.
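A minimal sketch of this fallback rule, in plain Python rather than Django's actual implementation. The names SCRIPT_FALLBACKS and resolve_language are hypothetical, and the supported codes are assumed to be given as a set of lowercase strings.

```python
# Hypothetical mapping from country-specific codes to script-based codes,
# taken from the four codes named in the paragraph above.
SCRIPT_FALLBACKS = {
    "zh-tw": "zh-hant",
    "zh-hk": "zh-hant",
    "zh-cn": "zh-hans",
    "zh-sg": "zh-hans",
}


def resolve_language(requested, supported):
    """Return the supported code to use for a requested code, or None.

    A country-specific code wins when the site lists it explicitly;
    the script-based code is used only as a fallback.
    """
    code = requested.lower()
    if code in supported:
        return code
    fallback = SCRIPT_FALLBACKS.get(code)
    if fallback in supported:
        return fallback
    return None


print(resolve_language("zh-HK", {"en", "zh-hant"}))           # zh-hant
print(resolve_language("zh-TW", {"en", "zh-tw", "zh-hant"}))  # zh-tw
print(resolve_language("zh-CN", {"en", "zh-hant"}))           # None
```

Checking the requested code first is what keeps a site's explicit "zh-tw" entry from being replaced by "zh-hant".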

The last issue is that Django should merge these translations. For example, if a browser requests the "zh-tw" language, Django should merge both the "zh-tw" and "zh-Hant" translation files.
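One way to picture the merge is with plain dictionaries standing in for translation catalogs. This is only a sketch under that assumption: merge_catalogs is a hypothetical helper, the catalog contents are illustrative, and real catalogs are gettext files rather than dicts.

```python
def merge_catalogs(generic, specific):
    """Merge two catalogs so that region-specific entries override generic ones."""
    merged = dict(generic)   # start from the script-wide catalog
    merged.update(specific)  # region-specific strings take precedence
    return merged


# Illustrative catalogs only; the contents are assumptions of this sketch.
zh_hant = {"Save": "儲存", "Software": "軟件"}
zh_tw = {"Software": "軟體"}

catalog = merge_catalogs(zh_hant, zh_tw)
print(catalog["Save"])      # 儲存 (from the zh-Hant catalog)
print(catalog["Software"])  # 軟體 (overridden by the zh-tw catalog)
```

With this ordering, a "zh-tw" catalog only needs to contain the strings that differ from the shared "zh-Hant" catalog.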

Chinese is my mother tongue. Please feel free to ask me questions if anything is still unclear.

comment:5 by Olli Wang <olliwang@…>, 12 years ago

There is another good reason why Django should adopt "zh-Hant" for Traditional Chinese and "zh-Hans" for Simplified Chinese. Imagine a site visitor from Hong Kong. The "HTTP_ACCEPT_LANGUAGE" field sent by their browser will probably include only zh-hk and not zh-tw. In that situation, even though both zh-hk and zh-tw denote Traditional Chinese, the Hong Kong user won't see the Traditional Chinese content, because Django sees that zh-hk doesn't match zh-tw and falls back to the default language.

Of course, some site administrators will still want to target Hong Kong or Taiwan users specifically. In that situation, they can still provide both zh-tw and zh-hk translations. But most of the time, a Chinese site admin doesn't care where the user comes from at all. Just as with en-us and en-gb, the site admin may want to provide only one English version. So if Django supports zh-hant, users see Traditional Chinese whether they come from Hong Kong or Taiwan. The same applies to "zh-hans": users from China and Singapore both see Simplified Chinese.
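The Hong Kong scenario can be reproduced with a simplified sketch of header matching. It is not Django's actual matching logic: parse_accept_language and pick_exact are hypothetical helpers, only exact matches are considered, and the header values are made up for illustration.

```python
def parse_accept_language(header):
    """Return the language codes in an Accept-Language header, best first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        code, _, params = part.strip().partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if code:
            # Sort by descending quality, then by position in the header.
            entries.append((-quality, position, code.strip().lower()))
    return [code for _, _, code in sorted(entries)]


def pick_exact(header, supported, default):
    """Pick the first requested code that is supported exactly, else default."""
    for code in parse_accept_language(header):
        if code in supported:
            return code
    return default


header = "zh-HK,zh;q=0.8"
print(pick_exact(header, {"en", "zh-tw"}, "en"))  # en
print(pick_exact(header, {"en", "zh-hk"}, "en"))  # zh-hk
```

In the first call the visitor gets the default language even though the site has a Traditional Chinese translation, because "zh-hk" and "zh-tw" never compare equal.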

comment:7 by Aymeric Augustin, 12 years ago

Easy pickings: unset
Triage Stage: Unreviewed → Accepted

I'm going to accept this ticket, but we need a good plan to address backwards compatibility before this can move forward.

You could start a discussion on the django-i18n mailing list to get more input.

comment:8 by Kit Sunde, 12 years ago

Cc: kitsunde@… added

comment:9 by Jannis Leidel, 11 years ago

We can do it the same way we did with the two Norwegian translations: keep both sets of translations (zh-Hant/-Hans and zh-TW/-CN) in core and deprecate the latter after two releases.

comment:10 by Bouke Haarsma, 10 years ago

Cc: Bouke Haarsma added
Has patch: set
Owner: changed from nobody to Bouke Haarsma
Status: new → assigned

I've created a pull request for this ticket: https://github.com/django/django/pull/1868, based on the strategy proposed by jezdez. So for Django 1.7 and 1.8, duplicate Chinese translations are present, with the old ones deprecated. However, while researching this I also found that some browsers, including the latest Firefox, send the old (deprecated) language codes. So I've included a check so that visitors with those browsers also get the correct display language, despite the outdated Accept-Language headers.

Note that the transifex languages should also be updated/renamed, although I'm not sure how that works. @jezdez?

Last edited 10 years ago by Bouke Haarsma

comment:11 by Aymeric Augustin, 10 years ago

The pull request looks good, but although you sent it a few hours ago it doesn't apply properly. Did you work from an up-to-date copy of master?

If you bring it up to date, you can mark the ticket as RFC.

comment:12 by Aymeric Augustin, 10 years ago

Fixed in c0a2388a1c4ead1afaec98e4ebc953a772ca3849. (Commit referenced #18149 by mistake.)

comment:13 by Bouke Haarsma, 10 years ago

Woops, my bad!

comment:14 by Baptiste Mispelon, 10 years ago

Resolution: fixed
Status: assigned → closed

comment:15 by Bouke Haarsma, 10 years ago

Resolution: fixed
Status: closed → new

After discussing this on IRC, we found that (1) some unit tests failed after the patch and (2) the upgrade path is somewhat bumpy. The problem occurs when LANGUAGES isn't overridden and therefore contains both the old and the new language codes. If a browser requests the old language code (as the most recent versions of IE and Firefox might), Django upgrades it to the new language code. If some customization has been done for the old language code, Django would then use the new language code and ignore those translations. To mitigate this, get_supported_language_variant also needs to check whether the old language code is still in the list of supported languages.

PR: https://github.com/django/django/pull/1872

comment:16 by Baptiste Mispelon, 10 years ago

Cc: Baptiste Mispelon added
Needs tests: set
Version: 1.4 → master

The PR does fix the broken tests.

I think we should also add a test for the situation that broke the two tests.
From what I understand, that means when zh-cn is in settings.LANGUAGES.

comment:17 by Bouke Haarsma, 10 years ago

Needs tests: unset

PR updated with added tests.

comment:18 by Baptiste Mispelon <bmispelon@…>, 10 years ago

Resolution: fixed
Status: new → closed

In e5e044da87800feb6ef63fef1765d8c05022d926:

Fixed #18419 -- Full backwards compatibility for old language codes

Improved documentation about zh-* deprecation and upgrade path.

Thanks to Baptiste Mispelon for the code reviews.
