Opened 2 years ago

Closed 8 months ago

#18419 closed Bug (fixed)

Language code is not correct for Chinese

Reported by: Olli Wang <olliwang@…>
Owned by: bouke
Component: Internationalization
Version: master
Severity: Normal
Keywords: i18n, chinese, zh
Cc: kitsunde@…, bouke, bmispelon
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Currently Django uses zh_TW for Traditional Chinese and zh_CN for Simplified Chinese. This works, but it is not quite correct.

This is because Traditional Chinese is used not only in Taiwan (tw) but also in Hong Kong (hk), and Simplified Chinese is used not only in China (cn) but also in Singapore (sg) and Malaysia. A newer standard is to use "zh_Hant" for Traditional Chinese and "zh_Hans" for Simplified Chinese, which avoids implying that the content is only for people living in China or Taiwan.

Attachments (0)

Change History (18)

comment:1 Changed 2 years ago by Olli Wang <olliwang@…>

  • Easy pickings set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 follow-up: Changed 2 years ago by ramiro

Can you point us to a URL for [a document describing] this new standard?

comment:3 in reply to: ↑ 2 Changed 2 years ago by Olli Wang <olliwang@…>

Replying to ramiro:

Can you point us to a URL for [a document describing] this new standard?

Apple has updated their developer guide to reflect the change. You may want to visit the guide at https://developer.apple.com/library/mac/#documentation/MacOSX/Conceptual/BPInternational/Articles/LanguageDesignations.html and search for "hant" or "hans".

comment:4 Changed 2 years ago by Olli Wang <olliwang@…>

After digging into the Django source code, I found that adopting the new standard may raise some issues. The first is the conversion between locale and language code. A locale has the format ll_CC, so for example the "zh_TW" locale can be converted to the "zh-tw" language code. Under the new standard, however, the language code for Traditional Chinese is "zh-Hant", which cannot be converted to a locale in ll_CC format because "Hant" is not a country.
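
To make the conversion problem concrete, here is a minimal sketch of a converter that accepts both country and script subtags. The function name to_locale_sketch and the "four letters means script" rule are assumptions for illustration; this is not Django's own implementation.

```python
def to_locale_sketch(language: str) -> str:
    """Convert a language code such as "zh-tw" into a locale name such as "zh_TW".

    Hypothetical helper: it assumes a four-letter subtag is a script
    (e.g. "hant") and any other subtag is a country or region (e.g. "tw").
    """
    lang, _, subtag = language.lower().partition("-")
    if not subtag:
        # No subtag at all, e.g. "en".
        return lang
    if len(subtag) == 4 and subtag.isalpha():
        # Script subtag: title-case it, "hant" -> "Hant".
        return f"{lang}_{subtag.title()}"
    # Country or region subtag: upper-case it, "tw" -> "TW".
    return f"{lang}_{subtag.upper()}"
```

With this rule, "zh-tw" still maps to "zh_TW", while "zh-Hant" maps to "zh_Hant" instead of being forced into the ll_CC shape.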

The second issue is that country-specific codes should fall back to the new standard when they are not available. For example, most browsers currently send only "zh-tw", "zh-cn", "zh-sg", or "zh-hk" in the "HTTP_ACCEPT_LANGUAGE" field. In that situation, Django should convert "zh-tw" and "zh-hk" to "zh-Hant", and "zh-cn" and "zh-sg" to "zh-Hans". This should be done only if the requested code ("zh-tw", "zh-hk", "zh-cn", "zh-sg") is not set in the django.conf.LANGUAGES setting.
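
The fallback rule described above could be sketched roughly as follows. The names SCRIPT_FALLBACKS and resolve_language, and the "en" default, are hypothetical and chosen only for this example; the list of requested codes stands in for an already parsed Accept-Language header.

```python
# Hypothetical mapping from country-specific codes to script-based codes.
SCRIPT_FALLBACKS = {
    "zh-tw": "zh-hant",
    "zh-hk": "zh-hant",
    "zh-cn": "zh-hans",
    "zh-sg": "zh-hans",
}


def resolve_language(requested, supported, default="en"):
    """Pick a language for a request.

    requested: language codes in the browser's order of preference.
    supported: language codes the site provides (a stand-in for LANGUAGES).
    """
    supported = {code.lower() for code in supported}
    for code in requested:
        code = code.lower()
        if code in supported:
            # The exact code is configured, so no conversion happens.
            return code
        fallback = SCRIPT_FALLBACKS.get(code)
        if fallback in supported:
            # Only fall back when the requested code itself is not configured.
            return fallback
    return default
```

For instance, a request for "zh-hk" against a site that supports only "en" and "zh-hant" resolves to "zh-hant", while a site that also lists "zh-hk" keeps serving "zh-hk".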

The last part is that Django should merge these translations. For example, if a browser requests the "zh-tw" language, Django should merge both the "zh-tw" and "zh-Hant" translation files.

Chinese is my mother tongue. Please feel free to ask me questions if anything is still unclear.
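
One way to picture the merge is a layered lookup in which the country-specific catalog wins and the script-based catalog fills the gaps. This is only a sketch using plain dictionaries with placeholder strings; merged_catalog is a hypothetical name, and real translation files are not loaded this way.

```python
from collections import ChainMap


def merged_catalog(specific, generic):
    """Return a lookup that tries `specific` first, then `generic`."""
    return ChainMap(specific, generic)


# Placeholder catalogs; the values are markers, not real translations.
zh_hant = {"Save": "<zh-Hant: Save>", "File": "<zh-Hant: File>"}
zh_tw = {"File": "<zh-tw: File>"}

catalog = merged_catalog(zh_tw, zh_hant)
```

Here catalog["File"] comes from the "zh-tw" entries, and catalog["Save"], which "zh-tw" does not define, comes from "zh-Hant".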

comment:5 Changed 2 years ago by Olli Wang <olliwang@…>

There is another good reason why Django should adopt "zh-Hant" for Traditional Chinese and "zh-Hans" for Simplified Chinese. Imagine a site visitor from Hong Kong. The "HTTP_ACCEPT_LANGUAGE" field sent by their browser may well include only zh-hk and not zh-tw. In that situation, even though both zh-hk and zh-tw are Traditional Chinese, the Hong Kong user won't see the Traditional Chinese content, because Django sees that zh-hk doesn't match zh-tw and falls back to the default language.

Of course, some site administrators will still want to target Hong Kong or Taiwan users specifically. In that case, they can still provide both zh-tw and zh-hk translations. But most of the time, a Chinese site admin doesn't care where the user comes from at all. Just as with en-us and en-gb, the site admin may only want to provide one English version. So if Django supports zh-hant, users from Hong Kong and Taiwan alike will see Traditional Chinese as expected. The same applies to "zh-hans": users from China and Singapore alike will see Simplified Chinese as expected.

comment:7 Changed 2 years ago by aaugustin

  • Easy pickings unset
  • Triage Stage changed from Unreviewed to Accepted

I'm going to accept this ticket, but we need a good plan to address backwards compatibility before this can move forward.

You could start a discussion on the django-i18n mailing list to get more input.

comment:8 Changed 23 months ago by kitsunde

  • Cc kitsunde@… added

comment:9 Changed 17 months ago by jezdez

We can do it the same way we did with the two Norwegian translations, have both translations (zh-Hant/-Hans and zh-TW/-CN) in core and deprecate the latter after two releases.

comment:10 Changed 8 months ago by bouke

  • Cc bouke added
  • Has patch set
  • Owner changed from nobody to bouke
  • Status changed from new to assigned

I've created a pull request for this ticket: https://github.com/django/django/pull/1868, based on the strategy proposed by jezdez. So for Django 1.7 and 1.8 there are duplicate Chinese translations present, with the old ones deprecated. However, while researching I also found that some browsers, including the latest Firefox, still send the old (deprecated) language codes. So I've included a check so that visitors with those browsers also get the correct display language, despite the outdated Accept-Language headers.

Note that the transifex languages should also be updated/renamed, although I'm not sure how that works. @jezdez?

Last edited 8 months ago by bouke

comment:11 Changed 8 months ago by aaugustin

The pull request looks good, but although you sent it a few hours ago it doesn't apply properly. Did you work from an up-to-date copy of master?

If you bring it up to date, you can mark the ticket as RFC.

comment:12 Changed 8 months ago by aaugustin

Fixed in c0a2388a1c4ead1afaec98e4ebc953a772ca3849. (Commit referenced #18149 by mistake.)

comment:13 Changed 8 months ago by bouke

Woops, my bad!

comment:14 Changed 8 months ago by bmispelon

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:15 Changed 8 months ago by bouke

  • Resolution fixed deleted
  • Status changed from closed to new

After discussing on IRC we found that (1) some unit tests failed after the patch and (2) the upgrade path is somewhat bumpy. The problem occurs when LANGUAGES isn't overridden and therefore contains both the old and the new language codes. If a browser then requests the old language code (which might include the most recent versions of IE and Firefox), Django upgrades it to the new language code. If a project has customized the translations for the old language code, Django would use the new language code and ignore those translations. To mitigate this, get_supported_language_variant also needs to check whether the old language code is still in the list of supported languages.
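
The difference between the two behaviours can be illustrated with a small sketch. DEPRECATED_CODES, naive_variant, and checked_variant are hypothetical names for this example and are not the code from the pull request.

```python
# Hypothetical mapping from deprecated codes to their replacements.
DEPRECATED_CODES = {"zh-cn": "zh-hans", "zh-tw": "zh-hant"}


def naive_variant(lang_code, supported):
    """Always upgrade a deprecated code, even if it is still supported."""
    lang_code = DEPRECATED_CODES.get(lang_code, lang_code)
    return lang_code if lang_code in supported else None


def checked_variant(lang_code, supported):
    """Keep the requested code if it is still supported; upgrade only otherwise."""
    if lang_code in supported:
        return lang_code
    replacement = DEPRECATED_CODES.get(lang_code)
    return replacement if replacement in supported else None
```

With both "zh-cn" and "zh-hans" supported, naive_variant("zh-cn", ...) returns "zh-hans" and skips any customization made for "zh-cn", whereas checked_variant returns "zh-cn". When only "zh-hans" is supported, checked_variant still upgrades the request to "zh-hans".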

PR: https://github.com/django/django/pull/1872

comment:16 Changed 8 months ago by bmispelon

  • Cc bmispelon added
  • Needs tests set
  • Version changed from 1.4 to master

The PR does fix the broken tests.

I think we should also add a test for the situation that broke the two tests.
From what I understand, that means when zh-cn is in settings.LANGUAGES.

comment:17 Changed 8 months ago by bouke

  • Needs tests unset

PR updated with added tests.

comment:18 Changed 8 months ago by Baptiste Mispelon <bmispelon@…>

  • Resolution set to fixed
  • Status changed from new to closed

In e5e044da87800feb6ef63fef1765d8c05022d926:

Fixed #18419 -- Full backwards compatibility for old language codes

Improved documentation about zh-* deprecation and upgrade path.

Thanks to Baptiste Mispelon for the code reviews.
