Opened 3 years ago

Closed 3 years ago

#21389 closed Bug (fixed)

Some languages not accepted as valid HTTP Accept-Language

Reported by: Bouke Haarsma Owned by: Bouke Haarsma
Component: Internationalization Version: master
Severity: Normal Keywords:
Cc: bouke@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


In get_language_from_request there are these lines:

        normalized = locale.locale_alias.get(to_locale(accept_lang, True))
        if not normalized:

However, `fy_nl` is not in `locale.locale_alias` (search the table to confirm), so the fy/fy_nl language code is rejected.
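
The rejection can be reproduced outside Django with a minimal sketch. `to_locale_sketch` is a simplified, hypothetical stand-in for Django's `to_locale(accept_lang, True)`, and the alias table is a parameter because the contents of `locale.locale_alias` differ between Python versions, so no output is claimed here.

```python
import locale


def to_locale_sketch(language):
    """Hypothetical stand-in for to_locale(language, True): 'fy-NL' -> 'fy_nl'."""
    return language.lower().replace("-", "_")


def is_accepted(accept_lang, alias_table=None):
    """Mimic the check quoted above: a code with no alias entry is rejected."""
    if alias_table is None:
        # Assumption: fall back to whatever table this Python release ships.
        alias_table = locale.locale_alias
    normalized = alias_table.get(to_locale_sketch(accept_lang))
    return bool(normalized)


if __name__ == "__main__":
    for code in ("nl", "fy-nl"):
        print(code, is_accepted(code))
```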

Attachments (1)

failing-fy-test.diff (474 bytes) - added by Bouke Haarsma 3 years ago.


Change History (9)

Changed 3 years ago by Bouke Haarsma

Attachment: failing-fy-test.diff added


comment:1 Changed 3 years ago by Bouke Haarsma

Cc: bouke@… added
Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

comment:2 Changed 3 years ago by Claude Paroz

Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Bug

The fact that get_language_from_request relies on locale.locale_alias is too limiting. The problem is that locale.locale_alias is based on the Xlib list (/usr/share/X11/locale/locale.alias on my system), which is not as exhaustive as the libc list in /usr/share/i18n/SUPPORTED. So we have to fix this, but we have to be careful because this check was introduced in a security fix [842a771e0527c].

comment:3 Changed 3 years ago by Bouke Haarsma

The security fix referred to is a DoS-related attack:

A per-process cache used by Django's internationalization ("i18n") system to store the results of translation lookups for particular values of the HTTP Accept-Language header used the full value of that header as a key. An attacker could take advantage of this by sending repeated requests with extremely large strings in the Accept-Language header, potentially causing a denial of service by filling available memory.

Due to limitations imposed by Web server software on the size of HTTP header fields, combined with reasonable limits on the number of requests which may be handled by a single server process over its lifetime, this vulnerability may be difficult to exploit. Additionally, it is only present when the "USE_I18N" setting in Django is "True" and the i18n middleware component is enabled*. Nonetheless, all users of affected versions of Django are encouraged to update.

Comparing the set of locales defined in global_settings.LANGUAGES with Python's locale_alias shows that more locales suffer from this issue. The percentages show the share of strings translated into each language.

    ('fy-nl', 'Frisian'),               #  2%
    ('ia', 'Interlingua'),              # 81%
    ('kk', 'Kazakh'),                   # 74%
    ('lb', 'Luxembourgish'),            # 25%
    ('mn', 'Mongolian'),                # 89%
    ('my', 'Burmese'),                  # 24%
    ('ne', 'Nepali'),                   # 78%
    ('os', 'Ossetic'),                  # 91%
    ('sr-latn', 'Serbian Latin'),       # 78%
    ('sw', 'Swahili'),                  # 82%
    ('udm', 'Udmurt'),                  # 30%
    ('zh-hans', 'Simplified Chinese'),  # 92%
    ('zh-hant', 'Traditional Chinese'), # 90%
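
A comparison like this could be scripted roughly as follows. The code list is copied from the table above rather than read from Django's settings, `missing_from_alias` is a hypothetical helper, and the result depends on the Python version in use, so no output is shown.

```python
import locale

# Codes copied from the list above (assumption: not read from Django itself).
CANDIDATE_CODES = [
    "fy-nl", "ia", "kk", "lb", "mn", "my", "ne",
    "os", "sr-latn", "sw", "udm", "zh-hans", "zh-hant",
]


def missing_from_alias(codes, alias_table):
    """Return the codes whose lower-cased, underscored form has no alias entry."""
    return [
        code for code in codes
        if code.lower().replace("-", "_") not in alias_table
    ]


if __name__ == "__main__":
    print(missing_from_alias(CANDIDATE_CODES, locale.locale_alias))
```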

HTTP Accept-Language values follow the language tags registered with IANA, whose syntax is defined as:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

According to IANA, there are 8068 languages, 162 scripts, 301 regions and 62 variants. Shipping an inclusive list of all possible language tags is probably overkill.
So there should be some limitation on which languages can be provided by the user to prevent such an attack, while allowing all possible languages and sublanguages.
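
One way to bound the input without shipping a full registry is a purely syntactic check. The following is only a sketch: the pattern and the `MAX_TAG_LENGTH` limit are assumptions chosen for illustration, not necessarily what an eventual patch would use, and the pattern is looser than the full langtag grammar.

```python
import re

# Assumed pattern: a 1-8 letter primary subtag followed by any number of
# 1-8 character alphanumeric subtags, matched case-insensitively.
LANGUAGE_TAG_RE = re.compile(r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*", re.IGNORECASE)

# Assumed upper bound on total length, so oversized values are dropped early.
MAX_TAG_LENGTH = 35


def looks_like_language_tag(value):
    """Return True if value is a plausible, reasonably short language tag."""
    if len(value) > MAX_TAG_LENGTH:
        return False
    # fullmatch avoids accepting a trailing newline, which "$" would allow.
    return LANGUAGE_TAG_RE.fullmatch(value) is not None
```

Because both the subtag sizes and the total length are capped, any value that passes is small enough to use as a cache key.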

Last edited 3 years ago by Bouke Haarsma

comment:4 Changed 3 years ago by Bouke Haarsma

Summary: Frisian is not accepted as HTTP Accept-Language → Some languages not accepted as valid HTT Accept-Language

comment:5 Changed 3 years ago by Bouke Haarsma

Summary: Some languages not accepted as valid HTT Accept-Language → Some languages not accepted as valid HTTP Accept-Language

comment:6 Changed 3 years ago by Bouke Haarsma

Owner: changed from nobody to Bouke Haarsma
Status: new → assigned

comment:7 Changed 3 years ago by Bouke Haarsma

Has patch: set

comment:8 Changed 3 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: assigned → closed

In 2bab9d6d9ea095c4bcaeede2df576708afd46291:

Fixed #21389 -- Accept most valid language codes

By removing the 'supported' keyword from the detection methods and only relying
on a cached settings.LANGUAGES, the speed of said methods has been improved;
around 4x raw performance. This allows us to stop checking Python's incomplete
list of locales, and rely on a less restrictive regular expression for
accepting certain locales.

HTTP Accept-Language is defined as being case-insensitive, based on this fact
extra performance improvements have been made; it wouldn't make sense to
check for case differences.
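
The approach described in the commit message can be illustrated with a rough sketch. This is not the committed code: `SUPPORTED`, `parse_accept_language`, and `pick_language` are hypothetical names, the frozen set stands in for a cached `settings.LANGUAGES`, and q-value handling is deliberately simplified.

```python
# Hypothetical stand-in for the language codes in settings.LANGUAGES.
SUPPORTED = frozenset({"en", "nl", "fy-nl", "zh-hans"})


def parse_accept_language(header):
    """Return lower-cased language ranges ordered by descending q-value."""
    ranges = []
    for position, piece in enumerate(header.split(",")):
        lang, _, params = piece.strip().partition(";")
        lang = lang.strip().lower()
        if not lang:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # Skip ranges with a malformed q-value.
        # Negative quality sorts highest first; position keeps ties stable.
        ranges.append((-quality, position, lang))
    return [lang for _, _, lang in sorted(ranges)]


def pick_language(header, supported=SUPPORTED, default="en"):
    """Pick the first supported language, falling back to the base language."""
    for lang in parse_accept_language(header):
        if lang == "*":
            break
        if lang in supported:
            return lang
        base = lang.split("-")[0]
        if base in supported:
            return base
    return default
```

Because every range is lower-cased once while parsing, the membership tests need no further case handling, and only codes from the supported set are ever returned.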
