Opened 2 years ago

Closed 19 months ago

#21389 closed Bug (fixed)

Some languages not accepted as valid HTTP Accept-Language

Reported by: bouke
Owned by: bouke
Component: Internationalization
Version: master
Severity: Normal
Keywords:
Cc: bouke@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no


In get_language_from_request there are these lines:

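        # Look up the lowercased POSIX form (e.g. 'fy_nl') in Python's alias table.
        normalized = locale.locale_alias.get(to_locale(accept_lang, True))
        if not normalized:
            continue  # unknown to locale_alias: this Accept-Language entry is skipped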

However, fy_nl is not a key in Python's `locale.locale_alias` mapping, so the fy/fy_nl language code is rejected.

Attachments (1)

failing-fy-test.diff (474 bytes) - added by bouke 2 years ago.


Change History (9)

Changed 2 years ago by bouke


comment:1 Changed 2 years ago by bouke

  • Cc bouke@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 2 years ago by claudep

  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Uncategorized to Bug

The fact that get_language_from_request relies on locale.locale_alias is too limiting. The problem is that locale.locale_alias is based on the Xlib locale.alias list (/usr/share/X11/locale/locale.alias on my system), which is not as exhaustive as the libc /usr/share/i18n/SUPPORTED list. So we have to fix this, but carefully, because this check was introduced in a security fix [842a771e0527c].

comment:3 Changed 2 years ago by bouke

The security fix referred to addresses a denial-of-service (DoS) attack:

A per-process cache used by Django's internationalization ("i18n") system to store the results of translation lookups for particular values of the HTTP Accept-Language header used the full value of that header as a key. An attacker could take advantage of this by sending repeated requests with extremely large strings in the Accept-Language header, potentially causing a denial of service by filling available memory.

Due to limitations imposed by Web server software on the size of HTTP header fields, combined with reasonable limits on the number of requests which may be handled by a single server process over its lifetime, this vulnerability may be difficult to exploit. Additionally, it is only present when the "USE_I18N" setting in Django is "True" and the i18n middleware component is enabled. Nonetheless, all users of affected versions of Django are encouraged to update.

Comparing the locales defined in global_settings.LANGUAGES with Python's locale_alias shows that more locales suffer from this issue. The percentages show the share of strings translated into each language.

    ('fy-nl', 'Frisian'),               #  2%
    ('ia', 'Interlingua'),              # 81%
    ('kk', 'Kazakh'),                   # 74%
    ('lb', 'Luxembourgish'),            # 25%
    ('mn', 'Mongolian'),                # 89%
    ('my', 'Burmese'),                  # 24%
    ('ne', 'Nepali'),                   # 78%
    ('os', 'Ossetic'),                  # 91%
    ('sr-latn', 'Serbian Latin'),       # 78%
    ('sw', 'Swahili'),                  # 82%
    ('udm', 'Udmurt'),                  # 30%
    ('zh-hans', 'Simplified Chinese'),  # 92%
    ('zh-hant', 'Traditional Chinese'), # 90%
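The comparison above can be approximated with a short script. This is a sketch under assumptions: LANGUAGES_EXCERPT is a hand-picked, hypothetical subset of django.conf.global_settings.LANGUAGES, and the conversion to a lookup key (lowercase, '-' replaced by '_') is a simplification of Django's to_locale(code, True). The printed result depends on the Python version, because locale.locale_alias has been extended over time.

```python
import locale

# Hypothetical excerpt of django.conf.global_settings.LANGUAGES.
LANGUAGES_EXCERPT = [
    ('de', 'German'),
    ('fy-nl', 'Frisian'),
    ('ia', 'Interlingua'),
    ('sr-latn', 'Serbian Latin'),
    ('zh-hans', 'Simplified Chinese'),
]


def missing_from_alias(languages, alias_table=locale.locale_alias):
    """Return the codes the old lookup would reject.

    'fy-nl' -> 'fy_nl' is a simplification of Django's to_locale(code, True).
    """
    return [code for code, _name in languages
            if code.replace('-', '_').lower() not in alias_table]


# The result varies by Python version, as locale.locale_alias has grown.
print(missing_from_alias(LANGUAGES_EXCERPT))
```

Passing a custom alias_table makes it easy to check the logic independently of the installed Python.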

HTTP Accept-Language values follow the language tags defined by IANA, with the following syntax:

 langtag       = language
                 ["-" script]
                 ["-" region]
                 *("-" variant)
                 *("-" extension)
                 ["-" privateuse]

According to IANA, there are 8068 languages, 162 scripts, 301 regions and 62 variants, so shipping an exhaustive list of all possible language tags is probably overkill.
There should nevertheless be some limit on which language codes a user can provide, to prevent such an attack, while still allowing all valid languages and sublanguages.
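One way to limit input without listing every tag is to validate its shape. The langtag production above can be turned into a regular expression; this is a simplified sketch assuming the RFC 5646 subtag lengths (language 2 to 8 letters, script 4 letters, region 2 letters or 3 digits, variants of 5 to 8 alphanumerics or a digit plus 3 alphanumerics). Extended language subtags and grandfathered tags are deliberately left out, and LANGTAG_RE / is_valid_langtag are illustrative names, not an existing API.

```python
import re

# Simplified RFC 5646 langtag; extlang and grandfathered tags are omitted.
LANGTAG_RE = re.compile(
    r"""
    (?P<language>[a-z]{2,8})                                 # primary language
    (?:-(?P<script>[a-z]{4}))?                               # optional script
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?                      # optional region
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)   # variants
    (?P<extensions>(?:-[0-9a-wyz](?:-[a-z0-9]{2,8})+)*)      # extensions
    (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?               # private use
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_valid_langtag(tag):
    """True if the whole string matches the simplified langtag grammar."""
    return LANGTAG_RE.fullmatch(tag) is not None


for tag in ("fy-NL", "sr-Latn", "zh-Hant-TW", "en-US-x-twain", "en_US"):
    print(tag, is_valid_langtag(tag))
```

Because every subtag has a fixed maximum length, a shape check like this rejects absurd inputs early, although by itself it does not bound how many distinct valid tags a client can send.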

Last edited 23 months ago by bouke

comment:4 Changed 23 months ago by bouke

  • Summary changed from Frisian is not accepted as HTTP Accept-Language to Some languages not accepted as valid HTT Accept-Language

comment:5 Changed 23 months ago by bouke

  • Summary changed from Some languages not accepted as valid HTT Accept-Language to Some languages not accepted as valid HTTP Accept-Language

comment:6 Changed 23 months ago by bouke

  • Owner changed from nobody to bouke
  • Status changed from new to assigned

comment:8 Changed 19 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 2bab9d6d9ea095c4bcaeede2df576708afd46291:

Fixed #21389 -- Accept most valid language codes

By removing the 'supported' keyword from the detection methods and relying only
on a cached settings.LANGUAGES, these methods have become roughly 4x faster in
raw performance. This allows us to stop checking Python's incomplete list of
locales and rely on a less restrictive regular expression for accepting
certain locales.

HTTP Accept-Language is defined as being case-insensitive; based on this fact,
extra performance improvements have been made, as it wouldn't make sense to
check for case differences.
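The approach in the commit message (look codes up in a cached settings.LANGUAGES, pre-filter them with a permissive regular expression, and treat codes case-insensitively) can be sketched as follows. This is a minimal illustration, not Django's implementation: LANGUAGES is a hypothetical stand-in for the setting, and supported_variant, _supported_variant, get_languages and the regex are assumptions.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for settings.LANGUAGES.
LANGUAGES = (
    ('de', 'German'),
    ('en', 'English'),
    ('fy', 'Frisian'),
    ('sr-latn', 'Serbian Latin'),
    ('zh-hans', 'Simplified Chinese'),
)

# Permissive shape check (an assumption, not Django's exact pattern):
# 1-8 letters, then any number of "-" + 1-8 alphanumeric subtags.
LANGUAGE_CODE_RE = re.compile(r'[a-z]{1,8}(?:-[a-z0-9]{1,8})*', re.IGNORECASE)


@lru_cache(maxsize=None)
def get_languages():
    """Build the code -> name mapping once; later calls reuse the cached dict."""
    return {code.lower(): name for code, name in LANGUAGES}


@lru_cache(maxsize=1000)
def _supported_variant(code):
    """Return the configured code matching `code` (already lowercased), or None.

    Tries the full code first, then its generic form ('fy-nl' -> 'fy').
    """
    languages = get_languages()
    for candidate in (code, code.split('-')[0]):
        if candidate in languages:
            return candidate
    return None


def supported_variant(lang_code):
    """Validate the shape, then look up case-insensitively via the LRU cache."""
    if not LANGUAGE_CODE_RE.fullmatch(lang_code):
        return None  # rejected before it can become a cache key
    # Language tags are case-insensitive, so normalize once here instead of
    # comparing case variants later.
    return _supported_variant(lang_code.lower())


print(supported_variant('fy-NL'), supported_variant('xx'))  # prints: fy None
```

Codes outside the configured languages simply resolve to None, so Python's locale table is never consulted, and the LRU bound keeps the per-process cache at a fixed maximum size.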
