Opened 4 years ago

Last modified 19 months ago

#16284 assigned Cleanup/optimization

djangojs uses en as fallback language rather than the project's language code

Reported by: anonymous Owned by: KJ
Component: Internationalization Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

In views/i18n.py there is code:

    # first load all english languages files for defaults
    for package in packages:
        p = importlib.import_module(package)
        path = os.path.join(os.path.dirname(p.__file__), 'locale')
        paths.append(path)
        try:
            catalog = gettext_module.translation(domain, path, ['en'])
            t.update(catalog._catalog)
        except IOError:
            pass
        else:
            # 'en' is the selected language and at least one of the packages
            # listed in `packages` has an 'en' catalog
            if en_selected:
                en_catalog_missing = False

I believe Django should first check whether the LANGUAGE_CODE setting is set and, if so, use that language for the defaults. Our main language is Polish, and while we have English translations available, we don't want Django to fall back to English.
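A minimal sketch of the requested behaviour, assuming the default language is passed in explicitly (in a Django project it would come from settings.LANGUAGE_CODE). The names load_catalog and build_js_catalog, and the loader parameter, are hypothetical and are not part of Django; only gettext.translation is a real standard-library call.

```python
import gettext


def load_catalog(domain, localedirs, language):
    """Merge the catalogs found for `language` across all locale directories."""
    merged = {}
    for localedir in localedirs:
        try:
            translation = gettext.translation(domain, localedir, [language])
        except OSError:
            # No .mo file for this language in this directory.
            continue
        # `_catalog` is a private attribute; the view quoted above reads it too.
        merged.update(translation._catalog)
    return merged


def build_js_catalog(domain, localedirs, selected, default, loader=load_catalog):
    """Load the project's default language first, then overlay the selected one.

    `default` replaces the hardcoded 'en' from the quoted view (assumption:
    the caller passes the project's LANGUAGE_CODE here).
    """
    catalog = loader(domain, localedirs, default)
    if selected != default:
        catalog.update(loader(domain, localedirs, selected))
    return catalog
```

With this layering, a string missing from the selected language's catalog falls back to the project's default language instead of English. The loader argument exists only so the merge order can be exercised without real .mo files.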

Change History (10)

comment:1 Changed 4 years ago by aaugustin

  • Component changed from Uncategorized to Internationalization
  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to aaugustin
  • Patch needs improvement unset
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Design decision needed
  • Type changed from Uncategorized to Cleanup/optimization

Indeed, it's weird to have a special case for English.

This dates back to r1560. Unfortunately, the commit message is not explicit, and it doesn't reference a ticket.

That code was touched several times recently (r14901, r15404, r15441). #13726 is a good entry point, but I still don't understand very well what's going on there.

I'll mark this ticket as DDN, because I think we need to check if the special case is useful, or if it's just an historic artifact.


While investigating this ticket, I noticed there is another explicit reference to English as the "untranslated language" in django.core.management.base.execute(). The comment dates back to #5898, which was a refactoring, so it may be even older. It would be nice to check what's happening there too, and add a helpful comment for future developers :)

comment:2 Changed 4 years ago by aaugustin

  • Owner changed from aaugustin to nobody
  • Status changed from assigned to new

comment:3 Changed 3 years ago by jeroen.pulles@…

This piece of code also causes an annoying bug, as I see it:

I have a project with nl as the default language; the source strings are in en_US and there are en_GB translations. As a result, this piece of code has "en_selected" set but finds no catalog named "en" (it's named en_GB, obviously), so it ends up with an empty translations dictionary for no good reason.

To my mind, this should be as simple as: got a language code, got a catalog for that language code, I'm done.
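The lookup mismatch behind this report can be reproduced with the standard library alone. The sketch below assumes a throwaway locale directory containing only an en_GB catalog; catalog_path and make_empty_catalog are hypothetical helpers written for this illustration, while gettext.find is the lookup function that gettext.translation relies on.

```python
import gettext
import os
import tempfile


def catalog_path(localedir, language, domain="djangojs"):
    """Return the .mo path gettext would pick for `language`, or None."""
    return gettext.find(domain, localedir, [language])


def make_empty_catalog(localedir, language, domain="djangojs"):
    """Create a placeholder .mo file; gettext.find only checks that it exists."""
    directory = os.path.join(localedir, language, "LC_MESSAGES")
    os.makedirs(directory)
    path = os.path.join(directory, domain + ".mo")
    open(path, "wb").close()
    return path


with tempfile.TemporaryDirectory() as localedir:
    en_gb = make_empty_catalog(localedir, "en_GB")
    # Asking for plain 'en' never considers the en_GB directory ...
    found_for_en = catalog_path(localedir, "en")
    # ... while asking for 'en_GB' finds it.
    found_for_en_gb = catalog_path(localedir, "en_GB")
```

Here found_for_en is None even though an English catalog exists, because gettext expands a requested language towards less specific variants (en_GB to en) but never towards more specific ones (en to en_GB).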

comment:4 Changed 3 years ago by ramiro

See (in that order and with their associated commits): #3594, #13388, #13726 for the rationale of the current behavior and to help you research whether it's possible to remove the special treatment given to 'en' in the JS i18n code. See also this comment: https://code.djangoproject.com/ticket/13726#comment:3

Last edited 3 years ago by ramiro

comment:5 Changed 2 years ago by aaugustin

  • Triage Stage changed from Design decision needed to Accepted

comment:6 follow-up: Changed 2 years ago by KJ

  • Has patch set
  • Owner changed from nobody to KJ
  • Status changed from new to assigned
  • Version changed from 1.3 to master

I’ve created a patch: https://github.com/django/django/pull/1338 . There are two things it removes:

  1. Loading of English translation. As there is a priori no reason to treat English differently than other languages, this is a no-brainer.
  2. Loading of language specified in LANGUAGE_CODE setting. This has to be done because there is no way to determine what language we’re translating from. For example, if we have source texts written in Polish and LANGUAGE_CODE is set to “es”, we want to get no translations for users with language set to “pl”. If Spanish translation was present, this would be impossible with current solution.

The patch makes the javascript_catalog function simpler and more consistent with gettext behaviour.

The present behaviour had one test which I had to remove. Fortunately it doesn’t seem to be documented, so probably we don’t have to worry that much about backwards compatibility (although a small release note may be worth considering).

comment:7 in reply to: ↑ 6 ; follow-up: Changed 2 years ago by ramiro

Replying to KJ:

  1. Loading of English translation. As there is a priori no reason to treat English differently than other languages, this is a no-brainer.

We can discuss this. I think it might be possible to give English no special treatment in JS i18n, because it isn't tied to the GNU gettext runtime translation machinery (used through the Python gettext stdlib module), which is the part known to have some hardcoded en-us preference.

Replying to KJ:

  1. Loading of language specified in LANGUAGE_CODE setting. This has to be done because there is no way to determine what language we’re translating from. For example, if we have source texts written in Polish and LANGUAGE_CODE is set to “es”, we want to get no translations for users with language set to “pl”. If Spanish translation was present, this would be impossible with current solution.

Well, that would change the behavior. By definition, LANGUAGE_CODE is the language that runtime translation falls back to when no translation is available for a given literal in the user-preferred language. It is also the way to dictate which global translation to use when no per-user language detection mechanism (LocaleMiddleware) is in effect. As you've described well, LANGUAGE_CODE can differ from the language of the Python/JS source files and templates.

We need to find a way to cover your use case (falling back to the source code language without taking LANGUAGE_CODE into account) without breaking existing uses and deployments.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 2 years ago by KJ

  • Patch needs improvement set

Replying to ramiro:

Replying to KJ:

  1. Loading of English translation. As there is a priori no reason to treat English differently than other languages, this is a no-brainer.

We can discuss this. I think it might be possible to give English no special treatment in JS i18n, because it isn't tied to the GNU gettext runtime translation machinery (used through the Python gettext stdlib module), which is the part known to have some hardcoded en-us preference.

I didn’t know about that. I have done some googling and grepping in gettext’s source code for that en-us-centric behavior and found nothing, which may mean that it is not a feature that gettext is very proud of.

Replying to KJ:

  1. Loading of language specified in LANGUAGE_CODE setting. This has to be done because there is no way to determine what language we’re translating from. For example, if we have source texts written in Polish and LANGUAGE_CODE is set to “es”, we want to get no translations for users with language set to “pl”. If Spanish translation was present, this would be impossible with current solution.

Well, that would change the behavior. By definition, LANGUAGE_CODE is the language that runtime translation falls back to when no translation is available for a given literal in the user-preferred language. It is also the way to dictate which global translation to use when no per-user language detection mechanism (LocaleMiddleware) is in effect. As you've described well, LANGUAGE_CODE can differ from the language of the Python/JS source files and templates.

We need to find a way to cover your use case (falling back to the source code language without taking LANGUAGE_CODE into account) without breaking existing uses and deployments.

One way would be to provide a more flexible way of specifying language preference (for example, a setting that points to a function returning an iterable of language codes in order of preference). However, this would still leave some potential holes, for example if:

  • App A is written in English and translated into Polish.
  • App B is written in Polish and translated into English.

then I see no way to construct a satisfactory global language preference order. In this case it would be better to introduce a setting that lets the programmer specify the “source code natural language”, but such a setting would have to be assigned per app, which is currently not possible in an elegant way. So I propose cutting my pull request down to only removing the English-centric behavior for now, as the second problem doesn’t seem easy to handle.
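The App A / App B problem can be illustrated with plain dictionaries standing in for catalogs. This is only a sketch under stated assumptions: merge_by_preference and app_catalog are hypothetical names, and neither the preference-order setting nor the per-app source language exists in Django.

```python
def merge_by_preference(catalogs, preference):
    """Merge {language: {msgid: msgstr}} so earlier languages in `preference` win."""
    merged = {}
    for language in reversed(list(preference)):
        merged.update(catalogs.get(language, {}))
    return merged


def app_catalog(catalogs, preference, source_language):
    """Like merge_by_preference, but stop at the app's own source language.

    Once the preference order reaches the language the source strings are
    written in, no translation is needed, so less preferred languages are
    ignored. `source_language` is the hypothetical per-app setting.
    """
    usable = []
    for language in preference:
        if language == source_language:
            break
        usable.append(language)
    return merge_by_preference(catalogs, usable)


# App A is written in English and translated into Polish.
app_a = {"pl": {"Save": "Zapisz"}}
# App B is written in Polish and translated into English.
app_b = {"en": {"Zapisz": "Save"}}

polish_user = ["pl", "en"]
global_order_b = merge_by_preference(app_b, polish_user)
per_app_a = app_catalog(app_a, polish_user, source_language="en")
per_app_b = app_catalog(app_b, polish_user, source_language="pl")
```

With only a global order, a Polish user gets App B's English translation of strings that were already Polish in the source (global_order_b). Knowing each app's source language avoids that (per_app_b is empty) while App A is still translated into Polish.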

comment:9 in reply to: ↑ 8 Changed 2 years ago by ramiro

Replying to KJ:

Replying to ramiro:

Replying to KJ:

  1. Loading of English translation. As there is a priori no reason to treat English differently than other languages, this is a no-brainer.

We can discuss this. I think it might be possible to give English no special treatment in JS i18n, because it isn't tied to the GNU gettext runtime translation machinery (used through the Python gettext stdlib module), which is the part known to have some hardcoded en-us preference.

I didn’t know about that. I have done some googling and grepping in gettext’s source code for that en-us-centric behavior and found nothing, which may mean that it is not a feature that gettext is very proud of.

I'll need to search for this myself too. I remember reading about it... And here it is, it was in this same issue tracker: #8626

comment:10 Changed 19 months ago by KJ

I’ve started a thread on django-developers. The thread contains a proposed solution.
