Opened 14 months ago

Closed 4 months ago

Last modified 3 weeks ago

#30439 closed Bug (fixed)

Translations issues on Django upgrade due to unexpected changes in plural forms

Reported by: Michal Čihař Owned by: Claude Paroz
Component: Internationalization Version: 2.2
Severity: Normal Keywords:
Cc: Maciej Olko Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When locales have different plural forms, the ngettext can be easily broken because it uses plural equation from the first loaded gettext catalog for all of them.

For example in Czech locale, there are different plural equations being used even inside Django with either 3 or 4 plural forms. When the one with 4 forms is loaded first, Django is looking for non existing plural in catalogs.

Reproducer in Django 2.2.1:

>>> from django.utils import translation
>>> translation.activate('cs')
>>> translation.ngettext('This password is too short. It must contain at least %(min_length)d character.', 'This password is too short. It must contain at least %(min_length)d characters.', 1)
'Heslo je příliš krátké. Musí mít délku aspoň %(min_length)d znak.'
>>> translation.ngettext('This password is too short. It must contain at least %(min_length)d character.', 'This password is too short. It must contain at least %(min_length)d characters.', 10)
'This password is too short. It must contain at least %(min_length)d characters.'

The second invocation is trying to find 4th plural in 3 plurals catalog.

Change History (51)

comment:1 Changed 14 months ago by Claude Paroz

Owner: changed from nobody to Claude Paroz
Status: newassigned
Triage Stage: UnreviewedAccepted

I'm partly responsible of this situation, because I'm not always merging translation files when only the plural string changes (essentially to save some .mo files which are the main cause of Django repository growing size). We are unfortunately completely dependent on Transifex about these plural string changes, which are not always desired by translation teams.

Whatever, we should definitely commit all files with plural string changes.

comment:2 Changed 14 months ago by Claude Paroz

Let's do that soon before the 2.2.2 release, so we can get new translations in the same commit.

comment:3 Changed 14 months ago by Michal Čihař

The very same situation happens with third party app localization (this is where I noticed it first). I think it's not reasonable to expect everything will use same plural forms.

comment:4 Changed 14 months ago by Claude Paroz

Then do you have a suggestion for an alternate implementation?

comment:5 Changed 14 months ago by Michal Čihař

It would be necessary to avoid merging of the catalogs and perform lookup over several catalogs. With merging, the lookup in the catalog can not respect plural equation defined in the file. With Czech language I believe this is issue only because of a bug in the plural form (see below), but there are language were several different plural forms are widely used (for example Latvian, where they have same number of plurals, but reverse ordering of the strings in one of them).

Going back to Czech locale, the plural equation for Czech (which you apparently got from Transifex) doesn't make sense at all, because it defines 4 plural forms, but it never evaluates to 2 because n % 1 is always 0:

(n == 1 && n % 1 == 0) ? 0 : (n >= 2 && n <= 4 && n % 1 == 0) ? 1: (n % 1 != 0 ) ? 2 : 3

So the correct fix here would be revert to original formula and blame Transifex to fix this. Probably some damage was done to the plurals translation meanwhile...

PS: I've seen such messed up plurals already in several projects which have decided to migrate from Transifex to Weblate (which I develop), not sure if anybody has reported this to them.

comment:6 Changed 14 months ago by Claude Paroz

It would be necessary to avoid merging of the catalogs and perform lookup over several catalogs. With merging, the lookup in the catalog can not respect plural equation defined in the file.

Exactly, are you up to try some proof of concept patch?

So the correct fix here would be revert to original formula and blame Transifex to fix this.

Unfortunately, it's very difficult to make Transifex team move on these subjects. We've been bitten by this with Polish in the past.

comment:7 Changed 14 months ago by Michal Čihař

I was thinking about something like this:

  class DjangoTranslation:
    def __init__(language, domain='django', localedirs=None):
        self.domain = domain
        self.translations = []
        # Here would be rest of init logic based on current code

    def merge(self, other):
        if not isinstance(other, NullTranslations):
            self.translations.append(other)
            
    def gettext(message):
        for translation in self.translations:
            tmsg = translation._catalog.get(message, None)
            if tmsg is not None:
                return tmsg
        return message

It would perform slightly worse than current code especially in case there will be many apps with locales.

PS: If you think such approach would be acceptable I can try to find time to implement something along this in week or two.

Last edited 14 months ago by Michal Čihař (previous) (diff)

comment:8 Changed 14 months ago by Claude Paroz

Sacrificing performance is never rejoicing, but I'm not sure we have the choice here. And the cost shouldn't be too high hopefully.

comment:9 Changed 11 months ago by Helder Correia

Strings from django.contrib.postgres are not being translated. Is this the same issue?

>>> from django.utils.translation import activate, gettext as _
>>> activate('pt')
>>> _('The start of the range must not exceed the end of the range.')  # django/contrib/postgres/forms/ranges.py:20
'The start of the range must not exceed the end of the range.'
>>> _('This field cannot be null.')  # django/db/models/fields/__init__.py:105
'Este campo não pode ser nulo.'

comment:10 Changed 11 months ago by Claude Paroz

No, this is unrelated. You probably don't have django.contrib.postgres in your INSTALLED_APPS.

comment:11 Changed 8 months ago by אורי

Hi, I was referred to this ticket from https://code.djangoproject.com/ticket/30939 as a duplicate. I don't understand why you need such a long formula to decide if a number is singular or plural. If it's 1 it's singular, otherwise it's plural. What's wrong about that?

Last edited 8 months ago by אורי (previous) (diff)

comment:12 Changed 8 months ago by Maciej Olko

I don't understand why you need such a long formula to decide if a number is singular or plural. If it's 1 it's singular, otherwise it's plural. What's wrong about that?

Some languages use e.g. different forms of nouns in phrases depending on number value.
For example "Delete X objects" in Polish is "Usuń 2 obiekty" and "Usuń 5 obiektów".
Those formulas address such cases.

comment:13 Changed 8 months ago by Rodrigo

Owner: changed from Claude Paroz to Rodrigo

As Claude solicited in the mailing list, I will work on this :)

comment:14 Changed 8 months ago by Rodrigo

I'm still getting acquainted with the code and gettext, but my initial impression is that the code is fine - JIC, the snippet provided in the bug report by Michal is working correctly in the current master branch, I guess the plural equation in the .po has been fixed and the .mo.

All languages are kept in the _translations dict, with an instance of DjangoTranslation per lang. When you call DjangoTranslation(language) it will merge all the catalogs (.po) available for the language, setting the plural with the first one but this plural is only used if there is no catalog for the lang.

If the plural comes in each catalog, why isn't reasonable to expect that they would be the same? If on catalog of the same language has 3, but other 4, one should be bad - one thing that could be done is checking that on merging and if it changes, then raise an error.

About merging the .mos, why we just distribute the .po files and put a system check that will warn that i18n is on and django-admin compilemessages hasn't been run?

comment:15 Changed 8 months ago by Michal Čihař

This got "fixed" by more Czech PO files using the wrong plural equation. Even having all equations in Django consistent this will break with third-party module localization - unless they also use Transifex which does mess up things same way, their localization will be incomplete.

Even ruling out this Transifex bug, it's not sane to assume that plural formula would be same — for example in Hebrew, in most cases it's okay to stick with two plural forms, while in some complex cases you need use four of them.

PS: Sorry for not looking again into this, I got distracted by other things and never found time to dig deeper into this.

comment:16 in reply to:  15 ; Changed 8 months ago by Rodrigo

Replying to Michal Čihař:

This got "fixed" by more Czech PO files using the wrong plural equation. Even having all equations in Django consistent this will break with third-party module localization - unless they also use Transifex which does mess up things same way, their localization will be incomplete.

I think there should be one plural equation in Django and it should be in django.conf.locales.LC.formats.PLURAL_FORMS, maintained independently from a platform, just like any other language specific format.

Then functions like copy_plural_forms should get it from there, and when syncin', either overwrite it or generate an empty one and do a msgmerge.

Even ruling out this Transifex bug, it's not sane to assume that plural formula would be same — for example in Hebrew, in most cases it's okay to stick with two plural forms, while in some complex cases you need use four of them.

While keepin' diggin', I wasn't convinced of the title of the ticket, because ngettext seems not to break, because it seems intended to work only with one plural form - and then I found in https://docs.djangoproject.com/en/2.2/topics/i18n/translation/#pluralization:

Note
Plural form and po files
Django does not support custom plural equations in po files. As all translation catalogs are merged, only the plural form for the main Django po file (in django/conf/locale/<lang_code>/LC_MESSAGES/django.po) is considered. Plural forms in all other po files are ignored. Therefore, you should not use different plural equations in your project or application po files.

that makes it explicit.

If you generate and compile the messages using the django-admin commands, there would be no problem, I think the case you are referring is when you put "pos and mos" from external sources - like a translation platform, which for convenience may use less plurals than the corresponding or specified for the languange - in locale dirs which have a different plural forms than the one coming with Django, which is not supported.

Then the ticket would be "Add support for variable number of plural forms in Django in order to support different translations platforms outputs", do you agree?

A way of implementing it is instead of merging the catalogs, keep them on a list and then return the first one available in that list - as you proposed.

I still don't understand why Django has the wrong approach, because a language should have only one number of plurals and one plural equation. Once it is right, if you provide less than specified, then it's incomplete.

If you are using a translation platform and you want to add convenience to your users so your front-end to accept less entries if deemed not necessary, why not make your output to fill the empty ones with the supplied and comply with the formal number for the language? I think it would be better for all the consumers of your "pos and mos" :)

If you are editing the your pos manually and some plurals are deemed not necessary for the case, just copy and paste to fill in the gaps.

If so, we can add a to the documentation something like "If you are using third-party translations, ensure that their output matches the defined number of plurals in the plural forms [(specified in ...LC.formats)] for the language in Django or some translations will be missed".

Do you think I am missing/misunderstanding something?

PS: Sorry for not looking again into this, I got distracted by other things and never found time to dig deeper into this.

Please, thank you for giving your time and energy for making Django better :)

comment:17 Changed 8 months ago by Rodrigo

Michal,

A PR to address the problem - translations may get messed up on Django upgrade due to unexpected changes in plural forms - and make this more convenient is on its way.

In the meantime and JIC, if you want to manually fix your translations' plural forms, you can edit the plural form in the main Django po file for the language (located in django/conf/locale/<lang_code>/LC_MESSAGES/django.po) and then run msgfmt django.po -o django.mo in that directory - that will do it.

PS: I kept reading my past comment and there may be an misinterpretation of my intention, if that is the case, I apologize for not finding the right words

comment:18 Changed 8 months ago by Rodrigo

Summary: ngettext broken for certain locales due to catalog mergingTranslations issues on Django upgrade due to unexpected changes in plural forms

Alright, it's going to take more time that I thought, here is my implementation plan:

  • Have 2 more options in compilemessages: --main-po and --initial

--main-po is for addressing easily plural form changes in Django, you edit the plural form and then re-compile with it.

--initial is for compiling all the pos distributed with Django.

  • Have a system check for i18n turned on and no mos available
  • Have a collectlocale command that does the same as collectstatic but for locale

This is for addressing Claude's concern about repository size and not merging changes so often. There is no need - IMO - to keep under VC the mo files and distribute them when the can be easily generated by the user. There is duplication between the po and the mo, they have the same information but in different format.

If mos are out of the repository, then Claude could merge easily any po changes and the distribution size will lower also. The only cost of this would be run "django-admin compilemessages --initial" on any new project that will include i18n. It could also be run automatically if the system check fails, I'm not sure where would this go.

The need of a collectlocale command arises from the problem that any changes on the pos in the Django distribution will be overridden on upgrade. If you want to persist them across upgrades, you would need to take them the project directory. Then, if a settings.LOCALE_ROOT exists, use that as a starting point for the catalog merging. This would be the more complex to implement, and it would render the --main-po useless.

I have already implemented the --main-po option and started with the collectlocale as a better option, has anyone have any thoughts on this?

Here is the branch where I'll be working until the final PR is reached

comment:19 Changed 8 months ago by Claude Paroz

Sorry Rodrigo, but this ticket isn't about distributing or not .mo files. This is #23321, and it should be discussed on the dev mailing list before any code writing.

The main coding direction of this ticket currently is to not merge translation catalogs at runtime if plural form count is different.

comment:20 in reply to:  19 ; Changed 8 months ago by Rodrigo

Replying to Claude Paroz:

Sorry Rodrigo, but this ticket isn't about distributing or not .mo files. This is #23321, and it should be discussed on the dev mailing list before any code writing.

Alright, I didn't know about #23321, it has the same direction that I was thinking :) I will send a message to the mailing list

The main coding direction of this ticket currently is to not merge translation catalogs at runtime if plural form count is different.

Raise an error or log a warning and ignore the catalog?

comment:21 in reply to:  20 ; Changed 8 months ago by Claude Paroz

Replying to Rodrigo:

The main coding direction of this ticket currently is to not merge translation catalogs at runtime if plural form count is different.

Raise an error or log a warning and ignore the catalog?

The idea is to keep several translation catalogs per language (one per plural form?), and successively try to get a translation from each catalog. This is described in comments above, please re-read them.

comment:22 in reply to:  16 ; Changed 8 months ago by Michal Čihař

Replying to Rodrigo:

Then functions like copy_plural_forms should get it from there, and when syncin', either overwrite it or generate an empty one and do a msgmerge.

Overwriting plural form will mess up existing translations.

If you generate and compile the messages using the django-admin commands, there would be no problem, I think the case you are referring is when you put "pos and mos" from external sources - like a translation platform, which for convenience may use less plurals than the corresponding or specified for the languange - in locale dirs which have a different plural forms than the one coming with Django, which is not supported.

The current mess was started by Transifex, which is AFAIK Django official localization platform. I don't think they support honoring plural forms existing in the file, they always rewrite to whatever they have hardcoded.

Then the ticket would be "Add support for variable number of plural forms in Django in order to support different translations platforms outputs", do you agree?

True, I didn't notice this is covered in the docs. However, in such case Django should not unadvertised change plural forms in existing translations.

I still don't understand why Django has the wrong approach, because a language should have only one number of plurals and one plural equation. Once it is right, if you provide less than specified, then it's incomplete.

I noticed it on Czech where Django has wrong plural form (see above, it can never evaluate to 2, so even if it defines 4 plurals, one of them is never used). Still, there might be valid reason to have different plural forms and equations in different scope.

I do have quite some insight into this because I develop Weblate (using Django), where we put quite some effort to handle plural forms correctly :-).

The earlier mentioned example of Hebrew can be seen on our public hosting service - 25% of translations use simple two plurals and 75% use more complex four plurals. Both are correct and in many cases the one with two plural forms is working well at it makes it easier for translators.

Another example where things would be messed up with current Django implementation is Lithuanian, see https://github.com/WeblateOrg/weblate/issues/901 for discussion we had on this at Weblate.

comment:23 in reply to:  21 Changed 8 months ago by Rodrigo

Replying to Claude Paroz:

Replying to Rodrigo:

The main coding direction of this ticket currently is to not merge translation catalogs at runtime if plural form count is different.

Raise an error or log a warning and ignore the catalog?

The idea is to keep several translation catalogs per language (one per plural form?), and successively try to get a translation from each catalog. This is described in comments above, please re-read them.

Alright, but this is not inline with django supporting only one plural form as stated in docs.

If so, the ticket should be "Multiple plural forms support for i18n" ("Add support for variable number of plural forms in Django in order to support different translations platforms outputs")

comment:24 in reply to:  22 Changed 8 months ago by Rodrigo

Replying to Michal Čihař:

Replying to Rodrigo:

Then functions like copy_plural_forms should get it from there, and when syncin', either overwrite it or generate an empty one and do a msgmerge.

Overwriting plural form will mess up existing translations.

For what I understood, this is only used when a new .po file is created by makemessages, if the file exists it will do a msgmerge and I think the plural form will be retained (though it will be overridden in the "general" merge of the catalogs for the language. This seems reasonable with the "only one plural form" support.

If you generate and compile the messages using the django-admin commands, there would be no problem, I think the case you are referring is when you put "pos and mos" from external sources - like a translation platform, which for convenience may use less plurals than the corresponding or specified for the languange - in locale dirs which have a different plural forms than the one coming with Django, which is not supported.

The current mess was started by Transifex, which is AFAIK Django official localization platform. I don't think they support honoring plural forms existing in the file, they always rewrite to whatever they have hardcoded.

Then the ticket would be "Add support for variable number of plural forms in Django in order to support different translations platforms outputs", do you agree?

True, I didn't notice this is covered in the docs. However, in such case Django should not unadvertised change plural forms in existing translations.

I agree, it should be covered at least in the Release Notes.

I still don't understand why Django has the wrong approach, because a language should have only one number of plurals and one plural equation. Once it is right, if you provide less than specified, then it's incomplete.

I noticed it on Czech where Django has wrong plural form (see above, it can never evaluate to 2, so even if it defines 4 plurals, one of them is never used). Still, there might be valid reason to have different plural forms and equations in different scope.

I do have quite some insight into this because I develop Weblate (using Django), where we put quite some effort to handle plural forms correctly :-).

The earlier mentioned example of Hebrew can be seen on our public hosting service - 25% of translations use simple two plurals and 75% use more complex four plurals. Both are correct and in many cases the one with two plural forms is working well at it makes it easier for translators.

Another example where things would be messed up with current Django implementation is Lithuanian, see https://github.com/WeblateOrg/weblate/issues/901 for discussion we had on this at Weblate.

I will try to synthesize my previous comments.

I'm not saying there may be not valid reasons, I'm saying that I don't see them for what I have understood of the subject so far (I'm new to the domain). Django bases its i18n module on gettext and therefore follows its design. Why gettext does not support multiple pf on merge? I don't know, I only can guess deducing for what I have learn so far. Seems that it's also a gettext choice, every software that uses gettext and merges its catalog will have the same issues.

There are two parts of the plural form, the number of plurals and the plural equation. As you said, the number of plurals may be less for convenience (making the life of the translators easier), which is not bad - don't misunderstand me. For what I have inferred of gettext is that this should be handled at the "front-end level", you allow less forms and generate the expected input of gettext with the total forms, i.e. if it evaluates to 4th form and that is the same as the second, then only 2 forms have to be addressed.

I can be wrong, this is what I have understood so far. If so, there will be issues with any software using gettext with catalog merging (possibly leaving them empty and using the fallback language).

The other part is the plural equation. If different plural equations are needed on different scopes for a language, then this seems a limitation of gettext - but again, I'm not knowledgeable on gettext nor the field.

Supporting multiple plural forms via not merging the catalogs would be a way of not making one part of the translation community work interfere (unintentionally) with another part of it. That's a valid reason to me :)

Handling broken plural equations is another issue :)

The clearer the intention, the easier to get there :)

comment:25 Changed 7 months ago by אורי

Hi,

I think Django 2.2 (which is LTS) should be updated in a way that using these functions to translate will be backward-compatible, so that everything that worked with Django up to 2.1 will keep working in Django 2.2 and above. Currently this is not the case in Hebrew with Speedy Net, which starts displaying error messages in English in the Hebrew websites (Speedy Net & Speedy Match) when I upgrade Django to 2.2, which is not what I expect. In Django up to 2.1 these error messages are translated to Hebrew, which is what I expect.

After encountering this problem I decided to keep Speedy Net in Django 2.1 (which I upgraded from 1.11 about 3 weeks ago) until there is a solution, or at least until I can make Speedy Net work in Hebrew in Django 2.2 and above. However, I understand the end-of-life of Django 2.1 is coming soon, which means we will be using an obsolete version of Django in a production website.

comment:26 Changed 7 months ago by Claude Paroz

Uri, until Django supports po files with different plural forms (which will NOT happen in Django 2.2), you should update your po files to also use 4 plurals like Django in 2.2.

comment:27 in reply to:  3 Changed 7 months ago by אורי

Replying to Michal Čihař:

The very same situation happens with third party app localization (this is where I noticed it first). I think it's not reasonable to expect everything will use same plural forms.

Hi,

I understand the problem better now. Michal is correct, changes in one .po file should not affect other .po files. I don't need more than 2 forms (one singular, one plural) in my translation files and it makes things complicated. Running makemessages in Django 2.2 doesn't change my po files. I can't edit each file manually and add a third and fourth strings, which are identical to the second. I prefer to keep using Django 2.1 until this issue is resolved. I can understand now that in some languages more than 2 forms are required, which may be also the case in Hebrew, but at least I want to keep what worked until now to keep working. If I decide to introduce more plural forms later, let me do it manually in my own .po files and don't force me to rely on changes to the .po files of Django. I don't think it makes sense that one .po file (the first one loaded) is more "important" than others and would force all the other .po files to change because of changes in the first file.

Last edited 7 months ago by אורי (previous) (diff)

comment:28 Changed 7 months ago by אורי

Hi,

Please notice "ngettext_lazy and ngettext" thread on the Django developers group.

comment:29 in reply to:  22 Changed 7 months ago by אורי

Replying to Michal Čihař:

The earlier mentioned example of Hebrew can be seen on our public hosting service - 25% of translations use simple two plurals and 75% use more complex four plurals. Both are correct and in many cases the one with two plural forms is working well at it makes it easier for translators.

Actually I checked Django translations in Hebrew and it seems to me that maybe/almost in 100% of the translations where there are 4 plural forms, the 3 last plural forms are identical. In reality I think in about 80% to 90% of the cases, 2 plural forms are enough. I don't see any reason why to add 2 additional plural forms to our translations (in Speedy Net), if they are identical to the second plural form we already have. But if we must have 4 plural forms in Hebrew, at least I would expect Django to add the 2 additional plural forms in a correct way when I upgrade Django to 2.2.

The best solution for me would be, if I can keep using 2 plural forms for Hebrew on Speedy Net, without relating to how many plural forms Django is using for its own translations.

comment:30 Changed 7 months ago by Michal Čihař

If you don't care about translations coming from Django, you can use workaround similar as we do in Weblate - on merge we replace the plural formula. Django localization is typically loaded first and then your apps, so that makes your app plural formula win. The code for this is here: https://github.com/WeblateOrg/weblate/blob/36eaa21c55f8979e522e991759456b028f7c2912/weblate/utils/django_hacks.py

This is not a proper solution as it just makes it bad for Django localization, but we use very little of those strings (if any), so it is not a problem for Weblate.

comment:31 Changed 7 months ago by Maciej Olko

Cc: Maciej Olko added

comment:32 Changed 6 months ago by Rodrigo

Has patch: set

comment:33 in reply to:  32 ; Changed 6 months ago by אורי

Replying to Rodrigo:

PR

Can you explain how this PR solves this issue? There are so many changed files in this PR so it's difficult for me to understand what has been changed in it.

Which version of Django this PR will be included in if accepted?

comment:34 in reply to:  33 Changed 6 months ago by Rodrigo

Replying to אורי:

Replying to Rodrigo:

PR

Can you explain how this PR solves this issue? There are so many changed files in this PR so it's difficult for me to understand what has been changed in it.

Here is the doc:

The plural form consistency is ensured in two ways:

  • The "not-merge" policy in DjangoTranslation
  • The catalog consistency test will check that any catalog introduced in the base catalogs will have the same plural form as the main one for the lang, or, if a change in the plural form is made, all the catalogs for the lang must be updated (this is done with makemessages --comply-plural-forms). This is done at a CI level.

So, there won't be different plural forms in Django catalogs anymore (the original report by Michal).

Once a change in the main plural forms gets to the user, the user will see this warning.

Then, they have the two options:

  • running "makemessages --comply-plural-forms" to incorporate the new forms to their catalogs, or
  • use the LOCALE_ROOT setting for using their plural forms.

If they go with --comply-plural-forms, as the doc say, "In the case of an increase in the number of plurals, it will fill those with the previously chosen forms so you avoid having translation results in the fallback language or the original string - this will produce the same results as before though you may want to update those later for a better expression of the language if it corresponds." ... "If no plural forms is found in the catalog, the user will be prompted to copy them from the main po file. No remapping will be done, it is assumed that is already aligned. If it is not the case, use a plural form with the number of plurals matching the catalog and then re-run the command to perform form mapping".

This way, any update on Django's plural forms won't break translations in user projects. The user should see the warning once they run the project after the update. If they just update it without checking, they will see "big parts" untranslated (the catalogs not merged) which will lead them to the logs and see the warning, then run --comply-plural-forms.

If they go with the LOCALE_ROOT, they can collect the base catalogs with "makemessages --collect-base-catalogs" (cbc), customize the plural forms and then run "makemessages --comply-plural-forms" to "propagate" the form to the project's catalogs.

Once Django is updated another time, they won't see any warning, it's up to them to call again (cbc) so they get the new msgids incorporated in Django.

With this setup, you will have a way of propagating updates in the plural forms consistently and the option to have your plural forms locally.

Which version of Django this PR will be included in if accepted?

Not sure, is targeted for 3.1

comment:35 Changed 6 months ago by אורי

Thanks for the explanation, Rodrigo. Good work!

comment:36 in reply to:  35 Changed 6 months ago by Rodrigo

Replying to אורי:

Thanks for the explanation, Rodrigo. Good work!

:)

After my explanation, I realized that the doc would be better like this

Any improvements are welcomed :)

comment:37 Changed 6 months ago by Carlton Gibson

Needs documentation: set
Patch needs improvement: set

I've given this an initial look but without a clearer consensus that it's the right approach I'm not sure what to say.
I'll look again but I would like to see positive agreement on the discussion thread (which is already too long...)
before committing to the approach.
That's the main issue.

Then, just on the PR, the docs changes don't explain what's going on clearly. The new settings and makemessages options don't have a Why you'd use them anywhere.
(That discussion is here and in the mailing list, but it needs to be clear for someone installing the new version.)

Once those are in place, release notes.

Finally the PR needs a bit of a tidy, and they'll no doubt be nitpicks but I want to sort out the more fundamental issues before we spend energy there.

comment:38 in reply to:  37 Changed 6 months ago by Rodrigo

Replying to Carlton Gibson:

Hey Carlton, thanks for taking a look

I've given this an initial look but without a clearer consensus that it's the right approach I'm not sure what to say.
I'll look again but I would like to see positive agreement on the discussion thread (which is already too long...)
before committing to the approach.
That's the main issue.

I understand, one of the subjacent problems of the issue is the relation with Transifex.

The proposed fix solves this independently - from Transifex or any translation provider - by having the plural forms consistency "inside" or "by" Django.

Then, just on the PR, the docs changes don't explain what's going on clearly. The new settings and makemessages options don't have a Why you'd use them anywhere.
(That discussion is here and in the mailing list, but it needs to be clear for someone installing the new version.)

No doubt they can be improved, I tried to be as concise as possible in the docs.

Despite of adding some lines of "off-line code", they are quite simple:

  • Changes on the plural forms may lead to breakage of your translated front-end, so a warning is added and a measure is taken to encourage your action
  • If you see this warning, use makemessages --comply-plural-forms to update and align user catalogs with the main plural form so you ensure your translations consistency
  • If above is not sufficient for you, use LOCALE_ROOT and makemessages --collect-base-catalogs to take translation files into your project, customize them to your wish and maintain them in your release cycle.

This is what the main doc tries to communicate.

Once those are in place, release notes.

Indeed, they weren't included

Finally the PR needs a bit of a tidy, and they'll no doubt be nitpicks but I want to sort out the more fundamental issues before we spend energy there.

I tried to write the code in line with the makemessages command style mostly, and although I extensively test it, no code is bug-free and not perfectible

comment:39 Changed 6 months ago by Claude Paroz

At last, I took the time to try implementing support for different plural strings for the same language.

I would appreciate testing from affected languages.

comment:40 in reply to:  39 ; Changed 6 months ago by Rodrigo

Replying to Claude Paroz:

At last, I took the time to try implementing support for different plural strings for the same language.

I would appreciate testing from affected languages.

Your implementation looks good, the problem with "dict-merge" (what you implemented) is that updates in Django won't reach users unless something triggers it externally. Users' catalogs would live under a different key than the updated Django main catalogs, ignoring the updated plural forms.

Plural forms are only generated/copied once the catalog is initialized with makemessages, otherwise the messages will be updated only, see https://github.com/django/django/blob/master/django/core/management/commands/makemessages.py#L616

You shouldn't trigger a plural forms copy on each write_po_file because it may mess with the forms unexpectedly, i.e. if a plural form changes its "meaning" (ordering). Under a plural forms change, the catalog should be reorganized.

comment:41 in reply to:  40 Changed 6 months ago by Michal Čihař

Replying to Claude Paroz:

At last, I took the time to try implementing support for different plural strings for the same language.

I would appreciate testing from affected languages.

Looks good, I will try to find time for testing this.

Replying to Rodrigo:

Your implementation looks good, the problem with "dict-merge" (what you implemented) is that updates in Django won't reach users unless something triggers it externally. Users' catalogs would live under a different key than the updated Django main catalogs, ignoring the updated plural forms.

I'd say this is expected behavior for me. I don't want Django to anything more than msgmerge. Changing plural forms might need adjustment in the translations as well and can't be done safely automatically.

comment:42 Changed 5 months ago by Carlton Gibson

Owner: changed from Rodrigo to Claude Paroz
Patch needs improvement: unset

Hi Claude, thanks for the PR. Pending testing/confirmation from Michal, please uncheck Needs documentation when you're ready.

comment:43 in reply to:  42 Changed 5 months ago by אורי

Replying to Carlton Gibson:

Hi Claude, thanks for the PR. Pending testing/confirmation from Michal, please uncheck Needs documentation when you're ready.

If PR #12332 is accepted, will it be possible for you guys to apply it to Django 2.2 as well? Especially since translations (for example in Hebrew) break / stop working when upgrading Django from 2.1 to 2.2, and it seems that this PR solves it (translations work). Or is it against your policy to apply such a PR to a Django version such as 2.2? I would very much prefer it if you guys apply this PR to Django 2.2, then to have to fork Django and install it from fork in production, where I will have to apply this PR to any future version of Django you will release. Or on the other hand keep using Django 2.1 which is obsolete.

comment:44 Changed 4 months ago by Carlton Gibson

Needs documentation: unset
Triage Stage: AcceptedReady for checkin

comment:45 Changed 4 months ago by GitHub <noreply@…>

Resolution: fixed
Status: assignedclosed

In e3e48b0:

Fixed #30439 -- Added support for different plural forms for a language.

Thanks to Michal Čihař for review.

comment:46 Changed 4 months ago by Carlton Gibson <carlton.gibson@…>

In d9f1792c:

[3.0.x] Fixed #30439 -- Added support for different plural forms for a language.

Thanks to Michal Čihař for review.
Backport of e3e48b00127c09eafe6439d980a82fc5c591b673 from master

comment:47 Changed 4 months ago by Carlton Gibson <carlton.gibson@…>

In 996be04c:

[2.2.x] Fixed #30439 -- Added support for different plural forms for a language.

Thanks to Michal Čihař for review.
Backport of e3e48b00127c09eafe6439d980a82fc5c591b673 from master

comment:48 Changed 4 months ago by אורי

Thank you for accepting this PR and applying it to Django 2.2 and 3.0 too.

comment:49 Changed 3 months ago by אורי

Hi, I just want to say that I upgraded Django to 2.2.12 in Speedy Net and then upgraded again to 3.0.5, and everything works as expected. I didn't change anything in our translation files and they are still using the old plural forms formula like before ("Plural-Forms: nplurals=2; plural=(n != 1);\n").

comment:50 in reply to:  5 ; Changed 3 weeks ago by Michal Hozza

Replying to Michal Čihař:

It would be necessary to avoid merging of the catalogs and perform lookup over several catalogs. With merging, the lookup in the catalog can not respect plural equation defined in the file. With Czech language I believe this is issue only because of a bug in the plural form (see below), but there are language were several different plural forms are widely used (for example Latvian, where they have same number of plurals, but reverse ordering of the strings in one of them).

Going back to Czech locale, the plural equation for Czech (which you apparently got from Transifex) doesn't make sense at all, because it defines 4 plural forms, but it never evaluates to 2 because n % 1 is always 0:

(n == 1 && n % 1 == 0) ? 0 : (n >= 2 && n <= 4 && n % 1 == 0) ? 1: (n % 1 != 0 ) ? 2 : 3

This is actually not true. n%1 != 0 for decimal numbers. I believe Czech has a special plural form for decimal numbers, similarly to Slovak, so the formula is correct.

So the correct fix here would be revert to original formula and blame Transifex to fix this. Probably some damage was done to the plurals translation meanwhile...

PS: I've seen such messed up plurals already in several projects which have decided to migrate from Transifex to Weblate (which I develop), not sure if anybody has reported this to them.

comment:51 in reply to:  50 Changed 3 weeks ago by Michal Čihař

Replying to Michal Hozza:

This is actually not true. n%1 != 0 for decimal numbers. I believe Czech has a special plural form for decimal numbers, similarly to Slovak, so the formula is correct.

Yes, but the Gettext logic is integer only, so it cannot evaluate decimals. So while it's theoretically correct, it doesn't make sense with Gettext where the expression is evaluated as integers because for every integer, n%1 == 0.

This is how it ends up being evaluated:

>>> from gettext import c2py
>>> c2py('n%1')(1.5)
0

But this is really off-topic here, this is just special case where plural rules can be different, there are languages where the differences are intentional depending on scope of the translation and this is now handled properly by Django.

Note: See TracTickets for help on using tickets.
Back to Top