#31570 closed Bug (fixed)
Translations of one language in different territories can override each other
Reported by: | Shai Berger | Owned by: | Carlton Gibson |
---|---|---|---|
Component: | Internationalization | Version: | 2.2 |
Severity: | Release blocker | Keywords: | |
Cc: | Claude Paroz | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The fix of #30439 created a new problem: Under some circumstances, translations for one language in different territories can override each other. In the case we've run into, users in New Zealand (en-NZ
) received translations for South Africa (en-ZA
). We've seen this behavior on 2.2.12, and downgrading to 2.2.11 resolved it, hence marking this as a bug in 2.2, but I believe the same issue exists in 3.0 and master.
The circumstances where this happens are not trivial to reproduce; it involves having applications with translations for the "bare" language (en
in our case), and may depend on the order of loading of different translations. I will try to come up with a test later.
The issue arises, AFAICT, when the TranslationCatalog
installs a reference to the bare-language dictionary ("catalog" in gettext terms) as the first in the list of catalogs for a territorial variation (when an app has only the bare-language translation), and is then willing to update it with the contents of another territorial variant.
The following diff seems to resolve the issue, but I am far from certain that it is the right solution.
-
django/utils/translation/trans_real.py
diff --git a/django/utils/translation/trans_real.py b/django/utils/translation/trans_real.py index eed4705f6e..8042f6fdc4 100644
a b class TranslationCatalog: 96 96 cat.update(trans._catalog) 97 97 break 98 98 else: 99 self._catalogs.insert(0, trans._catalog )99 self._catalogs.insert(0, trans._catalog.copy()) 100 100 self._plurals.insert(0, trans.plural) 101 101 102 102 def get(self, key, default=None):
Attachments (1)
Change History (22)
comment:2 by , 4 years ago
I see that the problem description wasn't clear enough:
We are working on a website for the K-6 range of schools and students. The "K" year (for "Kindergarten") is called "Year 1" in New Zealand, but "Grade R" in South Africa, and some Kiwis were pretty confused when they saw "Grade R" on the site.
comment:3 by , 4 years ago
Hey Shai. Thanks for the clarification.
I was assuming that en-NZ
and en-ZA
have the same number of plural forms? (Is that correct?)
If so, we shouldn't end up in the branch where your suggested/possible edit is. i.e. the catalogs should be merged as they were in the past...
comment:4 by , 4 years ago
The whole TranslationCatalog
class is new in the fix for #30439.
The issue is not about mis-comparing plural forms at all. The issue is incorrect handling, in that code, of the "base language" (en
) catalog. All the en
variants have the same plural forms, of course, and they do compare equal. But the code fails to account for the fact that the en
catalog is loaded for en-NZ
. When the stars align "correctly", it is loaded in such a way that cat
in cat.update
in line 96 is the en
catalog, and it is updated with the NZ
translations when those are loaded, and then with the ZA
translations when those are loaded. It is the same (is
) dict that is then used for all three languages.
comment:5 by , 4 years ago
When the stars align "correctly", it is loaded in such a way that cat in cat.update in line 96 is the en catalog, and it is updated with the NZ translations when those are loaded, and then with the ZA translations when those are loaded. It is the same (is) dict that is then used for all three languages.
OK, it's that bit I haven't pinned down as yet. A reproduce project would be super if you can. Otherwise I'll try again tomorrow.
(I'm not/wasn't seeing how the copy()
in the else branch on ln99 comes into play if the plurals compare equal...)
Thanks for your patience with me. :)
comment:6 by , 4 years ago
The else:
comes into play when the TranslationCatalog
is still empty.
And thanks for your own patience for all my hand-waving with no failing-test code to back it up yet...
comment:7 by , 4 years ago
The else: comes into play when the TranslationCatalog is still empty.
Yes, of course. I think I had a dose of code blindness yesterday. π€―
Nonetheless, I've had another pop at this this morning and can't seem to get it lined up to reproduce the issue.
I can keep trying but a sample project would be a big help.
by , 4 years ago
Attachment: | test-31570.tgz added |
---|
Test project showing a problem. Depends on nothing but Django. manage.py test
passes on 2.2.11 and 3.0.4, fails on 2.2.12, 3.0.[56] and master.
comment:8 by , 4 years ago
Well, I could not make the stars align just right to reproduce the exact behavior I saw in the large project, but there's enough in attachment:test-31570.tgzβ to show that there is a regression.
The project includes three apps -- bare_en
and bare_en2
have their own translations, but only for en
; the translation for trans
(into four English dialects) is done in the project. In "real life", separate apps with their own translations are 3rd-party apps, of course. There's two such apps because I hoped to get an exact reproduction, and when I gave up on that, I just left them there.
I was unable to get the regional dialects to trump each other -- I see en
being used instead of en-nz
and en-ca
, while en-au
survives. I don't yet fully understand this.
comment:10 by , 4 years ago
Triage Stage: | Unreviewed β Accepted |
---|
OK, super regression confirmed. Thank you Shai.
follow-ups: 14 15 comment:11 by , 4 years ago
Carlton has a point with the plural function comparison broken. I cannot even understand why I wrote that :-( If it worked, this bug wouldn't have surface in this use case (as catalogs should be merged as before). Proper comparison of plural equations is not trivial, I fear it's almost impossible to do it perfectly.
comment:12 by , 4 years ago
Still, I understand I did it because it somewhat magically works in many cases (I didn't dig either in C code to understand why, as it goes beyond my abilities).
Another way to empirically compare the plural functions would be to compare their result on numbers from 1 to 101 (or 105, orβ¦). It would not be perfect but might work for our use case with a rather small performance penalty.
comment:13 by , 4 years ago
Something like that:
diff --git a/django/utils/translation/trans_real.py b/django/utils/translation/trans_real.py index eed4705f6e..0dca4230d2 100644 --- a/django/utils/translation/trans_real.py +++ b/django/utils/translation/trans_real.py @@ -92,7 +92,7 @@ class TranslationCatalog: def update(self, trans): # Merge if plural function is the same, else prepend. for cat, plural in zip(self._catalogs, self._plurals): - if trans.plural.__code__ == plural.__code__: + if _plurals_equal(trans.plural, plural): cat.update(trans._catalog) break else: @@ -259,6 +259,17 @@ class DjangoTranslation(gettext_module.GNUTranslations): return tmsg +def _plurals_equal(plur1, plur2): + """ + Compare plural functions by comparing their result on 0 - 101. + Not perfect, but probably sufficient for our use case. + """ + for num in range(102): + if plur1(num) != plur2(num): + return False + return True + + def translation(language): """ Return a translation object in the default 'django' domain.
comment:14 by , 4 years ago
Replying to Claude Paroz:
Carlton has a point with the plural function comparison broken. I cannot even understand why I wrote that :-( If it worked, this bug wouldn't have surface in this use case (as catalogs should be merged as before). Proper comparison of plural equations is not trivial, I fear it's almost impossible to do it perfectly.
As far as I can tell, this is not the issue here. See comment:4.
comment:15 by , 4 years ago
Replying to Claude Paroz:
[...] plural function comparison broken. [...] If it worked, this bug wouldn't have surface in this use case
After further investigation: This claim is technically correct, but is not helpful for solving this bug.
To my surprise, I found that there are actually two different implementations of the plural function in the English translation files. From the apps in the default project template, admin
and auth
have
"Plural-Forms: nplurals=2; plural=(n != 1);\n"
but this line is not found in the en
translations (which exist) for contenttypes
and sessions
, nor in the files initially generated by makemessages
. The __code__
comparison considers the default different from the explicit plural equation, but otherwise works.
Now, given that the default plural function is lambda n: int(n != 1)
, it is indeed the case that if the plural function comparison worked perfectly, all the plural functions involved would have been considered equal, and the problem would not arise.
However, the point of #30439 was exactly the support of files with different plural-functions for the same language. So the effect of the comparison failure is just to allow us to debug with English, a problem which could pop up in another language, where different functions are more common.
As an aside, this was the major missing piece which allowed me to reproduce exactly the behavior I saw in production: Adding the explicit plural forms to my apps' translation files, and making the test run two rounds of translations (as only when the first round ends, all languages are loaded), that is
-
trans/tests.py
diff trans/tests.py~
old new class TestTranslations(SimpleTestCase): 10 10 'en_CA': 'canuck', 11 11 'en': 'local country person', 12 12 } 13 for language, nick in country_people_nicknames.items(): 14 with self.subTest(language=language): 13 for rnd in "12": 14 for language, nick in country_people_nicknames.items(): 15 with self.subTest(language=language, round=rnd): 15 16 activate(language) 16 17 self.assertEqual(_('local country person'), nick)
Allowed me to get as output
====================================================================== FAIL: test_region_translations (trans.tests.TestTranslations) (language='en', round='1') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/slate/tmp/test_31570/trans/tests.py", line 17, in test_region_translations self.assertEqual(_('local country person'), nick) AssertionError: 'canuck' != 'local country person' - canuck + local country person ====================================================================== FAIL: test_region_translations (trans.tests.TestTranslations) (language='en_NZ', round='2') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/slate/tmp/test_31570/trans/tests.py", line 17, in test_region_translations self.assertEqual(_('local country person'), nick) AssertionError: 'canuck' != 'kiwi' - canuck + kiwi ====================================================================== FAIL: test_region_translations (trans.tests.TestTranslations) (language='en', round='2') ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/slate/tmp/test_31570/trans/tests.py", line 17, in test_region_translations self.assertEqual(_('local country person'), nick) AssertionError: 'canuck' != 'local country person' - canuck + local country person
That is, the en-nz
(and en
) dialect was overridden by en-ca
.
comment:16 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new β assigned |
comment:17 by , 4 years ago
Has patch: | set |
---|
I've spent far too long on this and still didn't get 100% to the bottom of it, particularly with the fallbacks here.
I did learn some things though... π
The key is that once we have the app with only the subset of all the supported languages, _add_installed_apps_translations()
ends up getting a translation βwho's `_catalog` is the exact same dictionary as has been previously used. This is what Shai said already. The suggested copy()
resolves that issue.
Thanks for the report Shai, and the reproduce: it's quite particular. :)
comment:18 by , 4 years ago
Triage Stage: | Accepted β Ready for checkin |
---|
Hey Shai. Thanks for the report. π€
Yes, please. I'm struggling to see where the issue is.
I'm wondering if βthe test for the equivalence of the plural forms is robust enough:
Q: What guarantees that the
__code__
object of the βgenerated plural functions evaluates equal or not? (Current status: no idea: it's not as simple as pointer address, but I can't yet find the implementation so... )Update:
Are the relevant sources, but that's not to have followed the flow fully.
endupdate
There are a number of cases in the existing tests where we enter the different plural forms
else
-block, beyond the test added for #30439, so maybe there is a subtle behaviour change here. I'd be very grateful for that test to be able to pin it down.