#21725 closed Bug (fixed)
Javascript translations fail with non-BMP characters
Reported by: | Ned Batchelder | Owned by: | MattBlack |
---|---|---|---|
Component: | Internationalization | Version: | 1.6 |
Severity: | Normal | Keywords: | nlsprint14 |
Cc: | eromijn@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If a translated string includes a non-BMP character (above 0xFFFF), then javascript_catalog in views/i18n.py fails:
Traceback (most recent call last): File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response response = callback(request, *callback_args, **callback_kwargs) File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/views/i18n.py", line 264, in javascript_catalog csrc.append("catalog['%s'] = '%s';\n" % (javascript_quote(k), javascript_quote(v))) File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/utils/functional.py", line 176, in wrapper return func(*args, **kwargs) File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/utils/text.py", line 305, in javascript_quote return str(ustring_re.sub(fix, s)) UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001d543' in position 31: ordinal not in range(128)
(this is running 1.4.8, but the javascript_quote code hasn't changed since then.)
It should be possible to fix javascript_quote to turn the character into a surrogate pair.
Change History (13)
comment:1 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Bug |
comment:2 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:4 by , 11 years ago
Is this really a full fix? Why does this function replace BMP characters with four-digit \uXXXX escapes, but allow non-BMP characters through unchanged? I would have thought that Javascript would need surrogate pairs.
comment:5 by , 11 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
I am getting consistent failures since this was merge in python 2 (not python 3) on mac:
====================================================================== FAIL: test_javascript_quote_unicode (utils_tests.test_text.TestUtilsText) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/honza/work/django/tests/utils_tests/test_text.py", line 162, in test_javascript_quote_unicode self.assertEqual(text.javascript_quote(input), output) AssertionError: u"<script>alert(\\'Hello \\\\xff.\\n Wel\\ud835\\udd43come\\there\\r\\');<\\/scr [truncated]... != u"<script>alert(\\'Hello \\\\xff.\\n Wel\U0001d543come\\there\\r\\');<\\/script> [truncated]... - <script>alert(\'Hello \\xff.\n Wel\ud835\udd43come\there\r\');<\/script>? ^^^^^^^^^^^^ + <script>alert(\'Hello \\xff.\n Wel\ud835\udd43come\there\r\');<\/script>?
comment:6 by , 11 years ago
I did some digging and as it turns out, javascript_quote
is not used anymore when doing javascript translation since a506b6981bc48caec30bca3de94d2ac3e6fc1660.
In fact, this function is undocumented, barely tested, and was only used internally for the javascript_catalogue
view.
On top of that, it was also completely broken on Python2 if you ever passed it non-ascii input.
I think we should just delete it altogether.
As for the test failure, they pass on my machine both with Python 2 and 3 and our CI server is also in the green. There's a chance it might be related to a wide/narrow build of Python.
comment:7 by , 11 years ago
Has patch: | set |
---|
Here's a pull request that adds a regression test to the javascript_catalog
suite that makes sure that non-BMP characters in translation files are correctly handled.
It also deprecates javascript_quote
altogether (it was undocumented and not used anymore after a506b6981bc48caec30bca3de94d2ac3e6fc1660).
Finally, it also skips the reported failing test on narrow python builds.
comment:9 by , 11 years ago
Cc: | added |
---|---|
Keywords: | nlsprint14 added |
Triage Stage: | Accepted → Ready for checkin |
PR 2339 looks fine to me, and tests run (although I tested on a non-wide python).
In 1c1dffca757b0b6acaf99d893d68847250ab4146: