Code

Opened 3 months ago

Closed 8 weeks ago

Last modified 6 weeks ago

#21725 closed Bug (fixed)

Javascript translations fail with non-BMP characters

Reported by: nedbatchelder Owned by: MattBlack
Component: Internationalization Version: 1.6
Severity: Normal Keywords: nlsprint14
Cc: eromijn@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

If a translated string includes a non-BMP character (above 0xFFFF), then javascript_catalog in views/i18n.py fails:

Traceback (most recent call last):
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/views/i18n.py", line 264, in javascript_catalog
    csrc.append("catalog['%s'] = '%s';\n" % (javascript_quote(k), javascript_quote(v)))
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/utils/functional.py", line 176, in wrapper
    return func(*args, **kwargs)
  File "/home/ned/.virtualenvs/edx-platform/local/lib/python2.7/site-packages/django/utils/text.py", line 305, in javascript_quote
    return str(ustring_re.sub(fix, s))
UnicodeEncodeError: 'ascii' codec can't encode character u'\U0001d543' in position 31: ordinal not in range(128)

(this is running 1.4.8, but the javascript_quote code hasn't changed since then.)

It should be possible to fix javascript_quote to turn the character into a surrogate pair.

Attachments (0)

Change History (12)

comment:1 Changed 3 months ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Uncategorized to Bug

comment:2 Changed 2 months ago by MattBlack

  • Owner changed from nobody to MattBlack
  • Status changed from new to assigned

comment:3 Changed 2 months ago by Baptiste Mispelon <bmispelon@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 1c1dffca757b0b6acaf99d893d68847250ab4146:

Fixed #21725 -- Fixed JavaScript quoting encoding.

Thanks to nedbatchelder for the report.

comment:4 Changed 2 months ago by nedbatchelder

Is this really a full fix? Why does this function replace BMP characters with four-digit \uXXXX escapes, but allow non-BMP characters through unchanged? I would have thought that Javascript would need surrogate pairs.

comment:5 Changed 2 months ago by Honza_Kral

  • Resolution fixed deleted
  • Status changed from closed to new

I am getting consistent failures since this was merge in python 2 (not python 3) on mac:

======================================================================
FAIL: test_javascript_quote_unicode (utils_tests.test_text.TestUtilsText)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/honza/work/django/tests/utils_tests/test_text.py", line 162, in test_javascript_quote_unicode
    self.assertEqual(text.javascript_quote(input), output)
AssertionError: u"<script>alert(\\'Hello \\\\xff.\\n Wel\\ud835\\udd43come\\there\\r\\');<\\/scr [truncated]... != u"<script>alert(\\'Hello \\\\xff.\\n Wel\U0001d543come\\there\\r\\');<\\/script> [truncated]...
- <script>alert(\'Hello \\xff.\n Wel\ud835\udd43come\there\r\');<\/script>?                                   ^^^^^^^^^^^^
+ <script>alert(\'Hello \\xff.\n Wel\ud835\udd43come\there\r\');<\/script>?

comment:6 Changed 2 months ago by bmispelon

I did some digging and as it turns out, javascript_quote is not used anymore when doing javascript translation since a506b6981bc48caec30bca3de94d2ac3e6fc1660.

In fact, this function is undocumented, barely tested, and was only used internally for the javascript_catalogue view.

On top of that, it was also completely broken on Python2 if you ever passed it non-ascii input.

I think we should just delete it altogether.

As for the test failure, they pass on my machine both with Python 2 and 3 and our CI server is also in the green. There's a chance it might be related to a wide/narrow build of Python.

comment:7 Changed 8 weeks ago by bmispelon

  • Has patch set

Here's a pull request that adds a regression test to the javascript_catalog suite that makes sure that non-BMP characters in translation files are correctly handled.

It also deprecates javascript_quote altogether (it was undocumented and not used anymore after a506b6981bc48caec30bca3de94d2ac3e6fc1660).

Finally, it also skips the reported failing test on narrow python builds.

comment:9 Changed 8 weeks ago by erikr

  • Cc eromijn@… added
  • Keywords nlsprint14 added
  • Triage Stage changed from Accepted to Ready for checkin

PR 2339 looks fine to me, and tests run (although I tested on a non-wide python).

comment:10 Changed 8 weeks ago by Baptiste Mispelon <bmispelon@…>

In 926e18d7d126fcf7f4b2d25ce4155423ac6e2f90:

Deprecated django.utils.text.javascript_quote.

Refs #21725.

comment:11 Changed 8 weeks ago by bmispelon

  • Resolution set to fixed
  • Status changed from new to closed

Thanks for the review.

comment:12 Changed 6 weeks ago by Claude Paroz <claude@…>

In ac699cdc174a825e6b78c6f3c6e967bc961413c8:

Really hidden warnings in javascript_quote tests

Refs #21725.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.