Opened 10 years ago

Closed 10 years ago

Last modified 6 years ago

#21415 closed Bug (fixed)

Unicode escapes appear verbatim in translated naturaltime strings

Reported by: 676c7473@…
Owned by: Claude Paroz
Component: Translations
Version: 1.6
Severity: Release blocker
Keywords: i18n l10n translation
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

In Django 1.6, the Unicode escape \u00a0 "non-breaking space" that was introduced in the translations is doubly-escaped and now appears verbatim in the output of django.contrib.humanize.naturaltime.

What used to be

vor 6 Sekunden

that is the German ("de") translation for "6 seconds ago", now appears as

vor 6\u00a0Sekunden

in templates that use naturaltime.

This affects all django/contrib/humanize/locale/<language>/LC_MESSAGES/django.po files. I am not sure whether this should be treated as a translation bug, or how it should be fixed.
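
For illustration, the difference between the two outputs can be sketched in plain Python. This is only a minimal sketch of the symptom, not Django code, and the variable names are made up for the example.

```python
# Illustrative only: the variable names are hypothetical, not taken from Django.

# Intended output: "\u00a0" in a normal string literal is one character (U+00A0).
expected = "vor 6\u00a0Sekunden"

# Reported output: the backslash is escaped, so the six characters
# backslash, u, 0, 0, a, 0 end up in the rendered text.
broken = "vor 6\\u00a0Sekunden"

print(len(expected))       # 14
print(len(broken))         # 19
print("\u00a0" in broken)  # False
```

The two strings look similar in source code, but only the first contains a non-breaking space.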

Change History (24)

comment:1 Changed 10 years ago by Claude Paroz

Severity: Normal → Release blocker
Triage Stage: Unreviewed → Accepted

I could reproduce this: ungettext doesn't resolve the double backslash (\\u00a0) it receives from the translation catalog. Crap!
Original ticket: #20246.
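
A rough sketch of the behaviour described in this comment, using a plain dict as a stand-in for the compiled catalog. The names catalog and ungettext_sketch are hypothetical, and keying the plural lookup on the singular msgid is an assumption modelled on how Python's gettext module stores plural entries; it is not Django's actual code path.

```python
# Hypothetical stand-in for a compiled catalog. Plural entries are keyed on
# (singular msgid, plural index). The raw string mimics a msgstr that holds a
# backslash followed by "u00a0" instead of a real non-breaking space.
catalog = {
    ("a second ago", 1): r"vor %(count)s\u00a0Sekunden",
}


def ungettext_sketch(singular, plural, n):
    """Look up a plural translation; return it exactly as stored."""
    index = 0 if n == 1 else 1
    default = singular if n == 1 else plural
    return catalog.get((singular, index), default)


message = ungettext_sketch("a second ago", "%(count)s\u00a0seconds ago", 6)
rendered = message % {"count": 6}
print(rendered)  # vor 6\u00a0Sekunden
```

Nothing in the lookup interprets the backslash sequence, so it reaches the template output unchanged.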

comment:2 Changed 10 years ago by Claude Paroz

If \xa0 doesn't work with makemessages/xgettext and \u00a0 doesn't work when retrieving the string, I think the only remaining solution is to include a literal non-breaking space, tell translators in the comment to preserve it, and hope they do. Even if they replace it with a regular space, it's not a big deal anyway.
Other proposals?

comment:3 Changed 10 years ago by 676c7473@…

Yes, that works. No better ideas here. Then maybe the comment could be more explicit?

#. Translators: There should be a non-breaking space (U+00A0)
#. between number and time unit.
#: templatetags/humanize.py:199
#, python-format
msgid "a second ago"
msgid_plural "%(count)s\\u00a0seconds ago"
msgstr[0] "vor einer Sekunde"
msgstr[1] "vor %(count)s Sekunden"

(Trac seems to convert the non-breaking space to normal space ...)

comment:4 Changed 10 years ago by David

I have created a pull request https://github.com/django/django/pull/1915 with a proposal for an updated "Translators:" message. Is that ok for you?

I also ran makemessages on the master branch, I could attach this to the pull request. But I suspect that needs to be done in Transifex? Let me know if I can assist you with this ticket in any way. Thanks, David.

comment:5 Changed 10 years ago by Claude Paroz

Has patch: set
Patch needs improvement: set

If the \u00a0 sequence cannot be used by translators in the translated string, I'm not in favor of using it in the original string. Then I would formulate the comment to something like Translators: please keep a non-breaking space (U+00A0) between number and time unit.

comment:6 Changed 10 years ago by David

I kept the \u00a0 because PEP 8 talks about keeping string literals in ASCII in the Python standard library. If it's otherwise ok to put any UTF-8 chars (even whitespace) in source files, then that's fine for me too. I'll update the pull request.

comment:7 Changed 10 years ago by David

Actually, wouldn't "count" be a better word here, since "count" is used in the code and in the msgid? please keep a non-breaking space (U+00A0) between count and time unit? I'll try that.

comment:8 Changed 10 years ago by Claude Paroz

My priority is to not confuse translators with cryptic char sequences, PEP 8 comes after... And yes, "count" is probably better.

comment:9 Changed 10 years ago by David

I have updated the pull request. Should I attach the makemessages changes, too?

comment:10 Changed 10 years ago by Claude Paroz

Owner: changed from nobody to Claude Paroz
Patch needs improvement: unset
Status: new → assigned

Thanks, I'll take care of it.

comment:11 Changed 10 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: assigned → closed

In 7e0ebd74c107e3267b1df438ed7f061f8be5cf05:

Fixed #21415 -- Replaced escape sequence by literal non-breaking space

Unfortunately, escape sequences (\x.. or \u....) do not fit well
with the gettext toolchain. Falling back to using literal char,
even if visibility is not ideal.

comment:12 Changed 10 years ago by Claude Paroz <claude@…>

In 1e2bbc3b712d53032a3cf25e77baf3a157de66f0:

[1.6.x] Fixed #21415 -- Replaced escape sequence by literal non-breaking space

Unfortunately, escape sequences (\x.. or \u....) do not fit well
with the gettext toolchain. Falling back to using literal char,
even if visibility is not ideal.

Backport of 7e0ebd74c from master.

comment:13 Changed 10 years ago by Claude Paroz <claude@…>

In 882ee16f68deeb1831b83a407911e720fbd5e9fd:

[1.6.x] Updated humanize translation catalog

Refs #21415

comment:14 Changed 10 years ago by Claude Paroz

Resolution: fixed → (none)
Status: closed → new

Before the bug can really be considered fixed, we have to:

  • Wait for Transifex to update the pot file (max 24h)
  • Update translations (should be possible even for non-speakers of the target language)
  • Commit updated translations

comment:15 Changed 10 years ago by Claude Paroz

... and add 1.6.1 release note.

comment:16 Changed 10 years ago by Claude Paroz <claude@…>

In e85baa813f2a2c8e565fc68418ff91e84d7d5ec0:

Updated humanize translations and added release note.

Refs #21415.

comment:17 Changed 10 years ago by Claude Paroz

Resolution: fixed
Status: new → closed

Hopefully fixed now.

comment:18 Changed 10 years ago by David

Thanks for that.

I did a quick check; it seems you have overlooked the last four instances in the German translation django/contrib/humanize/locale/de/LC_MESSAGES/django.po. From line 268 downward, they should be non-breaking spaces (so it begins ... :P)

comment:19 Changed 10 years ago by Claude Paroz

Indeed, I did not replace any normal space with a non-breaking space; that's the work of translators. What I did was replace every \u00a0 sequence with a non-breaking space. Unfortunately, there is also an issue with Transifex, which sometimes keeps a translation even when the original changes (but only spaces are changed). Sorry, I cannot fix all problems myself :-P
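
The kind of replacement described here could look roughly like the following. This is a hedged sketch, not the script that was actually used: replace_escaped_nbsp is a hypothetical helper, and handling both the single-backslash and double-backslash spellings is an assumption.

```python
NBSP = "\u00a0"


def replace_escaped_nbsp(po_text):
    """Replace textual \\u00a0 escape sequences with a real non-breaking space."""
    # Handle the doubly escaped form first so no stray backslash is left behind.
    return po_text.replace("\\\\u00a0", NBSP).replace("\\u00a0", NBSP)


# Hypothetical catalog line containing the doubly escaped sequence.
line = 'msgstr[1] "vor %(count)s\\\\u00a0Sekunden"'
fixed = replace_escaped_nbsp(line)
print(NBSP in fixed)  # True
print("\\" in fixed)  # False
```

Such a replacement only touches existing escape sequences; it does not turn ordinary spaces into non-breaking ones.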

comment:20 Changed 10 years ago by David

Just so there is no misunderstanding: at the time of writing, django/contrib/humanize/locale/de/LC_MESSAGES/django.po contains 6 messages that need a non-breaking space. Of these, the first 2 (vor ... Sekunden/Minuten) have the non-breaking space, but the last 4 (vor ... Stunden, nach Sekunden/Minuten/Stunden) have a normal space. \u00a0 was replaced with normal space.

That's why I said this looks like an oversight to me.
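
A check like the one described above can be automated. The following is a minimal sketch under stated assumptions: SAMPLE_PO is a made-up excerpt rather than the real catalog, find_plain_space_after_count is a hypothetical helper, and the regular expression only covers msgstr lines in which the %(count)s placeholder is directly followed by the space.

```python
import re

NBSP = "\u00a0"

# Hypothetical excerpt standing in for a django.po file (not the real catalog).
SAMPLE_PO = (
    'msgid "a second ago"\n'
    'msgid_plural "%(count)s' + NBSP + 'seconds ago"\n'
    'msgstr[0] "vor einer Sekunde"\n'
    'msgstr[1] "vor %(count)s' + NBSP + 'Sekunden"\n'
    '\n'
    'msgid "an hour ago"\n'
    'msgid_plural "%(count)s' + NBSP + 'hours ago"\n'
    'msgstr[0] "vor einer Stunde"\n'
    'msgstr[1] "vor %(count)s Stunden"\n'
)

# Matches msgstr lines where %(count)s is followed by an ordinary ASCII space.
PLAIN_SPACE = re.compile(r'^msgstr(\[\d+\])? ".*%\(count\)s ')


def find_plain_space_after_count(po_text):
    """Return (line_number, line) pairs for msgstr lines with a regular space."""
    hits = []
    for number, line in enumerate(po_text.splitlines(), start=1):
        if PLAIN_SPACE.match(line):
            hits.append((number, line))
    return hits


for number, line in find_plain_space_after_count(SAMPLE_PO):
    print(number, line)  # 9 msgstr[1] "vor %(count)s Stunden"
```

Whether a flagged line really needs a non-breaking space is still a judgment for the translator of that language.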

comment:21 Changed 10 years ago by Claude Paroz

You are probably right, sorry. I will probably update translations once again before 1.6.1, can you fix them (de) on Transifex?

comment:22 Changed 10 years ago by David

No probs. Unfortunately I was not accepted in the Transifex de group; maybe I did something wrong. I will have to check again later.

comment:23 Changed 6 years ago by Claude Paroz <claude@…>

In 72026fff:

[1.11.x] Refs #21415 -- Fixed contrib.humanize translations for es_AR

Thanks Blas Castro for the report and the patch.

comment:24 Changed 6 years ago by Claude Paroz <claude@…>

In 61fd2b49:

Refs #21415 -- Fixed contrib.humanize translations for es_AR

Thanks Blas Castro for the report and the patch.
Forward port of 72026fff3963973d607bf0b525a082d20f024406 from stable/1.11.x
