Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#25720 closed Bug (fixed)

1.8.x regression: Py2 gettext() returns unicode for bytestring input

Reported by: Marti Raudsepp
Owned by: nobody
Component: Internationalization
Version: 1.8
Severity: Release blocker
Keywords: regression
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

See pull request for the fix: https://github.com/django/django/pull/5615

In Django 1.7 and earlier, the django.utils.translation.gettext() function always returned UTF-8 bytestrings for bytestring inputs. In Django 1.8, this behavior changed: when a translation is active, it now returns unicode objects instead. (When no translation is activated, it still returns bytestrings.)

This does not appear to be an intentional change; nothing in the release notes or documentation mentions it. It is probably an unintended side effect of the refactoring in commit a5f6cbce07b5f3ab48d931e3fd1883c757fb9b45.

I narrowed it down to the removal of this line in __init__:

self.set_output_charset('utf-8')

After re-adding the line, it works as expected again (tests included). I don't understand gettext well enough to know why the call added in the refactoring commit, gettext_module.translation(..., codeset='utf-8', ...), doesn't have the same effect.
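The effect of the removed line can be pictured with a toy model. The sketch below assumes, as a simplification, that a catalog holds translations as unicode objects and encodes them only when an output charset has been set. ToyCatalog is a hypothetical name, not Django's or the standard library's class, and the sketch makes no attempt to explain why codeset='utf-8' behaves differently.

```python
# -*- coding: utf-8 -*-
# Toy model only: ToyCatalog is a hypothetical stand-in, not the stdlib
# GNUTranslations class or Django's translation class.


class ToyCatalog(object):
    def __init__(self, messages, output_charset=None):
        # Translations are held as text (unicode) objects in this model.
        self._catalog = dict(messages)
        self._output_charset = output_charset

    def set_output_charset(self, charset):
        self._output_charset = charset

    def gettext(self, message):
        translated = self._catalog.get(message)
        if translated is None:
            # Untranslated input is returned unchanged, whatever its type.
            return message
        if self._output_charset:
            # With an output charset set, the result is an encoded bytestring.
            return translated.encode(self._output_charset)
        # Without one, the text object reaches the caller as-is.
        return translated


catalog = ToyCatalog({"Hello": u"Tere"})
without_charset = catalog.gettext("Hello")  # text (unicode) object
catalog.set_output_charset("utf-8")
with_charset = catalog.gettext("Hello")     # UTF-8 encoded bytestring
```

In this model, the only difference between the two calls is whether set_output_charset('utf-8') has run, which mirrors the symptom described above.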

This fix was sponsored by voicecom.ee.

Change History (9)

comment:1 by Tim Graham, 9 years ago

Triage Stage: Unreviewed → Ready for checkin
Type: Uncategorized → Bug

Looks good pending some cosmetic comments. Claude, could you check it too?

comment:2 by Claude Paroz, 9 years ago

As for me, I don't understand why we keep this obsolete method at all :-/

comment:3 by Tim Graham, 9 years ago

Do you mean the gettext() function or something else? (sorry for complete lack of knowledge here!)

comment:4 by Claude Paroz, 9 years ago

Yes, I don't see the need for non-unicode translations. But as it will die with Python 2 anyway, I don't want to fight about this.

comment:5 by Tim Graham <timograham@…>, 9 years ago

Resolution: fixed
Status: new → closed

In d3e3703a:

Fixed #25720 -- Made gettext() return bytestring on Python 2 if input is bytestring.

This is consistent with the behavior of Django 1.7.x and earlier.

comment:6 by Tim Graham <timograham@…>, 9 years ago

In 9cdfdbdd:

[1.8.x] Fixed #25720 -- Made gettext() return bytestring on Python 2 if input is bytestring.

This is consistent with the behavior of Django 1.7.x and earlier.

Backport of d3e3703a15cd9d294406121bc43be0c75b1a4e0e from master

comment:7 by Tim Graham <timograham@…>, 9 years ago

In 1eed16b9:

[1.9.x] Fixed #25720 -- Made gettext() return bytestring on Python 2 if input is bytestring.

This is consistent with the behavior of Django 1.7.x and earlier.

Backport of d3e3703a15cd9d294406121bc43be0c75b1a4e0e from master

comment:8 by Marti Raudsepp, 9 years ago

@claudep Some native Python 2 modules, such as the csv module, still don't get along with Unicode strings; combined with this regression, that broke our app.


comment:9 by Claude Paroz, 9 years ago

Thanks for the use case example. I know about the csv limitation in Python 2, but when I had to cope with it, I made sure to encode the strings before passing them to the module. Anyway, I don't want to argue about removal now.
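For illustration, the encode-before-writing approach could look roughly like the sketch below. encode_row is a hypothetical helper, not part of Django or the csv module, and UTF-8 is assumed as the target encoding.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper, not part of Django or the csv module.

text_type = type(u"")  # unicode on Python 2, str on Python 3


def encode_row(row, encoding="utf-8"):
    """Return a copy of row with every text value encoded to a bytestring."""
    return [
        value.encode(encoding) if isinstance(value, text_type) else value
        for value in row
    ]


row = encode_row([u"Nimi", u"Õun", 3])
# On Python 2, `row` could then be passed to csv.writer(...).writerow(row).
```

Non-text values such as integers and existing bytestrings pass through unchanged, so the helper can be applied to every row regardless of where the strings came from.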
