Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#25720 closed Bug (fixed)

1.8.x regression: Py2 gettext() returns unicode for bytestring input

Reported by: Marti Raudsepp
Owned by: nobody
Component: Internationalization
Version: 1.8
Severity: Release blocker
Keywords: regression
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

See pull request for the fix: https://github.com/django/django/pull/5615

In Django 1.7 and earlier, the django.utils.translation.gettext() function always returned UTF-8 bytestrings for bytestring input. In Django 1.8 this behavior changed: when a translation is active, it returns unicode objects instead. (When no translation is activated, it still returns bytestrings.)
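The pre-1.8 contract described above (bytestring in, UTF-8 bytestring out; text in, text out) can be sketched roughly as follows. This is only an illustration, not Django's implementation: gettext_preserving_type, lookup and the toy catalog are hypothetical names, and the sketch is written for Python 3, where bytes and text are distinct types.

```python
def gettext_preserving_type(message, translate, encoding="utf-8"):
    """Translate `message`, returning the same string type that was passed in.

    `translate` is any callable mapping text to text. It is a hypothetical
    stand-in for a translation catalog lookup, not part of Django's API.
    """
    if isinstance(message, bytes):
        # Bytestring in: decode for the lookup, then encode the result back.
        translated = translate(message.decode(encoding))
        return translated.encode(encoding)
    # Text in: text out.
    return translate(message)


# Toy catalog standing in for an active translation (illustrative data only).
catalog = {"Yes": "Jah"}


def lookup(text):
    # Fall back to the untranslated message, as gettext-style lookups do.
    return catalog.get(text, text)


result_bytes = gettext_preserving_type(b"Yes", lookup)
result_text = gettext_preserving_type("Yes", lookup)
```

The regression reported here corresponds to the first call returning text instead of a bytestring whenever a translation is active.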

This does not appear to be an intentional change; nothing in the release notes or documentation mentions it. It is probably an unintended side effect of the refactoring in commit a5f6cbce07b5f3ab48d931e3fd1883c757fb9b45.

I narrowed it down to the removal of this line in __init__:

self.set_output_charset('utf-8')

After re-adding the line, it works as expected again (tests included). I don't understand gettext well enough to know why the gettext_module.translation(..., codeset='utf-8', ...) call added in the refactoring commit doesn't have the same effect.

This fix was sponsored by voicecom.ee.

Change History (9)

comment:1 Changed 5 years ago by Tim Graham

Triage Stage: Unreviewed → Ready for checkin
Type: Uncategorized → Bug

Looks good pending some cosmetic comments. Claude, could you check it too?

comment:2 Changed 5 years ago by Claude Paroz

As for me, I don't understand why we keep this obsolete method at all :-/

comment:3 Changed 5 years ago by Tim Graham

Do you mean the gettext() function or something else? (sorry for complete lack of knowledge here!)

comment:4 Changed 5 years ago by Claude Paroz

Yes, I don't see the need for non-unicode translations. But as it will die with Python 2 anyway, I don't want to fight about this.

comment:5 Changed 5 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: new → closed

In d3e3703a:

Fixed #25720 -- Made gettext() return bytestring on Python 2 if input is bytestring.

This is consistent with the behavior of Django 1.7.x and earlier.

comment:6 Changed 5 years ago by Tim Graham <timograham@…>

In 9cdfdbdd:

[1.8.x] Fixed #25720 -- Made gettext() return bytestring on Python 2 if input is bytestring.

This is consistent with the behavior of Django 1.7.x and earlier.

Backport of d3e3703a15cd9d294406121bc43be0c75b1a4e0e from master

comment:7 Changed 5 years ago by Tim Graham <timograham@…>

In 1eed16b9:

[1.9.x] Fixed #25720 -- Made gettext() return bytestring on Python 2 if input is bytestring.

This is consistent with the behavior of Django 1.7.x and earlier.

Backport of d3e3703a15cd9d294406121bc43be0c75b1a4e0e from master

comment:8 Changed 5 years ago by Marti Raudsepp

@claudep Some native Python 2 modules still don't get along with Unicode strings, such as the csv module, which broke our app along with this regression.


comment:9 Changed 5 years ago by Claude Paroz

Thanks for the use case example. I know about the csv limitation in Python 2, but when I had to cope with it, I made sure to encode the strings before passing them to the module. Anyway, I don't want to argue about removal now.
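A minimal sketch of the "encode before passing to csv" workaround mentioned above, assuming UTF-8 output. The helper name encode_row and the sample row are hypothetical; text_type is resolved at runtime so the snippet runs on both Python 2 and Python 3.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def encode_row(row, encoding="utf-8"):
    """Return a copy of `row` with every text value encoded to bytes.

    Non-text values (numbers, existing bytestrings) are passed through as-is.
    """
    return [
        value.encode(encoding) if isinstance(value, text_type) else value
        for value in row
    ]


# Illustrative data only: two text cells (one non-ASCII) and a number.
row = encode_row([u"nimi", u"\xf5un", 3])
```

On Python 2, a row encoded this way could then be handed to csv.writer(...).writerow(). On Python 3 the csv module accepts text directly, so this step is not needed there.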
