Code

Opened 5 years ago

Closed 5 years ago

Last modified 3 years ago

#10335 closed (fixed)

tzinfo.py should use getdefaultencoding instead of getdefaultlocale[1]

Reported by: gthb Owned by: mitsuhiko
Component: Core (Other) Version: 1.0
Severity: Keywords: locale encoding getdefaultlocale getdefaultencoding mac
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

In some locales on Mac OS X, page rendering fails with an error like:

 File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/template/debug.py", line 81, in render_node
   raise wrapped

TemplateSyntaxError: Caught an exception while rendering: unknown encoding: mac-icelandic

(where the encoding varies, could be mac-roman or other Mac-specific encodings)

It turns out that Python does not recognize some Mac-specific encodings, including those that are default in some locales (such as our Icelandic one).

That should be okay, we would just configure Python or Django to use a different default encoding. But the trouble is, we can't. In tzinfo.py the default encoding is determined as locale.getdefaultlocale()[1], which calls _locale._getdefaultlocale(), which on a Mac calls CFStringGetSystemEncoding. Of this latter function, the Mac documentation says:

In Mac OS X, this encoding is determined by the user's preferred language setting. The preferred language is the first language listed in the International pane of the System Preferences.

So there appears to be no way, in some locales, to get Django to render pages. One is forced to change the user's global language preference in the operating system settings. That's a bit too inflexible.

In contrast, sys.getdefaultencoding is easily controlled by calling sys.setdefaultencoding in sitecustomize.py.

Thus tzinfo.py should obtain the default encoding using sys.getdefaultencoding instead — at least optionally, or perhaps as a special-case exception for the mac-specific encodings, if one wants to take great care to minimize behavior changes in working installations.

Attachments (4)

tzinfo-getdefaultencoding.patch (565 bytes) - added by gthb 5 years ago.
Patch for simplest possible change, getdefaultencoding() instead of getdefaultlocale()[1]
tzinfo-getdefaultencoding-only-if-unchanged-fails.patch (679 bytes) - added by gthb 5 years ago.
Another patch, more conservative. This one changes nothing unless the getdefaultlocale()[1] encoding is unknown to Python.
tzinfo-check-encoding-supported-by-python.diff (727 bytes) - added by gthb 5 years ago.
Still more conservative patch. This one just checks that the encoding is known to Python (and slightly amends a comment)
10335-tzname-locale.patch (2.1 KB) - added by mitsuhiko 5 years ago.
improved fix that moved the encoding constant to the encodings module because it can be used in more than one location, fixed a bug in repr

Download all attachments as: .zip

Change History (15)

Changed 5 years ago by gthb

Patch for simplest possible change, getdefaultencoding() instead of getdefaultlocale()[1]

comment:1 Changed 5 years ago by gthb

  • Keywords getdefaultencoding mac added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I attached a patch against trunk for the simplest change, just using sys.getdefaultencoding() instead of locale.getdefaultlocale()[1]. I ran the test suite before and after. Before, I got hordes of errors almost all of which included the words "unknown encoding: mac-icelandic". After, I got only this:

======================================================================
FAIL: Doctest: regressiontests.model_regress.models.__test__.API_TESTS
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/gthb/Library/Python/2.5/site-packages/django/test/_doctest.py", line 2180, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for regressiontests.model_regress.models.__test__.API_TESTS
  File "/Users/gthb/extsvn/django-trunk/tests/regressiontests/model_regress/models.py", line unknown line number, in API_TESTS

----------------------------------------------------------------------
File "/Users/gthb/extsvn/django-trunk/tests/regressiontests/model_regress/models.py", line ?, in regressiontests.model_regress.models.__test__.API_TESTS
Failed example:
    BrokenUnicodeMethod.objects.all()
Expected:
    [<BrokenUnicodeMethod: [Bad Unicode data]>]
Got:
    [<BrokenUnicodeMethod: Názov: Jerry>]

... which I also got when running the test suite on unchanged django trunk.

comment:2 Changed 5 years ago by gthb

Just found this, for what it's worth, in http://bugs.python.org/issue813449 :

This is a known limitation: getdefaultlocale should not be used in new code.

If the intention is to compute the locale's encoding, locale.getpreferredencoding should be used instead.

Still, no such warning is given in http://www.python.org/doc/2.5/lib/module-locale.html

In any case, locale.getpreferredencoding also just returns _locale._getdefaultlocale()[1] on darwin or mac, so switching to that will not resolve this issue.

Changed 5 years ago by gthb

Another patch, more conservative. This one changes nothing unless the getdefaultlocale()[1] encoding is unknown to Python.

comment:3 Changed 5 years ago by jacob

  • milestone set to 1.1
  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 5 years ago by mtredinnick

I'm nervous about this. sys.setdefaultencoding should really never be called. It has too many dangerous side effects and is only in the language by accident (long history; search python-dev archives for details if you really care).

A better solution would be preferred. I'm not convinced that Django's behaviour is actually incorrect here: it asks for the system locale and is given the right answer (since the user's system is set up to return their preference). The problem is that we don't necessarily understand that answer. Maybe that's the level it needs to be addressed on.

comment:5 Changed 5 years ago by gthb

Sorry for the late reply, I hadn't set up an email address for notifications.

In response to mtredinnick: good point about setdefaultencoding, I didn't know that. A few notes:

  1. The above patch (second one) only calls getdefaultencoding, and it does so only if the present implementation fails in a way that will break Django. The setdefaultencoding call would be made consciously by the user as a workaround, only upon encountering this problem, and they will only encounter it if ascii is not good enough.
  1. Based on a quick grep, this DEFAULT_ENCODING is used in only one place in all of Django, in this same file:
                return smart_unicode(time.tzname[self._isdst(dt)], DEFAULT_ENCODING)
    

Standard timezone names are probably all representable in ascii anyway, so maybe ascii is good enough here. I suppose DEFAULT_ENCODING (or timezones) must have been used in more places when I ran into this some weeks ago, because Django broke right away, but now it breaks only when I add something involving timezones. I now reproduce it with a view that does this:

from django.utils.tzinfo import LocalTimezone
from datetime import datetime
LocalTimezone(datetime.now())

So maybe it is enough to settle for ascii, when the default locale method fails or returns a bogus encoding.

  1. I will attach a third, even more conservative patch, that just adds the codec lookup within the first try block. The trouble with Mac OS X in the Icelandic locale is that locale.getdefaultlocale()[1] does not raise an exception but instead just returns an encoding name that will result in an exception later when used. This most minimal patch just makes the failure come up right away, so that the except block deals with it. This way the comment "Any problems at all determining the locale and we fallback" is made more true, with no other effects.
  1. On correctness: the premise "since the user's system is set up to return their preference" assumes too much. My locale preference for the overall OS is not necessarily my locale preference for Django. In the case of Mac OS X I don't get to separate the two because of the hardcoding explained above. And if my preference is a locale with a Mac-specific encoding not recognized by Python (such as the Icelandic locale), then that triggers a serious problem in django.utils.tzinfo. The problem is not that Django doesn't understand the answer, but that Python itself doesn't. This problem can be worked around with one of the options I propose here.

Changed 5 years ago by gthb

Still more conservative patch. This one just checks that the encoding is known to Python (and slightly amends a comment)

comment:6 Changed 5 years ago by mitsuhiko

  • Owner changed from nobody to mitsuhiko

Changed 5 years ago by mitsuhiko

improved fix that moved the encoding constant to the encodings module because it can be used in more than one location, fixed a bug in repr

comment:7 Changed 5 years ago by mitsuhiko

I attached a new patch that is basically the latest one but the constant moved to the encodings module (this is reusable in multiple places). Also if unicode data is returned the repr method will no longer return a unicode string.

comment:8 Changed 5 years ago by jacob

  • Triage Stage changed from Accepted to Ready for checkin

comment:9 Changed 5 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [10703]) Fixed #10335: handle system locals unknown to Python in timezone name handling. Thanks, mitsuhiko.

comment:10 Changed 5 years ago by jacob

(In [10704]) [1.0.X] Fixed #10335: handle system locals unknown to Python in timezone name handling. Thanks, mitsuhiko. Backport of [10703] from trunk.

comment:11 Changed 3 years ago by jacob

  • milestone 1.1 deleted

Milestone 1.1 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.