#10335 closed (fixed)
tzinfo.py should use getdefaultencoding instead of getdefaultlocale[1]
Reported by: | Gunnlaugur Þór Briem | Owned by: | Armin Ronacher |
---|---|---|---|
Component: | Core (Other) | Version: | 1.0 |
Severity: | Keywords: | locale encoding getdefaultlocale getdefaultencoding mac | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In some locales on Mac OS X, page rendering fails with an error like:
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/django/template/debug.py", line 81, in render_node raise wrapped TemplateSyntaxError: Caught an exception while rendering: unknown encoding: mac-icelandic
(where the encoding varies, could be mac-roman or other Mac-specific encodings)
It turns out that Python does not recognize some Mac-specific encodings, including those that are default in some locales (such as our Icelandic one).
That should be okay, we would just configure Python or Django to use a different default encoding. But the trouble is, we can't. In tzinfo.py the default encoding is determined as locale.getdefaultlocale()[1], which calls _locale._getdefaultlocale(), which on a Mac calls CFStringGetSystemEncoding. Of this latter function, the Mac documentation says:
In Mac OS X, this encoding is determined by the user's preferred language setting. The preferred language is the first language listed in the International pane of the System Preferences.
So there appears to be no way, in some locales, to get Django to render pages. One is forced to change the user's global language preference in the operating system settings. That's a bit too inflexible.
In contrast, sys.getdefaultencoding is easily controlled by calling sys.setdefaultencoding in sitecustomize.py.
Thus tzinfo.py should obtain the default encoding using sys.getdefaultencoding instead — at least optionally, or perhaps as a special-case exception for the mac-specific encodings, if one wants to take great care to minimize behavior changes in working installations.
Attachments (4)
Change History (15)
by , 16 years ago
Attachment: | tzinfo-getdefaultencoding.patch added |
---|
comment:1 by , 16 years ago
Keywords: | getdefaultencoding mac added |
---|
I attached a patch against trunk for the simplest change, just using sys.getdefaultencoding() instead of locale.getdefaultlocale()[1]. I ran the test suite before and after. Before, I got hordes of errors almost all of which included the words "unknown encoding: mac-icelandic". After, I got only this:
====================================================================== FAIL: Doctest: regressiontests.model_regress.models.__test__.API_TESTS ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/gthb/Library/Python/2.5/site-packages/django/test/_doctest.py", line 2180, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for regressiontests.model_regress.models.__test__.API_TESTS File "/Users/gthb/extsvn/django-trunk/tests/regressiontests/model_regress/models.py", line unknown line number, in API_TESTS ---------------------------------------------------------------------- File "/Users/gthb/extsvn/django-trunk/tests/regressiontests/model_regress/models.py", line ?, in regressiontests.model_regress.models.__test__.API_TESTS Failed example: BrokenUnicodeMethod.objects.all() Expected: [<BrokenUnicodeMethod: [Bad Unicode data]>] Got: [<BrokenUnicodeMethod: Názov: Jerry>]
... which I also got when running the test suite on unchanged django trunk.
comment:2 by , 16 years ago
Just found this, for what it's worth, in http://bugs.python.org/issue813449 :
This is a known limitation: getdefaultlocale should not be used in new code.
If the intention is to compute the locale's encoding, locale.getpreferredencoding should be used instead.
Still, no such warning is given in http://www.python.org/doc/2.5/lib/module-locale.html
In any case, locale.getpreferredencoding also just returns _locale._getdefaultlocale()[1] on darwin or mac, so switching to that will not resolve this issue.
by , 16 years ago
Attachment: | tzinfo-getdefaultencoding-only-if-unchanged-fails.patch added |
---|
Another patch, more conservative. This one changes nothing unless the getdefaultlocale()[1] encoding is unknown to Python.
comment:3 by , 16 years ago
milestone: | → 1.1 |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:4 by , 16 years ago
I'm nervous about this. sys.setdefaultencoding
should really never be called. It has too many dangerous side effects and is only in the language by accident (long history; search python-dev archives for details if you really care).
A better solution would be preferred. I'm not convinced that Django's behaviour is actually incorrect here: it asks for the system locale and is given the right answer (since the user's system is set up to return their preference). The problem is that we don't necessarily understand that answer. Maybe that's the level it needs to be addressed on.
comment:5 by , 16 years ago
Sorry for the late reply, I hadn't set up an email address for notifications.
In response to mtredinnick: good point about setdefaultencoding
, I didn't know that. A few notes:
- The above patch (second one) only calls
getdefaultencoding
, and it does so only if the present implementation fails in a way that will break Django. Thesetdefaultencoding
call would be made consciously by the user as a workaround, only upon encountering this problem, and they will only encounter it ifascii
is not good enough.
- Based on a quick grep, this
DEFAULT_ENCODING
is used in only one place in all of Django, in this same file:return smart_unicode(time.tzname[self._isdst(dt)], DEFAULT_ENCODING)
Standard timezone names are probably all representable in ascii
anyway, so maybe ascii
is good enough here. I suppose DEFAULT_ENCODING
(or timezones) must have been used in more places when I ran into this some weeks ago, because Django broke right away, but now it breaks only when I add something involving timezones. I now reproduce it with a view that does this:
from django.utils.tzinfo import LocalTimezone from datetime import datetime LocalTimezone(datetime.now())
So maybe it is enough to settle for ascii
, when the default locale method fails or returns a bogus encoding.
- I will attach a third, even more conservative patch, that just adds the codec lookup within the first try block. The trouble with Mac OS X in the Icelandic locale is that
locale.getdefaultlocale()[1]
does not raise an exception but instead just returns an encoding name that will result in an exception later when used. This most minimal patch just makes the failure come up right away, so that theexcept
block deals with it. This way the comment "Any problems at all determining the locale and we fallback" is made more true, with no other effects.
- On correctness: the premise "since the user's system is set up to return their preference" assumes too much. My locale preference for the overall OS is not necessarily my locale preference for Django. In the case of Mac OS X I don't get to separate the two because of the hardcoding explained above. And if my preference is a locale with a Mac-specific encoding not recognized by Python (such as the Icelandic locale), then that triggers a serious problem in
django.utils.tzinfo
. The problem is not that Django doesn't understand the answer, but that Python itself doesn't. This problem can be worked around with one of the options I propose here.
by , 16 years ago
Attachment: | tzinfo-check-encoding-supported-by-python.diff added |
---|
Still more conservative patch. This one just checks that the encoding is known to Python (and slightly amends a comment)
comment:6 by , 16 years ago
Owner: | changed from | to
---|
by , 16 years ago
Attachment: | 10335-tzname-locale.patch added |
---|
improved fix that moved the encoding constant to the encodings module because it can be used in more than one location, fixed a bug in repr
comment:7 by , 16 years ago
I attached a new patch that is basically the latest one but the constant moved to the encodings module (this is reusable in multiple places). Also if unicode data is returned the repr method will no longer return a unicode string.
comment:8 by , 16 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:9 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Patch for simplest possible change, getdefaultencoding() instead of getdefaultlocale()[1]