Opened 16 years ago
Closed 14 years ago
#10447 closed (fixed)
feedgenerator raises an exception if we set the locale.
Reported by: | bear330 | Owned by: | nobody |
---|---|---|---|
Component: | contrib.syndication | Version: | 1.0 |
Severity: | | Keywords: | unicode feed |
Cc: | Roman Barczyński | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
If I first do this:
import locale
locale.setlocale(locale.LC_ALL, '')
then feedgenerator raises a UnicodeDecodeError at this line (line 207):
handler.addQuickElement(u"lastBuildDate", rfc2822_date(self.latest_post_date()).decode('utf-8'))
This is because on my machine (my locale is 'Chinese_Taiwan.950') rfc2822_date returns
'\xacP\xb4\xc1\xa4@, 09 \xa4T\xa4\xeb 2009 22:21:17 -0000'
which can't be decoded with decode('utf-8').
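The failure can be reproduced without touching the process locale, using the exact bytes quoted above. This is a minimal sketch: `decodes_with` is a hypothetical helper, and `cp950` is assumed to be the Python codec matching the reporter's 'Chinese_Taiwan.950' locale (Windows code page 950).

```python
# Bytes that rfc2822_date produced under the 'Chinese_Taiwan.950' locale,
# copied from the report above.
REPORTED = b"\xacP\xb4\xc1\xa4@, 09 \xa4T\xa4\xeb 2009 22:21:17 -0000"


def decodes_with(raw: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if `raw` decodes cleanly with `encoding`."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # 0xac is a UTF-8 continuation byte, so it cannot start a character.
    print("utf-8:", decodes_with(REPORTED, "utf-8"))  # False
    # The same bytes are valid in the locale's own code page.
    print("cp950:", decodes_with(REPORTED, "cp950"))  # True
```

The hardcoded `.decode('utf-8')` at line 207 therefore only works when the locale's encoding happens to be UTF-8-compatible for the month and day names it produces.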
Thanks.
Attachments (1)
Change History (8)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 15 years ago
According to RFC 2822, the function rfc2822_date in feedgenerator.py must return abbreviated month and day names in English only, and the result must not depend on the locale (see http://www.faqs.org/rfcs/rfc2822.html, section 3.3). But it uses strftime, which depends on the locale.
comment:4 by , 15 years ago
Has patch: | set |
---|
This patch probably fixes this issue.
It simply uses hardcoded English month and day names instead of the %a and %b format specifiers, which makes the output fully RFC 2822 compliant.
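The approach described in the patch can be sketched as follows. This is not Django's actual rfc2822_date; `rfc2822_date_sketch` and its lookup tables are hypothetical names, and it assumes naive datetimes should get the "-0000" zone, as in the output quoted in the report.

```python
import datetime

# English abbreviations required by RFC 2822 section 3.3. They are hardcoded
# so the result never depends on the process locale (unlike %a and %b).
_DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
_MONTH_NAMES = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc2822_date_sketch(date: datetime.datetime) -> str:
    """Format `date` as an RFC 2822 date-time string without strftime."""
    day_name = _DAY_NAMES[date.weekday()]      # weekday(): Monday == 0
    month_name = _MONTH_NAMES[date.month - 1]  # month is 1-based
    offset = date.utcoffset()
    if offset is None:
        # Naive datetime: no zone information, so use "-0000".
        zone = "-0000"
    else:
        seconds = int(offset.total_seconds())
        sign = "+" if seconds >= 0 else "-"
        hours, minutes = divmod(abs(seconds) // 60, 60)
        zone = f"{sign}{hours:02d}{minutes:02d}"
    return (f"{day_name}, {date.day:02d} {month_name} {date.year:04d} "
            f"{date.hour:02d}:{date.minute:02d}:{date.second:02d} {zone}")


if __name__ == "__main__":
    # Same instant as in the report, but always rendered in English.
    print(rfc2822_date_sketch(datetime.datetime(2009, 3, 9, 22, 21, 17)))
    # prints "Mon, 09 Mar 2009 22:21:17 -0000"
```

Only the names are looked up in tables; the numeric fields are plain integer formatting, which no locale changes. On Python 3.3+, the standard library's email.utils.format_datetime also formats RFC 2822 dates with hardcoded English names and could serve as a reference when comparing output.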
comment:5 by , 15 years ago
Cc: | added |
---|
comment:6 by , 14 years ago
lupus' patch seems to be the right solution to this issue. With it we both solve the OP's problem and, more importantly, comply with RFC 2822. (The OP is calling locale.setlocale(), which is discouraged because it affects the whole process and isn't thread-safe. Maybe he is using the RSS syndication feed functionality from an external script?)
AFAICS, this unfortunately can't be fully tested, because I don't think calling locale.setlocale() in a test case is a good idea. But we can refer to the relevant section of the RFC and to this paragraph of the Python locale module documentation:
If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as string.lower(), or certain formats used with time.strftime()), you will have to find a way to do it without using the standard library routine.
comment:7 by , 14 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
(In [15112]) Fixed #10447 -- Made sure the syndication feeds helper function that returns RFC 2822-formatted datetime strings isn't affected by the current locale, removing use of strftime() because the '%a' and '%b' format specifiers are problematic in this respect. Thanks bear330 for the report and lupus for an initial patch.
"Doctor it hurts when I do this."
"Don't do that, then"
Calling setlocale() is generally discouraged, since it has so many unintended side effects, particularly in multi-threaded applications. So, for now, don't do that.

There are obstacles to fixing this in some transparent fashion. We can't handle every possible encoding the system can be set to, because we can only handle the ones Python supports, so some possibilities are simply not going to be handled at all. That is not Django's problem. We might be able to inspect the current locale and, if its encoding is one Python can handle, use that for the conversion. This needs some investigation: basically, every place where we retrieve something from the system has to be identified and checked.
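The "inspect the current locale" idea could look roughly like the sketch below. It is an illustration under stated assumptions, not a proposed Django API: `decode_from_system` is a hypothetical helper, the UTF-8 fallback is an assumption, and `errors="replace"` is a deliberate choice to never raise.

```python
import codecs
import locale
from typing import Optional


def decode_from_system(raw: bytes, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: decode bytes obtained from the OS.

    Uses the locale's preferred encoding when Python has a codec for it,
    otherwise falls back to UTF-8 (an assumption, not Django behaviour).
    """
    if encoding is None:
        # Ask the locale module which encoding the current locale uses,
        # without changing the locale (do_setlocale=False).
        encoding = locale.getpreferredencoding(False)
    try:
        codecs.lookup(encoding)
    except LookupError:
        # Python has no codec for this encoding: the case described above
        # as "just not going to be handled".
        encoding = "utf-8"
    # errors="replace" substitutes U+FFFD for undecodable bytes instead of
    # raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")
```

Replacing bad bytes trades fidelity for robustness; a stricter variant could let the UnicodeDecodeError propagate so that unsupported locales are reported rather than silently mangled.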