Opened 15 years ago

Closed 13 years ago

#10447 closed (fixed)

feedgenerator will raise exception if we set locale.

Reported by: bear330
Owned by: nobody
Component: contrib.syndication
Version: 1.0
Severity:
Keywords: unicode feed
Cc: Roman Barczyński
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Hi,

If I do this before:

import locale
locale.setlocale(locale.LC_ALL, '')

Then the feedgenerator will raise UnicodeDecodeError because of this line (207):

handler.addQuickElement(u"lastBuildDate", rfc2822_date(self.latest_post_date()).decode('utf-8'))

This is because rfc2822_date returns
'\xacP\xb4\xc1\xa4@, 09 \xa4T\xa4\xeb 2009 22:21:17 -0000'
on my machine (my locale is 'Chinese_Taiwan.950').

That byte string is not valid UTF-8, so decode('utf-8') fails.

Thanks.
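The failure can be reproduced without touching the process locale. The following is a minimal sketch, written for Python 3 (the ticket itself concerns Python 2 byte strings), that feeds the reported bytes to the same decode call. It assumes that Python's 'cp950' codec corresponds to the 'Chinese_Taiwan.950' locale named in the report.

```python
# Byte string copied from the report: strftime() output under a cp950 locale.
raw = b'\xacP\xb4\xc1\xa4@, 09 \xa4T\xa4\xeb 2009 22:21:17 -0000'

# Decoding as UTF-8 fails, which is the error described in the ticket.
utf8_error = None
try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding with the locale's own encoding succeeds (assumption: cp950).
decoded = raw.decode('cp950')

print(type(utf8_error).__name__)
print(decoded)
```

The decoded text still carries localized weekday and month names, so decoding with the right codec would avoid the exception but would not make the date valid for a feed.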

Attachments (1)

rfc2822_date.patch (1.3 KB ) - added by lupus 14 years ago.
rfc2822_date patch


Change History (8)

comment:1 by bear330, 15 years ago

comment:2 by Malcolm Tredinnick, 15 years ago

Triage Stage: Unreviewed → Accepted

"Doctor it hurts when I do this."

"Don't do that, then"

Calling setlocale() is to be generally discouraged, since it has so many unintended side-effects, particularly in multi-threaded applications. So, for now, don't do that.

There are problems in the way of hoping to fix this in some transparent fashion. We can't handle every possible encoding that the system can set, because we only handle what Python does, so some possibilities are just not going to be handled at all. That is not Django's problem. We might be able to inspect the current locale and, if it is something Python can handle, use that as the conversion. Needs some investigation: basically, everywhere we retrieve something from the system has to be identified and checked.
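As an illustration of the "inspect the current locale" idea, here is a rough sketch and not something Django adopted. The helper name decode_system_bytes and its fallback parameter are hypothetical. It asks the locale module for the preferred encoding, checks that Python has a codec for it, and falls back to UTF-8 otherwise.

```python
import codecs
import locale


def decode_system_bytes(raw, fallback='utf-8'):
    """Decode bytes obtained from the system using the locale's encoding.

    Hypothetical helper: falls back to `fallback` when the locale reports
    no encoding or one that Python has no codec for.
    """
    # Passing False avoids calling setlocale() as a side effect.
    encoding = locale.getpreferredencoding(False) or fallback
    try:
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    # Replace undecodable bytes rather than raising.
    return raw.decode(encoding, errors='replace')


print(decode_system_bytes(b'Mon, 09 Mar 2009 22:21:17 -0000'))
```

Every place that retrieves text from the system would need to route through such a helper, which is the investigation cost the comment above describes.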

comment:3 by anonymous, 14 years ago

According to RFC 2822, the function rfc2822_date in feedgenerator.py must return abbreviated month and day names in English only and must not depend on the locale (see http://www.faqs.org/rfcs/rfc2822.html, section 3.3). But it uses strftime, which depends on the locale.

comment:4 by lupus, 14 years ago

Has patch: set

This patch probably fixes this issue.

http://pastey.net/130530

Simply use hardcoded English month and day names instead of %a and %b and it will be fully RFC compliant.
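The hardcoded-names approach can be sketched as follows. This is an illustrative reimplementation in Python 3, not the attached patch or the code that was eventually committed. The constant names DAYS and MONTHS are assumptions, and naive datetimes are rendered with '-0000' to match the output shown in the report.

```python
import datetime

# English abbreviations required by RFC 2822, independent of any locale.
DAYS = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')
MONTHS = ('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')


def rfc2822_date(dt):
    """Format a datetime as an RFC 2822 date without using strftime()."""
    dow = DAYS[dt.weekday()]        # weekday(): Monday is 0
    month = MONTHS[dt.month - 1]
    offset = dt.utcoffset()
    if offset is None:
        # Naive datetime: no timezone information available.
        tz = '-0000'
    else:
        minutes = int(offset.total_seconds() // 60)
        sign = '+' if minutes >= 0 else '-'
        hours, mins = divmod(abs(minutes), 60)
        tz = '%s%02d%02d' % (sign, hours, mins)
    return '%s, %02d %s %04d %02d:%02d:%02d %s' % (
        dow, dt.day, month, dt.year, dt.hour, dt.minute, dt.second, tz)


print(rfc2822_date(datetime.datetime(2009, 3, 9, 22, 21, 17)))
# prints: Mon, 09 Mar 2009 22:21:17 -0000
```

Because the names come from fixed tuples, the result is the same whatever locale.setlocale() has been set to.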

by lupus, 14 years ago

Attachment: rfc2822_date.patch added

rfc2822_date patch

comment:5 by Roman Barczyński, 14 years ago

Cc: Roman Barczyński added

comment:6 by Ramiro Morales, 13 years ago

lupus' patch seems to be the right solution to this issue. With it we solve the problem for the OP and, more importantly, we comply with RFC 2822. Note that the OP is using locale.setlocale(), which is discouraged because it affects the whole process and isn't thread safe. Maybe he is using the RSS syndication feed functionality in an external script?

Unfortunately, AFAICS this can't be fully tested, because I don't think calling locale.setlocale() in a test case is a good idea. But we can refer to the relevant section of the RFC and to this paragraph from the Python locale module documentation:

If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as string.lower(), or certain formats used with time.strftime()), you will have to find a way to do it without using the standard library routine.

comment:7 by Ramiro Morales, 13 years ago

Resolution: fixed
Status: new → closed

(In [15112]) Fixed #10447 -- Made sure the syndication feeds helper function that returns RFC 2822-formatted datetime strings isn't affected by the current locale, removing use of strftime() because the '%a' and '%b' format specifiers are problematic in this respect. Thanks bear330 for the report and lupus for an initial patch.
