
Opened 5 years ago

Closed 4 years ago

#10447 closed (fixed)

feedgenerator raises an exception if the locale is set.

Reported by: bear330 Owned by: nobody
Component: contrib.syndication Version: 1.0
Severity: Keywords: unicode feed
Cc: romke Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Hi,

If I do this first:

import locale
locale.setlocale(locale.LC_ALL, '')

then feedgenerator raises a UnicodeDecodeError at this line (line 207):

handler.addQuickElement(u"lastBuildDate", rfc2822_date(self.latest_post_date()).decode('utf-8'))

This is because, on my machine (my locale is 'Chinese_Taiwan.950'), rfc2822_date returns
'\xacP\xb4\xc1\xa4@, 09 \xa4T\xa4\xeb 2009 22:21:17 -0000'

and that byte string can't be decoded with decode('utf-8').

Thanks.
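To make the failure concrete, here is a small Python 3 reproduction that works on the reported bytes directly, without calling locale.setlocale(). It assumes, based on the 'Chinese_Taiwan.950' locale name, that strftime() produced the bytes in Windows code page 950 (a Big5 variant); the non-ASCII pairs appear to be the Chinese names for Monday and March, which matches 9 March 2009 being a Monday.

```python
# The byte string reported in the ticket (Chinese_Taiwan.950 locale).
raw = b"\xacP\xb4\xc1\xa4@, 09 \xa4T\xa4\xeb 2009 22:21:17 -0000"

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xAC is a UTF-8 continuation byte, so it cannot start a character.
    utf8_error = exc.reason

# Assumption: the bytes are in cp950, the code page implied by the locale.
# Decoding with that codec succeeds where UTF-8 fails.
text = raw.decode("cp950")

print("utf-8 decode error:", utf8_error)
print("cp950 decode produced", len(text), "characters")
```

The point is that the bytes are perfectly valid, just not UTF-8: the hard-coded decode('utf-8') assumes an encoding that the locale does not guarantee.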

Attachments (1)

rfc2822_date.patch (1.3 KB) - added by lupus 5 years ago.
rfc2822_date patch


Change History (8)

comment:1 Changed 5 years ago by bear330

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by mtredinnick

  • Triage Stage changed from Unreviewed to Accepted

"Doctor it hurts when I do this."

"Don't do that, then"

Calling setlocale() is to be generally discouraged, since it has so many unintended side-effects, particularly in multi-threaded applications. So, for now, don't do that.

There are obstacles to fixing this in some transparent fashion. We can't handle every encoding the system might be set to, because we can only handle what Python handles, so some possibilities are simply not going to be supported. That is not Django's problem. We might be able to inspect the current locale and, if its encoding is one Python can handle, use that for the conversion. This needs some investigation: basically, every place where we retrieve something from the system has to be identified and checked.
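The "inspect the current locale" idea could look roughly like the sketch below. The helper name decode_system_bytes and its fallback behaviour are hypothetical, not Django code; it only uses the standard locale.getpreferredencoding() and codecs.lookup() calls to decode bytes in the locale's encoding when Python knows that codec, and falls back to UTF-8 otherwise.

```python
import codecs
import locale
from typing import Optional


def decode_system_bytes(raw: bytes, encoding: Optional[str] = None,
                        fallback: str = "utf-8") -> str:
    """Hypothetical helper: decode bytes returned by a locale-aware API."""
    if encoding is None:
        # Encoding implied by the current locale settings.
        encoding = locale.getpreferredencoding(False)
    try:
        codecs.lookup(encoding)  # Does Python know this codec at all?
    except LookupError:
        # An encoding Python cannot handle: "not Django's problem", so
        # fall back to a fixed codec instead of crashing.
        encoding = fallback
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return raw.decode(encoding, errors="replace")


print(decode_system_bytes(b"09 2009 22:21:17 -0000"))
```

As the rest of the ticket shows, this only patches over the symptom: an RFC 2822 date should not contain locale-specific names in the first place.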

comment:3 Changed 5 years ago by anonymous

According to RFC 2822, the rfc2822_date function in feedgenerator.py must return abbreviated month and day names in English only, independent of the locale (see http://www.faqs.org/rfcs/rfc2822.html, section 3.3). But it uses strftime(), which depends on the locale.

comment:4 Changed 5 years ago by lupus

  • Has patch set

This patch probably fixes this issue.

http://pastey.net/130530

It simply uses hardcoded English month and day names instead of %a and %b, so the output is fully RFC compliant.
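A minimal Python 3 sketch of that approach follows. It is not lupus' actual patch nor Django's committed code; it assumes naive datetimes should be written with the "-0000" zone (as in the output quoted in the description) and aware ones with their UTC offset.

```python
import datetime

# Fixed English abbreviations required by RFC 2822, section 3.3.
# They are hard-coded on purpose instead of coming from %a / %b.
DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc2822_date(date: datetime.datetime) -> str:
    """Format *date* as an RFC 2822 date-time, independent of the locale."""
    day_name = DAYS[date.weekday()]      # weekday(): Monday == 0
    month_name = MONTHS[date.month - 1]  # date.month is 1-based
    offset = date.utcoffset()
    if offset is None:
        # Assumption: naive datetimes get "-0000", RFC 2822's marker for
        # "no information about the local time zone".
        zone = "-0000"
    else:
        seconds = int(offset.total_seconds())
        sign = "-" if seconds < 0 else "+"
        hours, remainder = divmod(abs(seconds), 3600)
        zone = f"{sign}{hours:02d}{remainder // 60:02d}"
    # Only numeric fields are formatted here, so the locale cannot leak in.
    return (f"{day_name}, {date.day:02d} {month_name} {date.year:04d} "
            f"{date.hour:02d}:{date.minute:02d}:{date.second:02d} {zone}")


print(rfc2822_date(datetime.datetime(2009, 3, 9, 22, 21, 17)))
# Mon, 09 Mar 2009 22:21:17 -0000
```

Because the names come from tuples indexed by weekday() and month, the result is plain ASCII whatever locale.setlocale() was called with, so no decode step is needed at the call site.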

Changed 5 years ago by lupus

rfc2822_date patch

comment:5 Changed 4 years ago by romke

  • Cc romke added

comment:6 Changed 4 years ago by ramiro

lupus' patch seems to be the right solution to this issue. With it we both solve the problem for the OP and, more importantly, comply with RFC 2822. (The OP is using locale.setlocale(), which is discouraged because it affects the whole process and isn't thread-safe. Maybe he is using the RSS syndication feed functionality in an external script?)

AFAICS this unfortunately can't be fully tested, because I don't think calling locale.setlocale() in a test case is a good idea. But we can refer to the relevant section of the RFC and to this paragraph of the documentation for Python's locale module:

If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as string.lower(), or certain formats used with time.strftime()), you will have to find a way to do it without using the standard library routine.
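One partial check that avoids locale.setlocale() entirely is to format every day of a year and assert that only the fixed English abbreviations ever appear. The sketch below is hypothetical (the helper name names_are_locale_free is made up), and it swaps in the standard library's email.utils.format_datetime as the formatter under test, since that function builds its day and month names from fixed English tables. Note its limit: under an English or C locale a strftime-based formatter would pass too, so it pins the expected output rather than proving locale independence.

```python
import datetime
import email.utils

# The only day and month names RFC 2822 section 3.3 allows.
ENGLISH_DAYS = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}
ENGLISH_MONTHS = {"Jan", "Feb", "Mar", "Apr", "May", "Jun",
                  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"}


def names_are_locale_free(format_date, year: int = 2009) -> bool:
    """Format every day of a non-leap *year*; check the names used."""
    start = datetime.datetime(year, 1, 1, 12, 0, 0)
    seen_days, seen_months = set(), set()
    for offset in range(365):
        stamp = format_date(start + datetime.timedelta(days=offset))
        # e.g. "Thu, 01 Jan 2009 12:00:00 -0000"
        parts = stamp.split()
        seen_days.add(parts[0].rstrip(","))
        seen_months.add(parts[2])
    # Every weekday and month must appear, and nothing else may.
    return seen_days == ENGLISH_DAYS and seen_months == ENGLISH_MONTHS


print(names_are_locale_free(email.utils.format_datetime))
```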

comment:7 Changed 4 years ago by ramiro

  • Resolution set to fixed
  • Status changed from new to closed

(In [15112]) Fixed #10447 -- Made sure the syndication feeds helper function that returns RFC 2822-formatted datetime strings isn't affected by the current locale, removing use of strftime() because the '%a' and '%b' format specifiers are problematic in this respect. Thanks bear330 for the report and lupus for an initial patch.
