Opened 9 years ago

Closed 8 years ago

Last modified 8 years ago

#3664 closed (fixed)

UnicodeDecodeError in contrib/syndication/feeds.py

Reported by: Ville Säävuori <Ville@…> Owned by: jacob
Component: Documentation Version: master
Severity: Keywords: unicode
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: UI/UX:

Description

I'm using contrib.syndication to build feeds for Flickr photos and Ma.gnolia links, both of which have tags containing non-ASCII characters (tags like 'pärnu' and 'työ'). Django dies with a UnicodeDecodeError when it tries to build a feed whose URL contains such characters.

The error message is:

UnicodeDecodeError at /syndicate/tag/pärnu/
'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

...

Exception Location:  	/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/contrib/syndication/feeds.py in add_domain, line 9

The add_domain function is very simple, and the problem seems to be this line:

url = u'http://%s%s' % (domain, url)

I tested this and found that the error goes away when the url is decoded as latin1 (iso-8859-1), like this:

url = u'http://%s%s' % (domain, url.decode('latin1'))

However, I'm not confident that this is a good fix.
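
For illustration, here is a minimal Python 3 sketch of why the latin1 decode may only appear to work. It assumes the tag reaches add_domain as UTF-8 bytes, which the 0xc3 byte in the error message suggests but the ticket does not confirm. The name raw_url is hypothetical, and the original code ran on Python 2.4, where the ASCII decode happened implicitly during string formatting.

```python
# Assumption: the URL fragment arrives as UTF-8 encoded bytes.
raw_url = "/syndicate/tag/pärnu/".encode("utf-8")

# Decoding as ASCII fails on the first byte of the encoded 'ä' (0xc3).
try:
    raw_url.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# latin1 maps every byte to some character, so it never raises,
# but it turns the two UTF-8 bytes of 'ä' into two wrong characters.
as_latin1 = raw_url.decode("latin1")

# Decoding with the encoding the bytes were written in recovers the tag.
as_utf8 = raw_url.decode("utf-8")

print(ascii_error)
print(as_latin1)
print(as_utf8)
```

Under that assumption, the latin1 version silences the exception but produces a garbled tag, which is one reason to doubt it as a general fix.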

Attachments (1)

fix.diff (654 bytes) - added by Gary Wilson <gary.wilson@…> 8 years ago.
wording fix


Change History (13)

comment:1 Changed 9 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

This looks to be another unicode issue that we're going to look into after 0.96 is released.

comment:2 Changed 9 years ago by Ville Säävuori <Ville@…>

I wrote a workaround for myself for this. Details are at http://www.unessa.net/en/hoyci/2007/03/unicode-and-django-rss-framework/

It would have been better to write a good patch to resolve the problem and not its causes, but I'm still not really sure how this should be fixed "right".

comment:3 Changed 8 years ago by mtredinnick

  • Component changed from RSS framework to Documentation
  • Owner changed from adrian to jacob

This is a documentation bug, rather than a code bug.

Anything you pass up as a link, including things returned from item_link() in syndication classes and get_absolute_url() on models, must already be in the character set specified in RFC 1738 (the URL spec). So you must already have done the necessary conversion from non-ASCII characters to ASCII and called urllib.quote() if necessary. In the above example, you are passing non-ASCII characters to something expecting content for a URL, so it is failing.

We cannot perform the conversion to utf-8 and/or url quoting, because, for example, the standard IRI -> URI conversion process is that you convert first and then quote(), so we don't want to accidentally do it twice (and there are lots of other places where get_absolute_url() needs to already be returning the correctly quoted string).

I will update the documentation.
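
As a rough sketch of the convert-then-quote step described above, and of what goes wrong when it is applied twice: the example below uses Python 3's urllib.parse.quote (the Python 2 equivalent named in this comment is urllib.quote). The helper name iri_path_to_uri_path is hypothetical and is not part of Django.

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(path):
    """Hypothetical helper: UTF-8 encode a path, then percent-quote it.

    quote() encodes str input as UTF-8 by default; '/' is kept as-is.
    """
    return quote(path, safe="/")


original = "/syndicate/tag/pärnu/"

# Quoting once gives an ASCII-only path that is safe to use in a URL.
once = iri_path_to_uri_path(original)

# Quoting again escapes the '%' signs themselves, corrupting the path.
twice = iri_path_to_uri_path(once)

print(once)   # /syndicate/tag/p%C3%A4rnu/
print(twice)  # /syndicate/tag/p%25C3%25A4rnu/
print(unquote(once) == original)  # True
```

The doubly quoted result no longer decodes to the original tag in a single unquote step, which is why the quoting has to happen in exactly one place.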

comment:4 Changed 8 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [5250]) Fixed #3664 -- Documented that get_absolute_url() and item_link() (in
syndication) links are expected to be strings that can be used in URLs without
further quoting or encoding.

Changed 8 years ago by Gary Wilson <gary.wilson@…>

wording fix

comment:5 Changed 8 years ago by Gary Wilson <gary.wilson@…>

  • Has patch set
  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Accepted to Ready for checkin

comment:6 Changed 8 years ago by Gary Wilson <gary.wilson@…>

  • Has patch unset
  • Resolution set to fixed
  • Status changed from reopened to closed
  • Triage Stage changed from Ready for checkin to Accepted

oops, wrong ticket number mentioned in [5250]

comment:7 Changed 8 years ago by Julian

I can't see how this is fixed now. I still get errors: I have quoted everything correctly, but feeds.py still seems to run into trouble because the request URL contains URL-encoded Unicode.

Why is it even

url = u'http://%s%s' % (domain, url)

and not

url = 'http://%s%s' % (domain, url)

if the URLs shouldn't be unicode?

comment:8 Changed 8 years ago by mtredinnick

It sounds like you haven't fully URL and IRI encoded your "url" fragment. Please ask support questions on the mailing list (django-users), though, rather than in Trac.

comment:9 Changed 8 years ago by anonymous

  • Needs tests set
  • Patch needs improvement set

I still have this error, and I think the ticket should be reopened.
From what I can tell, the error has nothing to do with fully encoding your URL fragments. The problem seems to be that the feed object receives a feed_url that is not URL-quoted, at the point where it says

    def __init__(self, slug, feed_url):

When I print feed_url, it does not show a URL that is "ASCII and URL-quoted". So the part after

# 'url' must already be ASCII and URL-quoted, so no need for encoding

throws an error. Maybe no one ever discovered the bug because you don't deal with foreign-language sites?
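
One way to make a report like this concrete is to list exactly which characters in the string are outside ASCII. The sketch below is a hypothetical debugging helper in Python 3, not something from feeds.py, and the two sample strings are illustrative rather than taken from the commenter's site.

```python
def non_ascii_chars(url):
    """Return (index, character) pairs for every non-ASCII character."""
    return [(i, ch) for i, ch in enumerate(url) if ord(ch) > 127]


# A fully quoted path contains only ASCII, so nothing is reported.
print(non_ascii_chars("/syndicate/tag/p%C3%A4rnu/"))  # []

# An unquoted path reports the offending character and its position.
print(non_ascii_chars("/syndicate/tag/pärnu/"))  # [(16, 'ä')]
```

An empty list means the string is at least ASCII-only; it does not prove that every reserved character has been quoted correctly.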

comment:10 Changed 8 years ago by ubernostrum

Please read the Unicode URI/IRI documentation carefully; if you have Unicode inside URLs, you are responsible for ensuring that you call the proper function to escape it before handing it off to anything else. If you have further questions, please follow Malcolm's suggestion and ask them on the django-users mailing list.

comment:11 Changed 8 years ago by anonymous

Would that mean I can't use the feeds as described in the docs?
The request URL contains encoded and quoted Unicode, so what can I do when it is passed incorrectly to the feed object, which then throws an error?
All my other URLs are completely correct.

comment:12 Changed 8 years ago by mtredinnick

We have asked a number of times in the comments to please ask questions on the django-users list. You can post an example of how your code is generating the URL and what the problem is. The lack of examples you have provided makes it impossible to debug anything and Trac is not a good place to have support and debugging conversations. Certainly the earlier examples in this ticket were cases of bad user code, rather than a bug in Django, and yours may well be similar.

Post to django-users. Give an example of what the URL string is and how you are generating it. Then you will get help with fixing it.
