Django

Ticket #3664 (closed: fixed)

Opened 2 years ago

Last modified 2 years ago

UnicodeDecodeError in contrib/syndication/feeds.py

Reported by: Ville Säävuori <Ville@Unessa.net>
Assigned to: jacob
Milestone:
Component: Documentation
Version: SVN
Keywords: unicode
Cc:
Triage Stage: Accepted
Has patch: 0
Needs documentation: 0
Needs tests: 1
Patch needs improvement: 1

Description

I'm using contrib.syndication to make feeds for Flickr photos and Ma.gnolia links, both of which have tags with funky (non-ASCII) characters (tags like 'pärnu' and 'työ'). Django dies with a UnicodeDecodeError when trying to make a feed whose URL contains such characters.

The error message is:

UnicodeDecodeError at /syndicate/tag/pärnu/
'ascii' codec can't decode byte 0xc3 in position 24: ordinal not in range(128)

...

Exception Location:  	/usr/lib/python2.4/site-packages/Django-0.95-py2.4.egg/django/contrib/syndication/feeds.py in add_domain, line 9

The add_domain function is very simple, and the problem seems to be with this line:

url = u'http://%s%s' % (domain, url)

I tested this and found that the error goes away when the url is decoded as latin1 (iso-8859-1), like this:

url = u'http://%s%s' % (domain, url.decode('latin1'))

but I'm not very confident that this is a good fix.
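The following is a minimal sketch of why the latin1 workaround is doubtful, not code from Django itself. It is written for Python 3, where the implicit ASCII decode that Python 2 performed when mixing u'...' with a byte string has to be spelled out explicitly. The domain "example.com" and the variable names are assumptions made for illustration, and the url is assumed to arrive as UTF-8 bytes (the 0xc3 byte in the error message is consistent with that).

```python
# Hypothetical values for illustration only.
domain = "example.com"
url_bytes = "/syndicate/tag/pärnu/".encode("utf-8")  # 'ä' becomes the bytes 0xc3 0xa4

# Python 2 decoded byte strings as ASCII when interpolating them into a
# unicode template; the explicit equivalent fails on the 0xc3 byte.
try:
    url_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding as latin1 never raises, because every byte maps to some character,
# but UTF-8 input turns into mojibake instead of the intended text.
latin1_url = "http://%s%s" % (domain, url_bytes.decode("latin1"))

# Decoding with the encoding the bytes were actually written in gives the
# intended text, although the result is still not ASCII.
utf8_url = "http://%s%s" % (domain, url_bytes.decode("utf-8"))

print(ascii_failed)
print(latin1_url)
print(utf8_url)
```

In this sketch the latin1 variant silences the exception but changes the tag, so the resulting link would point at a different URL than the one intended.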

Attachments

fix.diff (0.6 kB) - added by Gary Wilson <gary.wilson@gmail.com> on 05/18/07 16:10:30.
wording fix

Change History

03/06/07 07:00:25 changed by Simon G. <dev@simon.net.nz>

  • needs_better_patch changed.
  • stage changed from Unreviewed to Accepted.
  • needs_tests changed.
  • needs_docs changed.

This looks to be another unicode issue that we're going to look into after 0.96 is released.

03/06/07 12:32:04 changed by Ville Säävuori <Ville@Unessa.net>

I wrote a workaround for myself for this. Details are at http://www.unessa.net/en/hoyci/2007/03/unicode-and-django-rss-framework/

It would have been better to write a good patch to resolve the problem and not its causes, but I'm still not really sure how this should be fixed "right".

05/15/07 12:22:37 changed by mtredinnick

  • owner changed from adrian to jacob.
  • component changed from RSS framework to Documentation.

This is a documentation bug, rather than a code bug.

Anything you pass up as a link, including things returned from item_link() in syndication classes and get_absolute_url() on models, must already be in the character set specified in RFC 1738 (the URL spec). So you must already have done the necessary conversion from non-ASCII characters to ASCII and called urllib.quote() if necessary. In the above example, you are passing non-ASCII characters to something expecting content for a URL, so it is failing.
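As a rough illustration of that conversion, here is a sketch in Python 3, where urllib.quote() lives at urllib.parse.quote(). The path, the domain "example.com", and the variable names are assumptions for the example; this is not the code used by the syndication framework.

```python
from urllib.parse import quote

# Hypothetical values for illustration only.
domain = "example.com"
path = "/syndicate/tag/pärnu/"

# Convert first (text to UTF-8 bytes), then percent-quote the bytes.
# quote() leaves "/" unescaped by default.
quoted = quote(path.encode("utf-8"))

# The quoted path is pure ASCII, so building the link cannot trigger
# a decode error.
link = "http://%s%s" % (domain, quoted)
link.encode("ascii")  # would raise UnicodeEncodeError if non-ASCII remained

print(link)
```

A value like `quoted` is the kind of string that item_link() or get_absolute_url() is expected to return under the rule described above.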

We cannot perform the conversion to utf-8 and/or url quoting, because, for example, the standard IRI -> URI conversion process is that you convert first and then quote(), so we don't want to accidentally do it twice (and there are lots of other places where get_absolute_url() needs to already be returning the correctly quoted string).
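A small sketch of the double-quoting hazard, again using Python 3's urllib.parse rather than the Python 2 urllib.quote() named above. The path and variable names are assumed for illustration.

```python
from urllib.parse import quote, unquote

path = "/syndicate/tag/pärnu/"  # hypothetical path

once = quote(path)   # correct: non-ASCII characters become %XX escapes
twice = quote(once)  # wrong: each "%" is itself escaped to "%25"

print(once)
print(twice)

# Undoing one level of quoting on the double-quoted value gives back the
# quoted string, not the original path, so the link no longer matches.
print(unquote(twice) == once)
print(unquote(once) == path)
```

If a framework quoted every link it was handed, any link that had already been quoted correctly would end up in the `twice` form.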

I will update the documentation.

05/15/07 13:03:01 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [5250]) Fixed #3664 -- Documented that get_absolute_url() and item_link() (in syndication) links are expected to be strings that can be used in URLs without further quoting or encoding.

05/18/07 16:10:30 changed by Gary Wilson <gary.wilson@gmail.com>

  • attachment fix.diff added.

wording fix

05/18/07 16:10:56 changed by Gary Wilson <gary.wilson@gmail.com>

  • status changed from closed to reopened.
  • has_patch set to 1.
  • resolution deleted.
  • stage changed from Accepted to Ready for checkin.

05/18/07 16:12:18 changed by Gary Wilson <gary.wilson@gmail.com>

  • status changed from reopened to closed.
  • has_patch deleted.
  • resolution set to fixed.
  • stage changed from Ready for checkin to Accepted.

oops, wrong ticket number mentioned in [5250]

05/19/07 07:31:23 changed by Julian

I can't see how this is fixed now. It still produces errors for me. I have quoted everything correctly, but feeds.py still seems to get into trouble because the request URL contains urlencoded unicode.

Why is it even

url = u'http://%s%s' % (domain, url)

and not

url = 'http://%s%s' % (domain, url)

if the urls shouldn't be unicode??

05/19/07 11:33:26 changed by mtredinnick

It sounds like you haven't fully URL and IRI encoded your "url" fragment. Please ask support questions on the mailing list (django-users), though, rather than in Trac.

07/04/07 16:02:52 changed by anonymous

  • needs_better_patch set to 1.
  • needs_tests set to 1.

I still have this error, and I think the ticket should be reopened. From what I can tell, the error has nothing to do with fully encoding your url fragments and so on. The problem seems to be that the feed object somehow receives a feed_url that is not URL-quoted, where it says

    def __init__(self, slug, feed_url):

When I print feed_url, it does not show a URL that is "ASCII and URL-quoted". So the part after

# 'url' must already be ASCII and URL-quoted, so no need for encoding

throws an error. Maybe no one ever discovered the bug because you don't deal with foreign-language sites!?
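One way to check a claim like this is sketched below in Python 3. It assumes, without confirmation from this ticket, that the path was percent-unquoted somewhere before it reached the feed object. The helper name is_ascii_url and the sample path are hypothetical.

```python
from urllib.parse import quote, unquote


def is_ascii_url(url):
    """Return True if url contains only ASCII characters."""
    try:
        url.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


quoted_path = "/syndicate/tag/p%C3%A4rnu/"  # what the client requested

# Assumption: something upstream unquoted the path before passing it on.
decoded_path = unquote(quoted_path)

# Quoting the decoded path again restores the ASCII, URL-quoted form.
requoted_path = quote(decoded_path)

print(is_ascii_url(quoted_path))
print(is_ascii_url(decoded_path))
print(requoted_path == quoted_path)
```

Posting the actual value of feed_url together with the result of a check like is_ascii_url() would make a report like this much easier to reproduce.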

07/04/07 18:53:50 changed by ubernostrum

Please read the Unicode URI/IRI documentation carefully; if you have Unicode inside URLs, you are responsible for ensuring that you call the proper function to escape it before handing it off to anything else. If you have further questions, please follow Malcolm's suggestion and ask them on the django-users mailing list.

07/05/07 07:23:11 changed by anonymous

That would mean I can't use the feeds as described in the docs!? The request URL has encoded and quoted Unicode, so what can I do when it is passed wrong to the feed object which throws an error? All my other URLs are completely correct.

07/06/07 00:48:47 changed by mtredinnick

We have asked a number of times in the comments to please ask questions on the django-users list. You can post an example of how your code is generating the URL and what the problem is. The lack of examples you have provided makes it impossible to debug anything and Trac is not a good place to have support and debugging conversations. Certainly the earlier examples in this ticket were cases of bad user code, rather than a bug in Django, and yours may well be similar.

Post to django-users. Give an example of what the URL string is and how you are generating it. Then you will get help with fixing it.

