Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#15237 closed Bug (fixed)

Django generated Atom/RSS feeds don't specify charset=utf8 in their Content-Type

Reported by: simon
Owned by: Jason Kotenko
Component: contrib.syndication
Version: 1.3
Severity: Normal
Keywords:
Cc: Jason Kotenko, shadow, techtonik@…
Triage Stage: Ready for checkin
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8". At the moment Django's default behaviour is to serve them without the charset bit, and it's not particularly easy to override this behaviour:

http://code.djangoproject.com/browser/django/trunk/django/utils/feedgenerator.py#L290

The workaround I'm using at the moment is to wrap the feed in a view function which overrides the Content-Type on the generated response object, but it's a bit of a hack:

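def feed(request):
    # MyFeed is the project's own Feed subclass; calling an instance returns an HttpResponse.
    response = MyFeed()(request)
    # Replace the Content-Type header set by the feed generator.
    response['Content-Type'] = 'application/atom+xml; charset=utf-8'
    return response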

Attachments (3)

django_15237.diff (1.1 KB) - added by Jason Kotenko 13 years ago.
SVN diff
15237-reopened.patch (1.5 KB) - added by Aymeric Augustin 13 years ago.
15237-rss.patch (1.3 KB) - added by michal@… 12 years ago.
Added charset to RSS feeds


Change History (22)

comment:1 by anonymous, 13 years ago

milestone: 1.3
Triage Stage: Unreviewed → Accepted

This seems like a reasonable request. I'm not an expert on the feeds framework, but it doesn't look like it ever produces things which are NOT UTF-8, so hopefully the fix is trivial.

comment:2 by Bas Peschier, 13 years ago

Since the syndication framework actually writes everything as UTF-8 in the view (see http://code.djangoproject.com/browser/django/trunk/django/contrib/syndication/views.py#L40), shouldn't this just be a case of adding "; charset=utf8" to the line simon is referring to?

comment:3 by Jason Kotenko, 13 years ago

Owner: changed from nobody to Jason Kotenko
Status: new → assigned

comment:4 by Jason Kotenko, 13 years ago

Cc: Jason Kotenko added
Has patch: set

Assertion "Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8"." verified here: http://tools.ietf.org/html/rfc2045#section-5.1 (mime-type syntax) and here: http://tools.ietf.org/html/rfc3023#section-3.2 (recommendation to always set the charset).

It does appear that if the encoding is set in the XML declaration, i.e. <?xml version="1.0" encoding="utf-8"?>, then the charset in the Content-Type is not strictly required, since everything within that XML document is supposed to be treated as UTF-8. However, it is still recommended, so I will proceed.
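The precedence rule discussed in this comment can be sketched in a few lines of Python. This is a simplified illustration, not Django code. It assumes the application/xml-style rules from RFC 3023 section 3.2: an explicit charset parameter is authoritative, and without one the XML declaration applies, defaulting to UTF-8. It skips byte-order-mark detection and the different text/xml default, and the helper name effective_xml_encoding is hypothetical.

```python
import re
from email.message import Message

# Encoding pseudo-attribute of an XML declaration at the very start of the document.
_XML_DECL_ENCODING = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def effective_xml_encoding(content_type, body):
    """Pick the encoding for an XML feed body (simplified, hypothetical helper).

    1. A charset parameter in the Content-Type header takes precedence.
    2. Otherwise use the encoding named in the XML declaration.
    3. Otherwise assume UTF-8 (byte-order-mark sniffing is not handled here).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased value, or None if absent
    if charset:
        return charset
    match = _XML_DECL_ENCODING.match(body)  # match() anchors at the start of body
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


# The header wins even when the declaration disagrees.
print(effective_xml_encoding("application/atom+xml; charset=utf-8",
                             b'<?xml version="1.0" encoding="iso-8859-1"?><feed/>'))  # utf-8
# Without a charset parameter, the declaration decides.
print(effective_xml_encoding("application/rss+xml",
                             b'<?xml version="1.0" encoding="ISO-8859-1"?><rss/>'))  # iso-8859-1
```

This is why the header charset is "recommended" rather than redundant: once present, it settles the question without the client having to parse the document first.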

It looks like the code in /django/contrib/syndication/views.py does not set the MIME type; it only uses it. It is the utility code in /django/utils/feedgenerator.py that sets it, as mentioned above.

I can't find anything in the docs that needs updating for this small change.

Added a regression test to verify that the MIME type keeps including the charset in the future.
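The actual regression test from the patch is not reproduced in this ticket. The following is a hypothetical, standard-library-only sketch of the kind of check described: FakeAtom1Feed stands in for a generator class and models only its mime_type attribute, and declared_charset is an illustrative helper that parses the header value instead of comparing raw strings.

```python
import unittest
from email.message import Message


# Hypothetical stand-in for a feed generator class; only the MIME type is modelled.
class FakeAtom1Feed:
    mime_type = "application/atom+xml; charset=utf-8"


def declared_charset(content_type):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


class FeedMimeTypeTest(unittest.TestCase):
    def test_mime_type_declares_utf8(self):
        # Fails if the charset parameter is dropped or spelled "utf8".
        self.assertEqual(declared_charset(FakeAtom1Feed.mime_type), "utf-8")

    def test_missing_charset_is_detected(self):
        self.assertIsNone(declared_charset("application/atom+xml"))
```

Saved to a file, this runs with python -m unittest; a real test would import the generator class (e.g. Atom1Feed) from django.utils.feedgenerator instead of using a stand-in.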

by Jason Kotenko, 13 years ago

Attachment: django_15237.diff added

SVN diff

comment:5 by Jannis Leidel, 13 years ago

Resolution: fixed
Status: assigned → closed

In [15505]:

Fixed #15237 -- Always set charset of Atom1 feeds to UTF-8. Thanks, Simon and jasonkotenko.

comment:6 by Jannis Leidel, 13 years ago

In [15512]:

[1.2.X] Fixed #15237 -- Always set charset of Atom1 feeds to UTF-8. Thanks, Simon and jasonkotenko.

Backport from trunk (r15505).

comment:7 by Caio Ariede, 13 years ago

Another workaround for this issue is to extend the feed class:

from django.contrib.syndication.views import Feed
from django.utils import feedgenerator

class FeedUTF8(feedgenerator.DefaultFeed):
    def __init__(self, *args, **kwargs):
        super(FeedUTF8, self).__init__(*args, **kwargs)
        # Append a charset parameter to the MIME type inherited from DefaultFeed.
        self.mime_type = '%s; charset=utf-8' % self.mime_type

And then specify the feed_type:

class LatestEntriesFeed(Feed):
    ...
    feed_type = FeedUTF8

...

$ curl -I http://localhost:8000/feed
HTTP/1.0 200 OK
Date: Thu, 31 Mar 2011 16:40:26 GMT
Server: WSGIServer/0.1 Python/2.6.1
Content-Type: application/rss+xml; charset=utf-8
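One caveat with this subclass: it appends the parameter unconditionally, so if the inherited mime_type already carries a charset, the header ends up with two of them. A hedged alternative, sketched with the standard library's email.message.Message, sets the parameter exactly once. with_charset is a hypothetical helper, not a Django API.

```python
from email.message import Message


def with_charset(mime_type, charset="utf-8"):
    """Return mime_type with exactly one charset parameter (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = mime_type
    # set_param replaces an existing charset parameter instead of adding a second one;
    # requote=False keeps the value unquoted, e.g. charset=utf-8.
    msg.set_param("charset", charset, requote=False)
    return msg["Content-Type"]


print(with_charset("application/rss+xml"))                  # application/rss+xml; charset=utf-8
print(with_charset("application/atom+xml; charset=utf8"))   # application/atom+xml; charset=utf-8
```

In the subclass, the assignment would then read self.mime_type = with_charset(self.mime_type), which stays correct whether or not the base class already declares a charset.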

comment:8 by philip@…, 13 years ago

Resolution: fixed
Severity: Normal
Status: closed → reopened
Type: Uncategorized

The charset should be “utf-8” rather than “utf8”, since the latter isn't what's registered with IANA. See: http://www.w3.org/International/O-HTTP-charset.

comment:9 by Julien Phalip, 13 years ago

Type: Uncategorized → Bug

comment:10 by Aymeric Augustin, 13 years ago

Easy pickings: set
UI/UX: unset

Bug confirmed: http://tools.ietf.org/html/rfc3023#section-3.2 (link given in a previous comment) says 'utf-8' and not 'utf8'.

While investigating this problem, I noticed that the codebase consistently uses <unicode>.encode('utf-8'), except one instance in tests/regressiontests/signing/tests.py, where the dash is missing. The codecs module defines utf8 as an alias of utf-8, so the code works, but there's no reason to keep this exception. I included that fix in the patch too — feel free to commit it separately or not commit it at all.
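The alias behaviour mentioned here is easy to confirm in Python: the codecs registry resolves both spellings to the same codec, so .encode('utf8') and .encode('utf-8') produce identical bytes. That leniency is specific to Python's codec lookup; it says nothing about which name belongs in an HTTP header, where the IANA-registered name UTF-8 is the one to use.

```python
import codecs

# Both spellings resolve to the same codec in Python's registry.
print(codecs.lookup("utf8").name)   # utf-8
print(codecs.lookup("UTF-8").name)  # utf-8

text = "caf\u00e9"
# Identical bytes either way, so code using the "utf8" spelling still works.
assert text.encode("utf8") == text.encode("utf-8") == b"caf\xc3\xa9"
```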

PS: you could have opened a new ticket instead of reopening this one since, strictly speaking, it's a different issue.

by Aymeric Augustin, 13 years ago

Attachment: 15237-reopened.patch added

comment:11 by Ivan Sagalaev, 13 years ago

Triage Stage: Accepted → Ready for checkin

Acting like another set of eyes. Seems pretty straightforward — RFC.

comment:12 by Jannis Leidel, 13 years ago

Resolution: fixed
Status: reopened → closed

In [16738]:

Fixed #15237 -- Fixed a typo in specifying UTF-8 encoding in the feed generator and signing tests. Thanks, Aymeric Augustin.

comment:13 by Jacob, 12 years ago

milestone: 1.3

Milestone 1.3 deleted

comment:14 by shadow, 12 years ago

Cc: shadow added
Has patch: unset
Resolution: fixed
Status: closed → reopened
Triage Stage: Ready for checkin → Unreviewed
Version: 1.2 → SVN

This fix only seems to have been applied to Atom feeds, and not RSS feeds.

Is there a reason for this? If not, could it please also be applied to RSS feeds?

One use case is debugging feeds in Google Chrome, which displays them as text/plain and therefore doesn't parse the document-level encoding attribute (<?xml version="1.0" encoding="utf-8"?>). As a result it decodes the feed with the wrong encoding, garbling non-ASCII characters such as the curly apostrophe in country’s.
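A minimal Python illustration of that failure mode, assuming the client falls back to windows-1252 (cp1252) when no charset is declared; the actual fallback depends on the browser and its locale settings.

```python
# A right single quotation mark (U+2019), as in "country’s".
original = "country\u2019s"
payload = original.encode("utf-8")  # b'country\xe2\x80\x99s', the bytes the feed sends

# Without a charset in Content-Type, a client may guess a legacy encoding such as cp1252.
misread = payload.decode("cp1252")
print(misread)  # countryâ€™s

# With charset=utf-8 declared, the client decodes the same bytes correctly.
assert payload.decode("utf-8") == original
```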

comment:15 by Łukasz Rekucki, 12 years ago

Triage Stage: Unreviewed → Accepted

Given the previous argument that Feed always writes the content in UTF-8, it sounds reasonable to me. And as the original ticket mentions both Atom and RSS, I think it's OK to reopen this ticket.

by michal@…, 12 years ago

Attachment: 15237-rss.patch added

Added charset to RSS feeds

comment:16 by Grzegorz Nosek, 12 years ago

Triage Stage: Accepted → Ready for checkin

comment:17 by Paul McMillan, 12 years ago

Resolution: fixed
Status: reopened → closed

In [17494]:

(The changeset message doesn't reference this ticket)

comment:18 by anatoly techtonik, 12 years ago

Cc: techtonik@… added
Version: master → 1.3

Any chance for it to be backported to 1.3?

in reply to:  18 comment:19 by Claude Paroz, 12 years ago

Replying to techtonik:

Any chance for it to be backported to 1.3?

Not at all, sorry. Only security-related issues might have a chance to be backported to 1.3.
