Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#15237 closed Bug (fixed)

Django generated Atom/RSS feeds don't specify charset=utf8 in their Content-Type

Reported by: simon
Owned by: Jason Kotenko
Component: contrib.syndication
Version: 1.3
Severity: Normal
Keywords:
Cc: Jason Kotenko, shadow, techtonik@…
Triage Stage: Ready for checkin
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no

Description

Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8". At the moment Django's default behaviour is to serve them without the charset bit, and it's not particularly easy to over-ride this behaviour:

http://code.djangoproject.com/browser/django/trunk/django/utils/feedgenerator.py#L290

The workaround I'm using at the moment is to wrap the feed in a view function which over-rides the content-type on the generated response object, but it's a bit of a hack:

def feed(request):
    response = MyFeed()(request)
    response['Content-Type'] = 'application/atom+xml; charset=utf-8'
    return response
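
The same workaround can be written once and reused across several feeds. This is only a sketch: force_utf8_charset is a hypothetical helper name, not part of Django, and it assumes the wrapped view returns a response object that supports header access by key, as in the snippet above.

```python
import functools


def force_utf8_charset(view):
    """Wrap a view so its response declares charset=utf-8 when no charset is set."""
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        response = view(request, *args, **kwargs)
        content_type = response["Content-Type"]
        # Leave the header alone if a charset parameter is already present.
        if "charset=" not in content_type.lower():
            response["Content-Type"] = content_type + "; charset=utf-8"
        return response
    return wrapper
```

Under those assumptions, usage would be something like feed = force_utf8_charset(MyFeed()).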

Attachments (3)

django_15237.diff (1.1 KB) - added by Jason Kotenko 6 years ago.
SVN diff
15237-reopened.patch (1.5 KB) - added by Aymeric Augustin 5 years ago.
15237-rss.patch (1.3 KB) - added by michal@… 5 years ago.
Added charset to RSS feeds


Change History (22)

comment:1 Changed 6 years ago by anonymous

milestone: 1.3
Triage Stage: Unreviewed → Accepted

This seems like a reasonable request. I'm not an expert on the feeds framework, but it doesn't look like it ever produces things which are NOT UTF-8, so hopefully the fix is trivial.

comment:2 Changed 6 years ago by Bas Peschier

Since the syndication framework actually writes everything in utf-8 in the view (see http://code.djangoproject.com/browser/django/trunk/django/contrib/syndication/views.py#L40) this should be a case of just adding "; charset=utf8" to the line simon is referring to?

comment:3 Changed 6 years ago by Jason Kotenko

Owner: changed from nobody to Jason Kotenko
Status: new → assigned

comment:4 Changed 6 years ago by Jason Kotenko

Cc: Jason Kotenko added
Has patch: set

The assertion that Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8" is verified here: http://tools.ietf.org/html/rfc2045#section-5.1 (mime-type syntax) and here: http://tools.ietf.org/html/rfc3023#section-3.2 (recommendation to always set the charset).

It does appear that if the charset is set in the XML declaration, i.e. <?xml version="1.0" encoding="utf-8"?>, then the charset in the Content-Type is not required, since everything within that XML document is supposed to be treated as UTF-8. However, setting it is still recommended, so I will proceed.
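
To check whether a given Content-Type value actually carries a charset parameter, the header can be parsed with the standard library. This is a minimal sketch; charset_of is a hypothetical helper name used only for illustration.

```python
from email.message import Message


def charset_of(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(charset_of("application/atom+xml"))
print(charset_of("application/atom+xml; charset=utf-8"))
```

The first call reports no charset, which is the situation this ticket describes. The second reports the charset that was declared.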

Looks like the code in /django/contrib/syndication/views.py does not set the mime-type, it only uses it. It is the util code in /django/utils/feedgenerator.py that sets it, as mentioned above.

Can't find anything in the docs that requires changing due to this small change.

Added regression test to verify the MIME type is still set with encoding in the future.

Changed 6 years ago by Jason Kotenko

Attachment: django_15237.diff added

SVN diff

comment:5 Changed 6 years ago by Jannis Leidel

Resolution: fixed
Status: assigned → closed

In [15505]:

(The changeset message doesn't reference this ticket)

comment:6 Changed 6 years ago by Jannis Leidel

In [15512]:

[1.2.X] Fixed #15237 -- Always set charset of Atom1 feeds to UTF-8. Thanks, Simon and jasonkotenko.

Backport from trunk (r15505).

comment:7 Changed 6 years ago by Caio Ariede

Another workaround for this issue is to extend the feed generator class:

from django.contrib.syndication.views import Feed
from django.utils import feedgenerator

class FeedUTF8(feedgenerator.DefaultFeed):
    def __init__(self, *args, **kwargs):
        super(FeedUTF8, self).__init__(*args, **kwargs)
        self.mime_type = '%s; charset=utf-8' % self.mime_type

And then specify the feed_type:

class LatestEntriesFeed(Feed):
    ...
    feed_type = FeedUTF8

...

$ curl -I http://localhost:8000/feed
HTTP/1.0 200 OK
Date: Thu, 31 Mar 2011 16:40:26 GMT
Server: WSGIServer/0.1 Python/2.6.1
Content-Type: application/rss+xml; charset=utf-8

comment:8 Changed 6 years ago by philip@…

Resolution: fixed
Severity: Normal
Status: closed → reopened
Type: Uncategorized

The charset should be “utf-8” rather than “utf8”, since the latter isn't what's registered with IANA. See: http://www.w3.org/International/O-HTTP-charset.

comment:9 Changed 6 years ago by Julien Phalip

Type: Uncategorized → Bug

comment:10 Changed 5 years ago by Aymeric Augustin

Easy pickings: set
UI/UX: unset

Bug confirmed: http://tools.ietf.org/html/rfc3023#section-3.2 (link given in a previous comment) says 'utf-8' and not 'utf8'.

While investigating this problem, I noticed that the codebase consistently uses <unicode>.encode('utf-8'), except one instance in tests/regressiontests/signing/tests.py, where the dash is missing. The codecs module defines utf8 as an alias of utf-8, so the code works, but there's no reason to keep this exception. I included that fix in the patch too — feel free to commit it separately or not commit it at all.
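
The alias behaviour is easy to confirm in an interpreter. This is a quick sketch that only shows how Python's codec registry resolves the spellings. It says nothing about HTTP clients, which is why the header itself should still use the registered name.

```python
import codecs

# Each spelling resolves to the same codec; .name is the codec's canonical Python name.
for label in ("utf8", "utf-8", "UTF-8"):
    print(label, "->", codecs.lookup(label).name)

# Encoding with either spelling yields identical bytes.
same_bytes = "caf\u00e9".encode("utf8") == "caf\u00e9".encode("utf-8")
print(same_bytes)
```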

PS: you could have opened a new ticket instead of reopening this one because, strictly speaking, it's a different issue.

Changed 5 years ago by Aymeric Augustin

Attachment: 15237-reopened.patch added

comment:11 Changed 5 years ago by Ivan Sagalaev

Triage Stage: Accepted → Ready for checkin

Acting like another set of eyes. Seems pretty straightforward — RFC.

comment:12 Changed 5 years ago by Jannis Leidel

Resolution: fixed
Status: reopened → closed

In [16738]:

Fixed #15237 -- Fixed a typo in specifying UTF-8 encoding in the feed generator and signing tests. Thanks, Aymeric Augustin.

comment:13 Changed 5 years ago by Jacob

milestone: 1.3

Milestone 1.3 deleted

comment:14 Changed 5 years ago by shadow

Cc: shadow added
Has patch: unset
Resolution: fixed
Status: closed → reopened
Triage Stage: Ready for checkin → Unreviewed
Version: 1.2 → SVN

This fix only seems to have been applied to Atom feeds, and not RSS feeds.

Is there a reason for this? If not, could it please also be applied to RSS feeds?

One use case is: debugging feeds with Google Chrome, which displays them in text/plain, and therefore doesn't parse the document level encoding attribute (<?xml version="1.0" encoding="utf-8"?>). The result is it uses an incorrect encoding (e.g. country’s, instead of country's).
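
The effect can be reproduced outside the browser. This is a sketch that assumes the client falls back to Windows-1252 when no charset is declared; the actual fallback encoding depends on the browser and its locale settings.

```python
# Text with a typographic apostrophe (U+2019), as it might appear in a feed item.
text = "country\u2019s"

# The feed is written as UTF-8 bytes...
raw = text.encode("utf-8")

# ...but a client that assumes Windows-1252 decodes the same bytes differently.
garbled = raw.decode("cp1252")

print(ascii(text))
print(ascii(garbled))
print(garbled == text)
```

The three UTF-8 bytes of the apostrophe are each read as a separate character, so the decoded string no longer matches the original.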

comment:15 Changed 5 years ago by Łukasz Rekucki

Triage Stage: Unreviewed → Accepted

Given the previous argument that Feed always writes the content in UTF-8, it sounds reasonable to me. And as the original ticket mentions both Atom and RSS, I think it's ok to reopen this ticket.

Changed 5 years ago by michal@…

Attachment: 15237-rss.patch added

Added charset to RSS feeds

comment:16 Changed 5 years ago by Grzegorz Nosek

Triage Stage: Accepted → Ready for checkin

comment:17 Changed 5 years ago by Paul McMillan

Resolution: fixed
Status: reopened → closed

In [17494]:

(The changeset message doesn't reference this ticket)

comment:18 Changed 4 years ago by anatoly techtonik

Cc: techtonik@… added
Version: master → 1.3

Any chance for it to be backported to 1.3?

comment:19 in reply to: 18 Changed 4 years ago by Claude Paroz

Replying to techtonik:

Any chance for it to be backported to 1.3?

Not at all, sorry. Only security-related issues might have a chance to be backported to 1.3.
