
Opened 3 years ago

Closed 2 years ago

Last modified 22 months ago

#15237 closed Bug (fixed)

Django generated Atom/RSS feeds don't specify charset=utf8 in their Content-Type

Reported by: simon Owned by: jasonkotenko
Component: contrib.syndication Version: 1.3
Severity: Normal Keywords:
Cc: jasonkotenko, shadow, techtonik@… Triage Stage: Ready for checkin
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8". At the moment Django's default behaviour is to serve them without the charset parameter, and it's not particularly easy to override this behaviour:

http://code.djangoproject.com/browser/django/trunk/django/utils/feedgenerator.py#L290

The workaround I'm using at the moment is to wrap the feed in a view function which overrides the Content-Type on the generated response object, but it's a bit of a hack:

def feed(request):
    response = MyFeed()(request)
    response['Content-Type'] = 'application/atom+xml; charset=utf-8'
    return response

Attachments (3)

django_15237.diff (1.1 KB) - added by jasonkotenko 3 years ago.
SVN diff
15237-reopened.patch (1.5 KB) - added by aaugustin 3 years ago.
15237-rss.patch (1.3 KB) - added by michal@… 2 years ago.
Added charset to RSS feeds


Change History (22)

comment:1 Changed 3 years ago by anonymous

  • milestone set to 1.3
  • Triage Stage changed from Unreviewed to Accepted

This seems like a reasonable request. I'm not an expert on the feeds framework, but it doesn't look like it ever produces things which are NOT UTF-8, so hopefully the fix is trivial.

comment:2 Changed 3 years ago by bpeschier

Since the syndication framework actually writes everything in utf-8 in the view (see http://code.djangoproject.com/browser/django/trunk/django/contrib/syndication/views.py#L40) this should be a case of just adding "; charset=utf8" to the line simon is referring to?

comment:3 Changed 3 years ago by jasonkotenko

  • Owner changed from nobody to jasonkotenko
  • Status changed from new to assigned

comment:4 Changed 3 years ago by jasonkotenko

  • Cc jasonkotenko added
  • Has patch set

Assertion "Atom feeds containing UTF8 characters should be served with a Content-Type of "application/atom+xml; charset=utf8"." verified here: http://tools.ietf.org/html/rfc2045#section-5.1 (mime-type syntax) and here: http://tools.ietf.org/html/rfc3023#section-3.2 (recommendation to always set the charset).

It does appear, though, that if the charset is set in the XML declaration, i.e. <?xml version="1.0" encoding="utf-8"?>, then the charset in the Content-Type is not required, since everything within that XML document is supposed to be treated as UTF8. However, setting it is still recommended, so I will proceed.
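
To make the two places where an encoding can be declared concrete, here is a minimal sketch using only the Python standard library. The helper names (header_charset, declared_encoding, effective_encoding) are made up for illustration and are not part of Django. The "header first, then XML declaration" order is an assumption about how a typical consumer behaves, not something this ticket specifies.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of an XML declaration at the start of a document.
XML_DECL_ENCODING = re.compile(rb'^<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']')


def header_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def declared_encoding(body):
    """Return the encoding named in the XML declaration of `body` (bytes), or None."""
    match = XML_DECL_ENCODING.match(body)
    return match.group(1).decode("ascii") if match else None


def effective_encoding(content_type, body):
    """Assumed consumer behaviour: prefer the HTTP header, else the XML declaration."""
    return header_charset(content_type) or declared_encoding(body)
```

With the header as Django currently sends it ("application/atom+xml"), header_charset returns None and a consumer has to fall back to the XML declaration, which only works if it actually parses the body as XML.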

Looks like the code in /django/contrib/syndication/views.py does not set the mime-type, it only uses it. It is the util code in /django/utils/feedgenerator.py that sets it, as mentioned above.

Can't find anything in the docs that requires changing due to this small change.

Added regression test to verify the MIME type is still set with encoding in the future.

Changed 3 years ago by jasonkotenko

SVN diff

comment:5 Changed 3 years ago by jezdez

  • Resolution set to fixed
  • Status changed from assigned to closed

In [15505]:

(The changeset message doesn't reference this ticket)

comment:6 Changed 3 years ago by jezdez

In [15512]:

[1.2.X] Fixed #15237 -- Always set charset of Atom1 feeds to UTF-8. Thanks, Simon and jasonkotenko.

Backport from trunk (r15505).

comment:7 Changed 3 years ago by caioariede

Another workaround for this issue is to extend the feed generator class:

from django.contrib.syndication.views import Feed
from django.utils import feedgenerator

class FeedUTF8(feedgenerator.DefaultFeed):
    def __init__(self, *args, **kwargs):
        super(FeedUTF8, self).__init__(*args, **kwargs)
        self.mime_type = '%s; charset=utf-8' % self.mime_type

And then specify the feed_type:

class LatestEntriesFeed(Feed):
    ...
    feed_type = FeedUTF8

...

$ curl -I http://localhost:8000/feed
HTTP/1.0 200 OK
Date: Thu, 31 Mar 2011 16:40:26 GMT
Server: WSGIServer/0.1 Python/2.6.1
Content-Type: application/rss+xml; charset=utf-8

comment:8 Changed 3 years ago by philip@…

  • Resolution fixed deleted
  • Severity set to Normal
  • Status changed from closed to reopened
  • Type set to Uncategorized

The charset should be “utf-8” rather than “utf8”, since the latter isn't what's registered with IANA. See: http://www.w3.org/International/O-HTTP-charset.

comment:9 Changed 3 years ago by julien

  • Type changed from Uncategorized to Bug

comment:10 Changed 3 years ago by aaugustin

  • Easy pickings set
  • UI/UX unset

Bug confirmed: http://tools.ietf.org/html/rfc3023#section-3.2 (link given in a previous comment) says 'utf-8' and not 'utf8'.

While investigating this problem, I noticed that the codebase consistently uses <unicode>.encode('utf-8'), except one instance in tests/regressiontests/signing/tests.py, where the dash is missing. The codecs module defines utf8 as an alias of utf-8, so the code works, but there's no reason to keep this exception. I included that fix in the patch too — feel free to commit it separately or not commit it at all.
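
The alias behaviour can be checked with the standard library alone. This is only a sketch of why the Python side keeps working with either spelling; it says nothing about how HTTP clients treat the charset parameter. The helper name canonical_codec_name is made up for illustration.

```python
import codecs


def canonical_codec_name(label):
    """Return the name Python's codec registry reports for an encoding label."""
    return codecs.lookup(label).name


# Both spellings resolve to the same codec, so encode() output is identical.
same_codec = canonical_codec_name("utf8") == canonical_codec_name("utf-8")
same_bytes = "caf\u00e9".encode("utf8") == "caf\u00e9".encode("utf-8")
```

Python normalises the label before looking it up, which is why the missing dash in the signing tests never caused a failure. A Content-Type header gets no such normalisation on the sending side, so the registered spelling has to be written out there.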

PS: you could have opened a new ticket instead of reopening this one, because strictly speaking, it's a different issue.

Changed 3 years ago by aaugustin

comment:11 Changed 3 years ago by isagalaev

  • Triage Stage changed from Accepted to Ready for checkin

Acting like another set of eyes. Seems pretty straightforward — RFC.

comment:12 Changed 3 years ago by jezdez

  • Resolution set to fixed
  • Status changed from reopened to closed

In [16738]:

Fixed #15237 -- Fixed a typo in specifying UTF-8 encoding in the feed generator and signing tests. Thanks, Aymeric Augustin.

comment:13 Changed 3 years ago by jacob

  • milestone 1.3 deleted

Milestone 1.3 deleted

comment:14 Changed 2 years ago by shadow

  • Cc shadow added
  • Has patch unset
  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Ready for checkin to Unreviewed
  • Version changed from 1.2 to SVN

This fix only seems to have been applied to Atom feeds, and not RSS feeds.

Is there a reason for this? If not, could it please also be applied to RSS feeds?

One use case is: debugging feeds with Google Chrome, which displays them in text/plain, and therefore doesn't parse the document level encoding attribute (<?xml version="1.0" encoding="utf-8"?>). The result is it uses an incorrect encoding (e.g. country’s, instead of country's).
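
The kind of garbling described above can be reproduced in a few lines. This is a sketch that assumes the client falls back to Windows-1252 (cp1252) when no charset is given. The ticket does not say which fallback encoding Chrome actually used, so treat that choice as an assumption.

```python
# A string with a typographic apostrophe (U+2019), as it might appear in a feed title.
text = "country\u2019s"

# The feed is written as UTF-8, so the apostrophe becomes three bytes: E2 80 99.
utf8_bytes = text.encode("utf-8")

# A client that assumes cp1252 (assumed fallback) decodes each byte separately.
misread = utf8_bytes.decode("cp1252")

# Decoding with the encoding that was actually used restores the original text.
correct = utf8_bytes.decode("utf-8")
```

Here misread comes out as "countryâ€™s", while correct equals the original string. Sending charset=utf-8 in the Content-Type removes the guesswork for clients that never look at the XML declaration.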

comment:15 Changed 2 years ago by lrekucki

  • Triage Stage changed from Unreviewed to Accepted

Given the previous argument that Feed always writes the content in UTF-8, it sounds reasonable to me. And as the original ticket mentions both Atom and RSS, I think it's OK to reopen this ticket.

Changed 2 years ago by michal@…

Added charset to RSS feeds

comment:16 Changed 2 years ago by gnosek

  • Triage Stage changed from Accepted to Ready for checkin

comment:17 Changed 2 years ago by PaulM

  • Resolution set to fixed
  • Status changed from reopened to closed

In [17494]:

(The changeset message doesn't reference this ticket)

comment:18 follow-up: Changed 22 months ago by techtonik

  • Cc techtonik@… added
  • Version changed from master to 1.3

Any chance for it to be backported to 1.3?

comment:19 in reply to: ↑ 18 Changed 22 months ago by claudep

Replying to techtonik:

Any chance for it to be backported to 1.3?

Not at all, sorry. Only security-related issues might have a chance to be backported to 1.3.
