Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#15936 closed New feature (invalid)

Syndication: Turning off autoescape (content:encoded)

Reported by: Brant Steen <brant.steen@…> Owned by: nobody
Component: contrib.syndication Version: 1.3
Severity: Normal Keywords: syndication, content:encoded
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX:

Description

A very common element to pull into an RSS feed is the content:encoded element. In a blog, for example, it allows you to put a full entry into your RSS feed, including the HTML in that entry (headings, paragraphs, lists, whatnot).

I couldn't find a way to get this element to work with the current contrib.syndication module. This is what I tried:

from django.contrib.syndication.views import Feed
from django.utils import feedgenerator

class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content:encoded', item['content_encoded'])

...

class TheFeed(Feed):
    feed_type = ExtendedRSSFeed

    ....

    def item_extra_kwargs(self, item):
        return {'content_encoded': self.item_content_encoded(item)}
    
    def item_content_encoded(self, item):
        return "<![CDATA[%s]]>" % item.content    

But that generates a feed with all of the HTML escaped, even if I put an {% autoescape off %} block in the template that content:encoded is pulled from. So instead of getting HTML tags inside the CDATA section, I end up with a lot of &lt;h1&gt; output.
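The escaping described above can be reproduced without Django. This is a minimal sketch that assumes addQuickElement boils down to startElement / characters / endElement on the standard library's XMLGenerator (which SimplerXMLGenerator subclasses); render_element is a hypothetical helper written for this illustration, not part of Django:

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def render_element(name, contents):
    """Serialize one element roughly the way a SAX-based feed writer would."""
    out = StringIO()
    handler = XMLGenerator(out, "utf-8")
    handler.startElement(name, {})
    # characters() escapes &, < and > in the text it is given,
    # so a hand-written CDATA wrapper is escaped along with everything else.
    handler.characters(contents)
    handler.endElement(name)
    return out.getvalue()


rendered = render_element("content:encoded", "<![CDATA[<h1>Hi</h1>]]>")
print(rendered)
```

The template's {% autoescape off %} block has no effect here because the escaping happens in the XML writer, after the template has already been rendered.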

After drilling in and finding the SimplerXMLGenerator, it seemed that an option to turn off escaping could be added at this point without breaking anyone's current implementation (see patch). With the patch, the only change to the above example is this:

class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content:encoded', item['content_encoded'], escape=False)

And then content:encoded can be handled through the normal syndication process.

Attachments (4)

patch.diff (814 bytes) - added by Brant Steen <brant.steen@…> 3 years ago.
Patch for SimplerXMLGenerator
Screen shot 2011-05-01 at 09.37.36.png (81.3 KB) - added by aaugustin 3 years ago.
Safari-RSS.jpg (162.8 KB) - added by brant 3 years ago.
RSS feed in Safari
FireFox-RSS.jpg (140.0 KB) - added by brant 3 years ago.
RSS in Firefox

Change History (10)

Changed 3 years ago by Brant Steen <brant.steen@…>

Patch for SimplerXMLGenerator

comment:1 Changed 3 years ago by Brant Steen <brant.steen@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I should mention, in item_content_encoded, item.content has a bunch of HTML in it...

comment:2 Changed 3 years ago by aaugustin

  • Needs documentation set
  • Needs tests set
  • Resolution set to invalid
  • Status changed from new to closed

Based on http://web.resource.org/rss/1.0/modules/content/ content:encoded is:

An element whose contents are the entity-encoded or CDATA-escaped version of the content of the item.

I see that you are trying to force CDATA-escaping:

    def item_content_encoded(self, item):
        return "<![CDATA[%s]]>" % item.content  

Why don't you just let Django perform the equivalent entity-encoding?

As far as I can tell, the following solution works perfectly (see screenshot):

from django.contrib.syndication.views import feedgenerator, Feed

### not touched from your example

class ExtendedRSSFeed(feedgenerator.Rss201rev2Feed):
    """
    Create a type of RSS feed that has content:encoded elements.
    """
    def root_attributes(self):
        attrs = super(ExtendedRSSFeed, self).root_attributes()
        attrs['xmlns:content'] = 'http://purl.org/rss/1.0/modules/content/'
        return attrs
    
    def add_item_elements(self, handler, item):
        super(ExtendedRSSFeed, self).add_item_elements(handler, item)
        handler.addQuickElement(u'content:encoded', item['content_encoded'])

### customized

class TestFeed(Feed):
    title = "test"
    link = "/"
    description = "test"
    feed_type = ExtendedRSSFeed

    def items(self):
        return range(3)

    def item_title(self, item):
        return "Title of item %d" % item

    def item_description(self, item):
        return "Description of item %d" % item

    def item_link(self, item):
        return "/%d/" % item

    def item_extra_kwargs(self, item):
        return {'content_encoded': '<h1>Item %d</h1><p>lorem ipsum...</p>' % item}
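
The equivalence between the two forms allowed by the spec can be checked with a standard XML parser. This is a small sketch using only the standard library; the sample HTML string and the extract helper are made up for the illustration:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NS = "http://purl.org/rss/1.0/modules/content/"
html = "<h1>Item 1</h1><p>lorem ipsum...</p>"

# The same payload, once entity-encoded and once wrapped in CDATA.
entity_encoded = (
    '<item xmlns:content="%s"><content:encoded>%s</content:encoded></item>'
    % (NS, escape(html))
)
cdata_wrapped = (
    '<item xmlns:content="%s"><content:encoded><![CDATA[%s]]></content:encoded></item>'
    % (NS, html)
)


def extract(xml_text):
    """Parse the item and return the text of its content:encoded child."""
    item = ET.fromstring(xml_text)
    return item.find("{%s}encoded" % NS).text


print(extract(entity_encoded) == html)
print(extract(cdata_wrapped) == html)
```

A conforming parser hands the same string to the feed reader in both cases, so the escaped form seen in the raw source is not a loss of information.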

Changed 3 years ago by aaugustin

comment:3 Changed 3 years ago by brant

I believe that what makes your example work correctly is actually Safari's parsing. Check it in Firefox (you'll need to view the source, since Firefox doesn't show the content:encoded portion on the screen).

I modified my feed to reflect the example above. I'm attaching two screenshots of the source (one from Safari, one from Firefox). You'll see that it's escaped in Firefox. Also, if I check the same feed in Chrome (which doesn't have a native RSS parser, so it just shows you the source), I see the same result as in the Firefox screenshot.

Changed 3 years ago by brant

RSS feed in Safari

Changed 3 years ago by brant

RSS in Firefox

comment:4 Changed 3 years ago by anonymous

  • Resolution invalid deleted
  • Status changed from closed to reopened

comment:5 Changed 3 years ago by aaugustin

  • Resolution set to invalid
  • Status changed from reopened to closed

As far as I can tell, everything is working properly.

In HTML, "<p> 1 < 2 </p>" is invalid; the correct version is "<p> 1 &lt; 2 </p>". In RSS, it is the same: "<content:encoded> foo<br />bar </content:encoded>" is invalid; the correct version is "<content:encoded> foo&lt;br /&gt;bar </content:encoded>".

Your screenshot in Firefox shows correct escaping in the raw, unparsed source of your feed. Your screenshot in Safari shows that Safari has parsed the source, properly extracted the contents of the <content:encoded> tag, and has inserted it in an HTML structure for display.

The name of the tag "content:encoded" itself makes it fairly explicit that its content be encoded, and so does the spec (see my first comment). If you could insert arbitrary unescaped HTML inside "content:encoded", your RSS feed would no longer be valid XML, something RSS parsers clearly do not handle!
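To illustrate the well-formedness point, here is a minimal sketch using the standard library parser. The is_well_formed helper and the sample strings are hypothetical, and the element name is simplified to avoid declaring the content namespace:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Typical HTML: <br> has no closing tag, which XML does not allow.
raw = "<encoded>foo<br>bar</encoded>"
# The same content with the markup entity-encoded.
escaped = "<encoded>foo&lt;br&gt;bar</encoded>"

print(is_well_formed(raw))
print(is_well_formed(escaped))
```

XHTML-style markup such as "<br />" happens to parse, but the parser then treats it as child elements of the feed rather than as the text content of the element, so escaping is still the safe general rule.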

comment:6 Changed 3 years ago by Brant Steen <brant.steen@…>

Interesting... so that's why it always needs to be wrapped in CDATA when the HTML isn't escaped.

Thanks for clearing that up... I guess it can work either way but leaving it the way it is keeps it more generalized.

Whenever I've looked at RSS feeds, they always use CDATA inside content:encoded elements... so I figured the two always went hand in hand. Which is why I was so surprised when I couldn't figure out how to turn off escaping in the feedgenerator.

Thanks again, that makes a lot of sense now.
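
For completeness, hand-built CDATA has a pitfall of its own: the payload must not contain the terminator "]]>". The sketch below shows one common workaround, splitting the terminator across two CDATA sections; to_cdata is a hypothetical helper, not something Django provides:

```python
import xml.etree.ElementTree as ET


def to_cdata(text):
    """Wrap text in CDATA, splitting any embedded "]]>" across two sections."""
    return "<![CDATA[%s]]>" % text.replace("]]>", "]]]]><![CDATA[>")


payload = "<p>a ]]> b</p>"
document = "<encoded>%s</encoded>" % to_cdata(payload)

# The parser joins adjacent CDATA sections back into one text node.
recovered = ET.fromstring(document).text
print(recovered == payload)
```

Entity-encoding, which the feed generator already does, has no such special case, which is one reason leaving the escaping in place is the more general choice.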
