Opened 5 years ago

Closed 3 years ago

#14976 closed New feature (fixed)

Add is_html flag to contrib.messages

Reported by: tedtieken Owned by: nobody
Component: contrib.messages Version: master
Severity: Normal Keywords: safe, messages, html
Cc: florian+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by lukeplant)

I would like to have add a message.is_html flag to the Message model of the contrib.messages app.

The flag would be set to False by default and could be explicitly overridden for messages that are HTML. There are times when it would be helpful to the end user to include an html link in a message ("Welcome, click here to create a profile", "You've sent 25 points to user_b, click here to see your balance," etc.), and with the current message system there is not a good way to do this.

Adding the is_html flag would require a minor set of backward compatible changes:

def success(request, message, extra_tags='', fail_silently=False):
to
def success(request, message, extra_tags='', fail_silently=False, is_html=False):

def add_message(request, level, message, extra_tags='', fail_silently=False):
to 
def add_message(request, level, message, extra_tags='', fail_silently=False, is_html=False):

def __init__(self, level, message, extra_tags=None): 
to
def __init__(self, level, message, extra_tags=None, is_html=False):

#add to __init__
self.is_html = is_html

Then in the template:

{% if message.is_html %}{{ message|safe }}{% else %}{{ message }}{% endif %}.


Alternative ways to do this:

  1. Run all messages through the safe filter

    This would require a code-wide policy of "make sure you escape anything in a message that might have user input" such as if my message is "your post %s is now published" % blog.post or "%s has sent you the message %s" %(user, message.content). I would then have to worry about every variable I use in a message string, if it could contain script, and if it is already escaped (or escape everything again). I would also have to worry if everyone else working on the codebase is doing this correctly.
  1. Use a tag

    I could have a policy of adding "html" to the tags I want to run through the safe filter, but this is also fraught with downsides. Since all tags get output into html, the safe flag would end up output to the end user. The template logic is less clear and error prone.

If this isn't violating a core django design precept, I'll get started on a patch in the next few days.

Attachments (1)

messages.patch (7.3 KB) - added by tedtieken 5 years ago.

Download all attachments as: .zip

Change History (12)

comment:1 Changed 5 years ago by tedtieken

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • would like to add

comment:2 Changed 5 years ago by lukeplant

  • Description modified (diff)
  • Summary changed from Add is_safe flag to contrib.messages to Add is_html flag to contrib.messages
  • Triage Stage changed from Unreviewed to Accepted

It isn't at all obvious what 'is_safe' refers to. I thought you were talking about trusted vs untrusted messages. 'is_html' would be much clearer - so I've changed that. I also fixed up some other things in the description where you seemed to switch from "safe" to "test" - it was a bit confusing.

Other than that, I can see the case for this request. We need to think about XSS, but AFAICS there is no issue. The Cookie backend for Messages is potentially vulnerable, but 1) Cookies are a very poor vector for XSS, and 2) we are signing and checking all Messages using HMAC.

With regards to compatibility, we would also need to ensure that messages pickled before the change can be unpickled after it.

So I've accepted this ticket, assuming we can find a fully backwards compatible solution.

comment:3 Changed 5 years ago by tedtieken

  • Keywords html added

Not sure where "test" came from... probably end of the day brain fog.

Agree with the change to "is_html"

Cookie backend.
The current method stores the message as a 3 or 4 item list in json: [flag, level, message, optionally extra_tags]. The decoding method relies on this information being a list of 3 or 4 items to recreate the message object:

#obj is the json object transformed into a list, obj[0] is a flag set to '__json_message'
return Message(*obj[1:]) 

Adding another optional argument creates a problem. If we have [flag, level, message, extra_tags, is_html] the message decoding works, but if we have [flag, level, message, is_html] then the is_html tag is positionally interpreted as the value of extra_tags.

The solution I see to this is always store extra_tags and optionally store is_html. In the case where there aren't extra tags, store an empty string.

class MessageEncoder(json.JSONEncoder):
    """
    Compactly serializes instances of the ``Message`` class as JSON.
    """
    message_key = '__json_message'
    def default(self, obj):
        if isinstance(obj, Message):
            message = [self.message_key, obj.level, obj.message]
            if obj.extra_tags:
                message.append(obj.extra_tags)
            else:                                    #New
                message.append(str())                #New
            if obj.is_html:                          #New
                message.append(obj.is_html)          #New
            return message
        return super(MessageEncoder, self).default(obj)

In the no extra_tags scenario, this solution has additional storage overhead of 4 characters - ,"", - IMHO that's acceptable.

Legacy cookies will continue to pass the hash check because we haven't changed the hashing algorithm or what was stored.
Since legacy messages are lists of length 3 or 4 and will never have the is_html flag set to true, the Message() call would still behave as expected with legacy cookies - the call would be either Message(level, message) or Message(level, message, extra_tags).

XSS
To use cookie stored messages in an xss attack, the attacker would have to know the site's secret key, because if the hash doesn't match the cookie backend discards the messages.

Putting user input into an un-escaped output always has the possibility to open up an xss hole, but no more here than in any other feature. By having is_html as an optional, False-by-default variable I think we make it pretty hard to accidentally display a message as html.

Going to take a look at the session backend tonight.

comment:4 Changed 5 years ago by tedtieken

Took a look at the session backend last night.

So long as is_html is an optional parameter on init we shouldn't run into any trouble unpickling legacy messages.

However, depending on implementation there is a chance we would end up with a heterogeneous set of message objects: some with .is_html set, some without a .is_html property. In tests last night with pickling, just adding is_html=False in the method declaration was not sufficient to get an is_html property on unpickled legacy messages.

Option 1: use a class variable to ensure there is always a value for is_html

class Message(StrAndUnicode):
    is_html = False                            #NEW, could be deprecated in a future release

    def __init__(self, level, message, extra_tags=None, is_html=False):           #Modified
        self.level = int(level)
        self.message = message
        self.extra_tags = extra_tags
        self.is_html = is_html                 #NEW

Because there is the class level is_html, even though the instance may not have the attribute any attempt to access my_message.is_html will first check the instance then not finding it fall back to the class.

Option 2: Omit the class level variable and live with heterogeneous messages. The primary use case for is_html is {% if message.is_html %} in the template. The template handles a missing attribute as False, so heterogeneity doesn't cause any issues there. I struggle to come up with a scenario where someone would be interacting with the messages in python code, perhaps some kind of check to see if a message is already queued and to add it if not.

Even though I struggle to find a way Option2 would cause trouble, it has the potential to do so, so I'm in favor of option 1.

comment:5 Changed 5 years ago by lukeplant

I would propose that you change the serialisation in the cookie backend to a list that is always 5 items. That way we can easily tell the old from the new, and the new always have the full set of data. We can live with the overhead, and it is more robust going forward. We can encode the boolean is_html as 0 or 1 for compactness.

For the session backend, a better option is to simply fix up the defective instances of Message after unpickling.

Changed 5 years ago by tedtieken

comment:6 Changed 5 years ago by tedtieken

  • Has patch set
  • Needs documentation set
  • Needs tests set

I coded up a patch, and it is passing previous tests on my system.

Need to write new tests:

  • ensure it is accurately recovering legacy messages
  • test that the is_html tag is being set and retrieved as expected

Passing legacy tests says to me that api backward compatibility has been achieved.

Of note: In base test_multiple_posts(): had to reduce the number of test messages per level from 10 to 9 (from 50 to 45 messages total. 10 appeared to be pushing past the cookie size limit.)

Also of note: To get backward compatibility I had to write the api functions as

def debug(request, message, extra_tags='', fail_silently=False, is_html=False):
instead of 
def debug(request, message, extra_tags='', is_html=False, fail_silently=False):

The later would be more intuitive [request], [message_content], [optional_message_content], [optional_message_content], [behavior_flag].

Given that positional arguments are now non-intuitively ordered, the docs should probably encourage or at least show examples using keyword arguments:

messages.debug(request, "my message", extra_tags="markup this_way", is_html=True)

comment:7 Changed 4 years ago by gabrielhurley

  • Component changed from Contrib apps to contrib.messages

comment:8 Changed 4 years ago by anonymous

  • Severity set to Normal
  • Type set to New feature

comment:9 Changed 4 years ago by apollo13

  • Cc florian+django@… added
  • Easy pickings unset
  • Version changed from 1.2 to SVN

The requested feature already works if someone doesn't use the cookie storage but the session storage and uses mark_safe for the messages (since the session storage uses pickle to serialize the data and not json like the cookie storage does). Now the question is whether or not to add this new feature or just to extend the cookie storage to support SafeUnicode/SafeString.

Granted; is_html is probably more explicit, but I just wanted to raise this as an option…

comment:10 Changed 3 years ago by julien

  • UI/UX unset

An alternative naming could be 'allow_tags' to be consistent with list_display methods.

comment:11 Changed 3 years ago by claudep

  • Resolution set to fixed
  • Status changed from new to closed

We added support for safe strings in cookie storage in #19387. I think that this resolves this issue.

Note: See TracTickets for help on using tickets.
Back to Top