Version 3 (modified by Gary Wilson <gary.wilson@…>, 7 years ago)

Fixed heading levels.

AutoEscaping proposal

See also: Autoescape alternative

XSS vulnerabilities are the most common form of security hole in web applications by an order of magnitude. In Django, they are avoided using the escape template filter - but it is easy to forget to use this, and just one mistake makes an application vulnerable.

It is proposed that Django auto-escapes ALL output from template variable tags, unless explicitly told not to. This is a controversial change - it breaks backwards compatibility (and hence MUST be decided before version 1.0) and appears at odds with the explicit-is-better-than-implicit rule from the Zen of Python. Nevertheless, the security benefits are enormous, and many of the cons can be mitigated with careful design.

Here is a proposed design, based on extensive discussion on the mailing lists.

Auto escaping

Consider a variable, name, whose value is taken from the query string of the following URL:

/path/?name=<script>alert('XSS');</script>

At the moment, {{ name }} outputs the following:

<script>alert('XSS');</script>

With auto escaping, this will be output as:

&lt;script&gt;alert(&apos;XSS&apos;);&lt;/script&gt;
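
As a rough illustration of what escaping does to that value, the sketch below uses html.escape from the Python 3 standard library rather than Django's own escape filter. That substitution is an assumption made for illustration only, and the entity chosen for the single quote differs between implementations.

```python
import html

# Value taken from the query string in the example above.
name = "<script>alert('XSS');</script>"

# html.escape replaces &, < and >, and (with quote=True, the default)
# both quote characters as well.
escaped_name = html.escape(name)

# No angle brackets survive, so a browser displays the text
# instead of treating it as markup.
assert "<" not in escaped_name and ">" not in escaped_name

# Escaping loses no information: unescaping recovers the original value.
assert html.unescape(escaped_name) == name
```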

But what if you want to output the unescaped string, for example when generating a plain-text email? A block-level tag is proposed to deal with this scenario.

{% autoescape off %}
{{ body }}
{% endautoescape %}

You will also be able to set a flag on the context, as explained below.

Escaped vs. non-escaped strings

A major risk with auto escaping is that things will end up being double escaped. What if the user were already using a filter somewhere along the line that causes HTML to be escaped? The solution is to introduce two types of string: escaped and non-escaped.

Consider the following:

class escaped(object):
    """Marker base class: instances are already escaped for HTML output."""

class escapedstr(str, escaped):
    pass

class escapedunicode(unicode, escaped):
    pass

def markescaped(s):
    # Already marked: return it unchanged, so marking twice is harmless.
    if isinstance(s, escaped):
        return s
    if isinstance(s, str):
        return escapedstr(s)
    if isinstance(s, unicode):
        return escapedunicode(s)
    raise ValueError("'s' must be str or unicode")

(This is one of the few examples where multiple inheritance could be useful in Django.)

escapedstr and escapedunicode are subclasses of Python's built-in str and unicode types that are marked as being escaped. Other than the fact that they pass the isinstance(s, escaped) test, they are indistinguishable from regular strings. They have no special methods of their own.

This allows us to use them to mark strings that have already been escaped. The auto escape mechanism can then use this marker to decide whether something should be escaped. This has a number of uses. Firstly, filters that convert a value into HTML (such as urlize and markdown) can flag it as already being escaped (maybe 'escaped' is the wrong term - 'safe' might be better) so that the auto escape mechanism knows not to escape the output. Secondly, model fields that are known to contain safe HTML can likewise be marked. Thirdly, the existing 'escape' filter can mark its output in the same way, so that templates written before the introduction of auto escaping keep working and are not escaped twice.
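
The marker idea can be tried out in isolation. The following is a minimal Python 3 sketch, not the proposed Django code: the names Escaped, EscapedStr, mark_escaped and escape_unless_marked are hypothetical, and html.escape stands in for Django's escape function.

```python
import html


class Escaped:
    """Marker base class: the text is already safe to emit as HTML."""


class EscapedStr(str, Escaped):
    """A str that carries the marker and adds no behaviour of its own."""


def mark_escaped(s):
    """Flag s as already escaped; marking twice is a no-op."""
    if isinstance(s, Escaped):
        return s
    if isinstance(s, str):
        return EscapedStr(s)
    raise TypeError("'s' must be str")


def escape_unless_marked(value):
    """Escape plain strings once; pass marked strings through untouched."""
    if isinstance(value, Escaped):
        return value
    return mark_escaped(html.escape(value))


raw = "<b>Tom & Jerry</b>"
once = escape_unless_marked(raw)
twice = escape_unless_marked(once)                 # second pass does not re-escape
trusted = escape_unless_marked(mark_escaped(raw))  # flagged by a filter or model field

assert once == "&lt;b&gt;Tom &amp; Jerry&lt;/b&gt;"
assert twice == once
assert trusted == raw
```

One caveat with this approach: most string operations return a plain str, so the marker is easily lost. For instance, EscapedStr("a") + "b" is an ordinary str and would be escaped again.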

Implementation

I propose adding a new property to the Context class, called autoescape. This defaults to being set to True, but can be toggled either in view functions or by {% autoescape off %} blocks in templates. The VariableNode render() method then uses this context flag to decide if escaping should be performed or not:

    def render(self, context):
        output = self.filter_expression.resolve(context)
        encoded = self.encode_output(output)
        # Test the resolved value: encode_output() may return a plain string.
        if context.autoescape and not isinstance(output, escaped):
            return escape(encoded)
        else:
            return encoded
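
To see how the context flag and the marker interact, here is a self-contained Python 3 sketch. The Context class and render_variable function are simplified, hypothetical stand-ins (not Django's real classes), and html.escape is assumed in place of Django's escape.

```python
import html


class Escaped:
    """Marker for text that must not be escaped again."""


class EscapedStr(str, Escaped):
    pass


class Context(dict):
    """Hypothetical stand-in for Django's Context with the proposed flag."""

    def __init__(self, values=None, autoescape=True):
        super().__init__(values or {})
        self.autoescape = autoescape


def render_variable(name, context):
    """Resolve one variable and escape it unless told otherwise."""
    output = context[name]
    text = str(output)
    # Test the resolved value, not the converted text:
    # the conversion may drop the marker.
    if context.autoescape and not isinstance(output, Escaped):
        return html.escape(text)
    return text


values = {"name": "<i>x</i>", "body": EscapedStr("<p>hi</p>")}
html_page = Context(values)                      # default: escaping on
plain_email = Context(values, autoescape=False)  # a view switching it off

assert render_variable("name", html_page) == "&lt;i&gt;x&lt;/i&gt;"
assert render_variable("body", html_page) == "<p>hi</p>"
assert render_variable("name", plain_email) == "<i>x</i>"
```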

And here's the implementation of the {% autoescape on/off %} template tag:

class AutoEscapeNode(Node):
    def __init__(self, setting, nodelist):
        self.setting, self.nodelist = setting, nodelist

    def render(self, context):
        old_setting = context.autoescape
        context.autoescape = self.setting
        try:
            return self.nodelist.render(context)
        finally:
            # Restore the previous setting even if rendering raises.
            context.autoescape = old_setting

#@register.tag(name="autoescape")
def do_autoescape(parser, token):
    """
    Set autoescape behaviour for this block. Possible values are 'on' and 'off'.
    """
    # A missing or extra argument is reported as a template syntax error.
    bits = token.contents.split()
    if len(bits) != 2 or bits[1] not in ('on', 'off'):
        raise TemplateSyntaxError("autoescape argument must be 'on' or 'off'.")
    setting = (bits[1] == 'on')
    nodelist = parser.parse(('endautoescape',))
    parser.delete_first_token()
    return AutoEscapeNode(setting, nodelist)
do_autoescape = register.tag("autoescape", do_autoescape)
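
The save-and-restore behaviour of the block tag can be checked without Django. The Python 3 sketch below uses hypothetical, stripped-down Context, VarNode and AutoEscapeNode classes and assumes html.escape as the escaping function. It restores the flag in a finally clause so that nested blocks and failed renders both leave the outer setting intact.

```python
import html


class Context(dict):
    autoescape = True  # proposed default


class VarNode:
    def __init__(self, name):
        self.name = name

    def render(self, context):
        value = str(context[self.name])
        return html.escape(value) if context.autoescape else value


class AutoEscapeNode:
    def __init__(self, setting, children):
        self.setting, self.children = setting, children

    def render(self, context):
        old_setting = context.autoescape
        context.autoescape = self.setting
        try:
            return "".join(child.render(context) for child in self.children)
        finally:
            # Put the outer setting back, even if a child raises.
            context.autoescape = old_setting


# Roughly: {{ v }} {% autoescape off %} {{ v }}
#          {% autoescape on %} {{ v }} {% endautoescape %} {{ v }}
#          {% endautoescape %} {{ v }}   (whitespace omitted)
template = [
    VarNode("v"),
    AutoEscapeNode(False, [
        VarNode("v"),
        AutoEscapeNode(True, [VarNode("v")]),
        VarNode("v"),
    ]),
    VarNode("v"),
]

ctx = Context(v="<>")
result = "".join(node.render(ctx) for node in template)

assert result == "&lt;&gt;<>&lt;&gt;<>&lt;&gt;"
assert ctx.autoescape is True  # the flag is back to its default afterwards
```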

Additional work

A bunch more work needs to be done to implement this change (probably in a branch), including the following:

  • Modify existing Django filters to flag strings as escaped where necessary
  • Modify built-in Django templates (including error pages) to take this into account
  • Lots of tests!
  • Extensive documentation
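
As an idea of what the first item might involve, here is a Python 3 sketch of an HTML-producing filter that escapes its input itself and then flags its result. The paragraphs filter and the auto_escape helper are hypothetical names invented for this example, and html.escape is assumed in place of Django's escape.

```python
import html


class Escaped:
    """Marker for text that is already safe HTML."""


class EscapedStr(str, Escaped):
    pass


def paragraphs(value):
    """Hypothetical HTML-producing filter: one <p> per non-empty line."""
    lines = [line for line in str(value).splitlines() if line.strip()]
    # The user's text is escaped here, because the flagged result
    # as a whole will be exempt from auto escaping.
    body = "".join("<p>%s</p>" % html.escape(line) for line in lines)
    return EscapedStr(body)


def auto_escape(value):
    """What the variable node would do with a filter's result."""
    return value if isinstance(value, Escaped) else html.escape(str(value))


rendered = auto_escape(paragraphs("1 < 2\n\nfish & chips"))

# The filter's own tags survive; the user's text is escaped exactly once.
assert rendered == "<p>1 &lt; 2</p><p>fish &amp; chips</p>"
```

A filter that flagged its output without escaping its input first would reopen the hole that auto escaping is meant to close, so each existing filter would need reviewing individually.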

Prior discussion