== AutoEscaping proposal ==
[http://en.wikipedia.org/wiki/Cross-site_scripting XSS vulnerabilities] are the most common form of security hole in web applications by an order of magnitude. In Django, they are avoided by applying the `escape` template filter to each variable, but it is easy to forget, and a single omission makes an application vulnerable.
It is proposed that Django auto-escapes ALL output from template variable tags, unless explicitly told not to. This is a controversial change: it breaks backwards compatibility (and hence MUST be decided before version 1.0), and it appears to be at odds with the "explicit is better than implicit" rule from the Zen of Python. Nevertheless, the security benefits are enormous, and many of the drawbacks can be mitigated with careful design.
Here is a proposed design, based on extensive discussion on the mailing lists.
=== Auto escaping ===
Consider a variable `name` taken from the query string of the following URL:
{{{
/path/?name=<script>alert('XSS');</script>
}}}
At the moment, `{{ name }}` outputs the string unchanged, so the browser runs the script:
{{{
<script>alert('XSS');</script>
}}}
With auto escaping, it will be output as:
{{{
&lt;script&gt;alert(&#39;XSS&#39;);&lt;/script&gt;
}}}
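The following sketch shows the character replacements that turn the example string into its escaped form. It is an illustrative stand-in written for this page, not Django's own `escape` code. `&` must be replaced first so that the entities produced by the later replacements are not escaped a second time.

```python
# Hypothetical stand-in for an HTML escape function (not Django's code).
# '&' is handled first so the entities introduced below stay intact.
_REPLACEMENTS = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
)


def escape(value):
    """Return value as text with HTML special characters replaced by entities."""
    text = str(value)
    for char, entity in _REPLACEMENTS:
        text = text.replace(char, entity)
    return text


print(escape("<script>alert('XSS');</script>"))
# &lt;script&gt;alert(&#39;XSS&#39;);&lt;/script&gt;
```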
But what if you ''want'' to output the unescaped string, for example when generating a plain-text email? A block-level tag is proposed for this scenario:
{{{
{% autoescape off %}
{{ body }}
{% endautoescape %}
}}}
You will also be able to set a flag on the context, as explained below.
=== Escaped vs. non-escaped strings ===
A major risk with auto escaping is that output will end up double escaped. What if the template author is already using a filter that escapes HTML? Escaping that filter's result a second time would turn `&lt;` into `&amp;lt;`. The solution is to introduce two types of string: escaped and non-escaped.
Consider the following:
{{{
class escaped(object):
    """Marker mixin: anything inheriting from it is already HTML-safe."""

class escapedstr(str, escaped):
    pass

class escapedunicode(unicode, escaped):
    pass

def markescaped(s):
    """Return s flagged as escaped, without changing its contents."""
    if isinstance(s, escaped):
        return s
    if isinstance(s, str):
        return escapedstr(s)
    if isinstance(s, unicode):
        return escapedunicode(s)
    raise TypeError("'s' must be str or unicode")
}}}
(This is one of the few examples where multiple inheritance could be useful in Django).
`escapedstr` and `escapedunicode` are subclasses of Python's built-in `str` and `unicode` types that are marked as being escaped. Apart from passing the `isinstance(s, escaped)` test, they are indistinguishable from regular strings. They have no special methods of their own.
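The classes above target Python 2, where byte strings (`str`) and text (`unicode`) are separate types. On Python 3 all text is `str`, so a port of the same marker idea needs only one subclass. This port is an illustration written for this page, not part of the proposal. It raises `TypeError` for non-strings.

```python
class escaped:
    """Marker mixin: instances are strings already known to be HTML-safe."""


class escapedstr(str, escaped):
    """A str that has been escaped; otherwise behaves like any other str."""


def markescaped(s):
    """Flag s as escaped without altering its contents."""
    if isinstance(s, escaped):
        return s
    if isinstance(s, str):
        return escapedstr(s)
    raise TypeError("'s' must be a str")


s = markescaped("<b>hi</b>")
print(isinstance(s, escaped), s == "<b>hi</b>")  # True True
print(type(s + "!").__name__)                    # str
```

One consequence of having "no special methods" is that operations such as concatenation or `.upper()` return a plain `str`. Any string derived from a marked one therefore loses the marker and will be escaped again, which errs on the safe side.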
This allows us to use them to mark strings that have already been escaped. The auto escape mechanism can then use this marker to decide whether something should be escaped. This has a number of uses. Firstly, filters that convert a value into HTML (such as `urlize` and `markdown`) can flag their output as already escaped, so that the auto escape mechanism knows not to escape it again. (Perhaps "safe" would be a better term than "escaped".) Secondly, model fields that are known to contain safe HTML can likewise be marked. Thirdly, the existing `escape` filter can mark its output. Templates written before the introduction of auto escaping then keep working without being double escaped.
=== Implementation ===
I propose adding a new attribute to the `Context` class, called `autoescape`. It defaults to `True`, but can be toggled either in view functions or by `{% autoescape off %}` blocks in templates. The `VariableNode.render()` method then uses this flag to decide whether escaping should be performed:
{{{
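def render(self, context):
    output = self.filter_expression.resolve(context)
    encoded = self.encode_output(output)
    # Test the resolved value, not the encoded one: encoding may return a
    # plain str and so lose the escaped marker.
    if context.autoescape and not isinstance(output, escaped):
        return escape(encoded)
    else:
        return encoded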
}}}
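The next sketch isolates the decision and is self-contained. It leaves out Django's `FilterExpression` and `encode_output` machinery and uses Python 3 strings. `Context` and `render_variable` are hypothetical stand-ins written for this page, not Django's classes.

```python
class escaped:
    """Marker mixin for strings that are already HTML-safe."""


class escapedstr(str, escaped):
    pass


def escape(text):
    for char, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
                         ('"', "&quot;"), ("'", "&#39;")):
        text = text.replace(char, entity)
    return text


class Context(dict):
    """Hypothetical context: a dict of variables plus an autoescape flag."""

    def __init__(self, *args, autoescape=True, **kwargs):
        super().__init__(*args, **kwargs)
        self.autoescape = autoescape


def render_variable(name, context):
    """Mimic VariableNode.render(): escape unless disabled or already marked."""
    output = context[name]
    # Check the marker on the resolved value, before str() drops it.
    if context.autoescape and not isinstance(output, escaped):
        return escape(str(output))
    return str(output)


ctx = Context(name="<i>", bio=escapedstr("<i>ok</i>"))
print(render_variable("name", ctx))  # &lt;i&gt;
print(render_variable("bio", ctx))   # <i>ok</i>
ctx.autoescape = False               # e.g. set by a view for a plain-text email
print(render_variable("name", ctx))  # <i>
```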
And here's the implementation of the `{% autoescape on/off %}` template tag:
{{{
from django import template
from django.template import Node, TemplateSyntaxError

register = template.Library()

class AutoEscapeNode(Node):
    def __init__(self, setting, nodelist):
        self.setting, self.nodelist = setting, nodelist

    def render(self, context):
        old_setting = context.autoescape
        context.autoescape = self.setting
        try:
            return self.nodelist.render(context)
        finally:
            # Restore the enclosing setting even if rendering raises.
            context.autoescape = old_setting

def do_autoescape(parser, token):
    """
    Set autoescape behaviour for this block. Possible values are 'on' and 'off'.
    """
    bits = token.contents.split()
    if len(bits) != 2 or bits[1] not in ('on', 'off'):
        raise TemplateSyntaxError("autoescape argument must be 'on' or 'off'.")
    setting = (bits[1] == 'on')
    nodelist = parser.parse(('endautoescape',))
    parser.delete_first_token()
    return AutoEscapeNode(setting, nodelist)
do_autoescape = register.tag("autoescape", do_autoescape)
}}}
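This self-contained sketch shows how the block tag scopes the flag. It uses hypothetical `Context`, `VarNode` and `AutoEscapeBlock` classes instead of Django's parser and node classes. Nested blocks restore the enclosing setting on the way out, and the `finally` clause restores it even when rendering raises.

```python
def escape(text):
    for char, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
                         ('"', "&quot;"), ("'", "&#39;")):
        text = text.replace(char, entity)
    return text


class Context(dict):
    """Hypothetical context: template variables plus an autoescape flag."""
    autoescape = True


class VarNode:
    """Outputs one variable, escaping it if the context flag is on."""

    def __init__(self, name):
        self.name = name

    def render(self, context):
        value = str(context[self.name])
        return escape(value) if context.autoescape else value


class AutoEscapeBlock:
    """Renders child nodes with autoescape forced on or off."""

    def __init__(self, setting, nodes):
        self.setting, self.nodes = setting, nodes

    def render(self, context):
        old_setting = context.autoescape
        context.autoescape = self.setting
        try:
            return "".join(node.render(context) for node in self.nodes)
        finally:
            context.autoescape = old_setting


ctx = Context(x="<b>")
tree = [
    VarNode("x"),
    AutoEscapeBlock(False, [VarNode("x"), AutoEscapeBlock(True, [VarNode("x")])]),
    VarNode("x"),
]
print("".join(node.render(ctx) for node in tree))
# &lt;b&gt;<b>&lt;b&gt;&lt;b&gt;
```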
=== Additional work ===
Implementing this change requires considerably more work (probably in a branch), including the following:
* Modify existing Django filters to flag strings as escaped where necessary
* Modify built-in Django templates (including error pages) to take this into account
* Lots of tests!
* Extensive documentation
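The first item might be approached with a decorator that marks whatever an HTML-producing filter returns. This is an assumption, not something the proposal specifies. `mark_output_escaped` and `bold` are hypothetical names. Marking output as escaped is only correct if the filter itself guarantees it, so `bold` escapes its input before adding its own tags.

```python
import functools


class escaped:
    """Marker mixin for strings that are already HTML-safe."""


class escapedstr(str, escaped):
    pass


def escape(text):
    for char, entity in (("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"),
                         ('"', "&quot;"), ("'", "&#39;")):
        text = text.replace(char, entity)
    return text


def mark_output_escaped(filter_func):
    """Wrap a filter so its return value is flagged as already escaped."""
    @functools.wraps(filter_func)
    def wrapper(value, *args):
        return escapedstr(filter_func(value, *args))
    return wrapper


@mark_output_escaped
def bold(value):
    """Hypothetical filter: wrap value in <b> tags, escaping the value itself."""
    return "<b>%s</b>" % escape(str(value))


result = bold("Tom & Jerry")
print(result)                       # <b>Tom &amp; Jerry</b>
print(isinstance(result, escaped))  # True
```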
=== Prior discussion ===
* [http://groups.google.com/group/django-developers/browse_thread/thread/17d1dfecd67864ab/2d177ac262232b73 Proposal: default escaping] on django-developers
* [http://groups.google.com/group/django-developers/browse_thread/thread/e448bbdd40426915/70c34ce7cc96e283 templates and html escaping] on django-developers
* [http://groups.google.com/group/django-developers/browse_thread/thread/e448bbdd40426915/70c34ce7cc96e283 Global Escape] on django-users