[[TOC]]

= CSRF Protection =

This page aims to document and discuss CSRF protection for Django.

== Summary ==

For Django 1.2, Luke Plant, with feedback from other developers, proposes:

 * We should move to using a session independent nonce as a CSRF token, instead of a hash of the session identifier as used in Django 1.1 and earlier.  This eliminates the false positives associated with session cycling, and removes the dependency on the session framework, making the middleware more generally useful, and also fixing login CSRF vulnerabilities (which were only partially and accidentally addressed before).
 * Since the above introduces a subtle regression involving attacks on HTTPS connections, this should be fixed by adding strict referer checking for HTTPS connections.
 * The post-processing middleware (!CsrfResponseMiddleware) should be deprecated and replaced with a template tag to insert the CSRF token.  This allows for streaming responses and fixes a potential vulnerability that existed with the post-processing middleware.  All contrib apps should be updated to use this, which makes the CSRF app a dependency of these apps.
 * In response to feedback from other devs, additionally:
   * A decorator ``csrf_protect`` should be implemented and added to all contrib views that need it, so that even in the absence of the middleware, these apps are protected.
   * Everything needed for this should be baked into core template tags etc., so that no changes to settings are needed
   * ``django.contrib.csrf.*`` should move to core code somewhere.
 
No immediate changes are required for people upgrading if they want the admin and other contrib apps to continue to work.  Longer term (by Django 1.4), apps must be updated to switch away from the deprecated features and use the new template tag method.  Also, imports must be changed by Django 1.4.

The patch for this is complete (see bottom).

Also, Simon Willison is proposing:
 * a ``csrf_protect_form`` decorator that acts similarly to ``csrf_protect``, but moves the actual rejection of requests to part of form validation, by adding a special entry to ``request.POST`` which is checked in ``Form.is_valid()``.
 * extensions to the template tag to allow the CSRF token to be specific to the form for extra security.

Simon's patch is currently in progress.

== Background ==

For general discussion of the CSRF issue see:
 
 * [http://en.wikipedia.org/wiki/CSRF Wikipedia on CSRF]
 * [http://www.adambarth.com/papers/2008/barth-jackson-mitchell-b.pdf Robust Defenses for Cross-Site Request Forgery (Barth, Jackson, Mitchell)] - a very important paper; the terminology used here follows the terminology of that paper.

== Discussions ==

For the record, the most relevant discussions on django-developers are listed here, most recent first.  This page attempts to supersede them all by gathering the conclusions.

 * Discussion of this proposal -  http://groups.google.com/group/django-developers/browse_thread/thread/3d2dc750082103dc/f3beb18c27fb7152?lnk=gst
 * Discussion of #9977 patch csrf_template_tag_14.diff - http://groups.google.com/group/django-developers/browse_thread/thread/c23f556b88cedbc7?pli=1
 * Luke Plant's proposal - http://groups.google.com/group/django-developers/browse_thread/thread/ae525f270ed46933/c338b050f43741b2?lnk=gst
 * Simon Willison's proposal - SafeForm - http://groups.google.com/group/django-developers/browse_thread/thread/2c33621003992d07/61a9fefb50d662b0

== Django tickets ==

There is a lot of discussion in these tickets as well, which are also superseded by this page.

 * #9977 (most up to date)
 * #10816 (remove session dependency from Django 1.0 middleware)
 * #510 (This is not really fixed until the CSRF protection is enabled by default)

== Types of attack ==

The following different attacks are in scope:

 1. 'CSRF': normal CSRF attacks, where an attacking site hosts a form, link, or piece of javascript that causes the user's browser to make a request on a (Django) site.  Normally this involves abuse of the user agent's authentication (e.g. a session/login cookie) to cause an action that the attacking site would not otherwise have permission to do.
 2. 'Login CSRF': this is when a browser is tricked into logging into a site under an account that is controlled by an attacker, so that the attacker can then track or otherwise abuse the actions performed by the victim.
 3. 'CSRF + MITM': this is a combination of CSRF and active network (man-in-the-middle) attack, which is relevant in HTTPS situations which are otherwise thought to be invulnerable to MITM attacks.  This is because HTTP 'Set-Cookie' headers are accepted by clients that are talking to a site under HTTPS, allowing a MITM attacker to set cookies on the client. (MITM attacks under HTTP are considered out of scope, because in general MITM attacks under HTTP are impossible to protect against).

Further, attacks can be categorised as:

 1. Cross site  - attacker.com attacking victim.com
 2. Cross sub-domain - attacker.example.com attacking victim.example.com

There is also an attack related to 'CSRF + MITM' which is not necessarily 'CSRF', called 'session fixation'.  It is the opposite of session theft: a malicious agent sets the session cookie on a user's machine, giving the user the attacker's session.  The attacker may or may not be 'authenticated' in that session - if not, the victim might authenticate, and the attacker may be able to take control of the victim's account or abuse the victim's authentication.  Alternatively, if the attacker is already authenticated, there are the same problems as those present in 'login CSRF'.  This type of attack can be achieved by an active network attacker (not in scope for HTTP connections), but it can also be achieved by someone with control of a sub-domain through the use of wild-card cookies.

(Question: if the sub-domain is unable to send cookies, is this attack foiled - or do you need to stop javascript as well?  The answer depends on browser policies about cookie setting, what happens when a page does {{{document.domain = 'example.com';}}} ?)

== Methods of defence ==

The main contenders are:

 * 'HMAC of session identifier'.  This was provided by Django 1.0 in !CsrfMiddleware and in 1.1 in !CsrfViewMiddleware, and is referred to as the 'CSRF token'.  All incoming POST requests that have an active session are required to have a CSRF token that is a hash of the session identifier and the site's SECRET_KEY.  (Technically Django 1.1 used straight MD5, not HMAC-MD5).
 
 * 'session independent nonce'.  A random value is stored in a cookie, unique to every user, and POST forms must contain the same value as a token.

 * 'Strict Referer checking'. For POST requests, the HTTP 'Referer' header must be present and must match the site's domain for the request to be accepted.
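
The two token-based schemes above can be sketched in plain Python, outside of any Django code.  This is only an illustration: the function names, the placeholder secret and the order in which the secret and session identifier are concatenated are assumptions, not the actual middleware code.

```python
import hashlib
import hmac
import secrets

SECRET_KEY = "not-a-real-secret"  # stand-in for the site's SECRET_KEY setting


def session_bound_token(session_id: str) -> str:
    """'Hash of session identifier': plain MD5 over secret + session id.

    The concatenation order is an assumption made for this sketch.
    """
    return hashlib.md5((SECRET_KEY + session_id).encode("utf-8")).hexdigest()


def new_nonce() -> str:
    """'Session independent nonce': a random value to be stored in a cookie."""
    return secrets.token_hex(16)


def tokens_match(expected: str, submitted: str) -> bool:
    """Compare the expected and submitted tokens in constant time."""
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

With the first scheme the expected value is recomputed from the session identifier on every POST; with the second, the expected value is simply whatever the cookie contains.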

== Methods of token insertion ==

 * Django 1.0 provided !CsrfMiddleware and 1.1 provided !CsrfResponseMiddleware which did automatic insertion of the token in outgoing pages, to all POST forms.  This has several problems:
   * The performance hit from doing the post processing.
   * It removes the ability to do streaming of responses (should that become a possibility in future Django versions)
   * It adds the CSRF token to all POST forms, including those targeted at external sites.  These sites would then gain access to the CSRF token and would be able to do CSRF attacks on that user. (This can be avoided by use of the `@csrf_response_exempt` decorator if the page has no internal forms, but that might be an unacceptable constraint, and the default behaviour opens up vulnerabilities easily).  To put it simply, control over token insertion is on a page by page basis, when it needs to be form by form.
   * Modifying ``!HttpResponse.content`` can have nasty side effects and interactions with other middleware e.g. see #9163.

 * !SafeForm - a Django Form subclass that adds the token.  Proposal abandoned in favour of template tag for various reasons, especially:
   * requires a different API to the normal Django Form which is confusing and brings many complications.
   * much more invasive changes are required to switch to it, making it a big barrier for users.

 * Template tag - templates that need the CSRF token must insert it manually using `{% csrf_token %}` in every `<form>` that is internal (and using `{% load csrf %}` at the top if the template tag is not a builtin).  This also requires use of a context processor (usually via !RequestContext).  This method is assumed for the rest of the document.
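
The difference between page-by-page and form-by-form insertion can be illustrated with a small sketch.  The regular expression, the helper names and the hidden field name `csrfmiddlewaretoken` are assumptions chosen for the example; this is not the code of either middleware or of the template tag.

```python
import re
from html import escape

# Naive pattern for the opening tag of a POST form (illustrative only).
FORM_RE = re.compile(r"""(<form\b[^>]*\bmethod=["']?post["']?[^>]*>)""", re.IGNORECASE)


def hidden_field(token):
    """Render the hidden input that carries the CSRF token."""
    return '<input type="hidden" name="csrfmiddlewaretoken" value="%s" />' % escape(
        token, quote=True
    )


def post_process(html, token):
    """Post-processing style: add the token to every POST form on the page."""
    return FORM_RE.sub(lambda m: m.group(1) + hidden_field(token), html)


PAGE = (
    '<form action="/comment/" method="post"><input name="text"></form>'
    '<form action="https://other.example/pay" method="post"></form>'
)

# Post-processing leaks the token into the form aimed at the external site.
processed = post_process(PAGE, "tok")

# With explicit insertion the template author adds it to the internal form only.
explicit = (
    '<form action="/comment/" method="post">' + hidden_field("tok")
    + '<input name="text"></form>'
    '<form action="https://other.example/pay" method="post"></form>'
)
```

In `processed` the token appears in both forms, while in `explicit` it appears only in the form that posts back to the site itself.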

== Evaluation ==

Each of the methods of defence will be evaluated against the possible attacks.

=== Django 1.0 - HMAC of session identifier ===

 1. CSRF: 
   * if using Django's session framework as the basis for authorisation: '''protected'''
   * otherwise: '''not protected''' (the middleware provides no protection if there isn't an active session)
 2. Login CSRF:
   * if using Django's built-in session framework and login routines, you are '''protected''' (albeit accidentally).  This is because the login views in Django create a session when the login form is first accessed.  When the form is filled in and POSTed back, the view checks for the existence of a test value in the session (to check whether the session cookie has been accepted by the browser).  If not found, you receive the message: "Looks like your browser isn't configured to accept cookies. Please enable cookies, reload this page, and try again".  Hence, login CSRF doesn't work - the session is established in the step before logging in, and is required for login to succeed.
   * other login methods are '''not protected'''.
 3. CSRF + MITM, under HTTPS: '''possibly protected'''
   * The CSRF token is tied to the session.  If the session ID has been revealed over HTTP (SESSION_COOKIE_SECURE = False, the default), then the MITM will be able to retrieve the CSRF token corresponding to it fairly easily.  Otherwise, he will not be able to do so, and so will not be able to abuse the user's session. (However, if the session ID has been revealed over HTTP, more direct session theft attacks are probably going to be more of an issue).
   * If the site is not using Django's session framework, there is no protection against this attack.
   * Related session fixing vulnerabilities are not protected, essentially due to a flaw in cookies (see Barth et al).
 4. Cross sub-domain CSRF: same as normal CSRF (1) above
 5. Cross sub-domain login CSRF: same as normal login CSRF (2) above
 6. Cross sub-domain session fixing: '''not protected''' 
   * sub-domains will be able to send a wild-card session cookie to clients, giving them the attacker's session.

Additional issues:

 * it can cause false positives when the session identifier changes.  This can happen if the user has more than one tab open, does something in one tab that causes the session identifier to change, and then submits a form that is open in another tab.

 * it is tied to the session framework, which means it's not re-usable for sites that don't use Django sessions, or have alternative login strategies.
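
The false positive described above can be reproduced with a few lines of plain Python.  The token function is the same hypothetical sketch of 'hash of secret plus session identifier' and is not the middleware's own code; the session identifiers are made up.

```python
import hashlib

SECRET_KEY = "not-a-real-secret"  # stand-in for the site's SECRET_KEY setting


def session_bound_token(session_id):
    """Token derived from the session identifier (assumed construction)."""
    return hashlib.md5((SECRET_KEY + session_id).encode("utf-8")).hexdigest()


def post_accepted(form_token, current_session_id):
    """A POST is accepted only if its token matches the current session."""
    return form_token == session_bound_token(current_session_id)


# Tab A renders a form while the session identifier is "session-1".
token_in_tab_a = session_bound_token("session-1")

# Tab B does something that changes the session identifier.
session_after_cycle = "session-2"

# Tab A now submits its stale form: it is rejected although no attack took place.
rejected_false_positive = not post_accepted(token_in_tab_a, session_after_cycle)
```

A token that does not depend on the session identifier is unaffected by this kind of session cycling.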

=== Session independent nonce ===

 1. CSRF: '''protected'''
 2. Login CSRF: '''protected''' (as long as both the CSRF token and the cookie are required for all POST requests)
 3. CSRF + MITM, under HTTPS: '''not protected'''
   * The attacker can set the CSRF cookie using Set-Cookie, and then supply a matching token in the POST form data.  Since the site does not tie the session cookies to the CSRF cookies, it has no way of determining that the CSRF token + cookie are genuine (doing hashing etc. of one of them will not work, as the attacker can just get a valid pair from the site directly, and use that pair in the attack).
 4. Cross sub-domain CSRF: '''not protected'''
   * The sub-domain can simply send a wild-card cookie to set the CSRF cookie, and include the corresponding token in the form.
 5. Cross sub-domain login CSRF: '''not protected'''
   * Same reason as (4)
 6. Cross sub-domain session fixing: '''not protected''' 
   * sub-domains will be able to send a wild-card session cookie to clients, giving them the attacker's session.
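
The reason items 3 to 5 are not protected can be shown with a short simulation: the check only compares two values that the attacker is able to control.  The cookie name `csrftoken` and the helper names are assumptions made for this sketch.

```python
import secrets


def nonce_check(cookie_value, form_value):
    """Accept the POST only if the cookie and the form token are equal."""
    if not cookie_value or not form_value:
        return False
    return secrets.compare_digest(cookie_value, form_value)


# The attacker picks a value (or obtains a valid one from the site directly).
attacker_value = secrets.token_hex(16)

# The victim's browser holds a genuine CSRF cookie.
victim_cookies = {"csrftoken": secrets.token_hex(16)}

# A forged form carrying the attacker's value is rejected at this point.
forged_before = nonce_check(victim_cookies["csrftoken"], attacker_value)

# The attacker overwrites the cookie, via Set-Cookie as a MITM or via a
# wild-card cookie sent from a sub-domain.
victim_cookies["csrftoken"] = attacker_value

# The same forged form now passes, because both values come from the attacker.
forged_after = nonce_check(victim_cookies["csrftoken"], attacker_value)
```

Nothing in the comparison ties either value to the site or to the user's session, which is why hashing one of the values does not help.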

=== Strict Referer checking ===

 1. CSRF: '''protected'''
 2. Login CSRF: '''protected'''
 3. CSRF + MITM, under HTTPS: '''protected'''
   * but related session fixing vulnerabilities are not protected, essentially due to a flaw in cookies (see Barth et al).
 4. Cross sub-domain CSRF: '''protected'''
 5. Cross sub-domain login CSRF: '''protected'''
 6. Cross sub-domain session fixing: '''not protected''' (out of scope)

The big problem with strict Referer checking is that the Referer header is suppressed by some browsers and by some networks.  However, Barth et al have shown that for same-domain HTTPS requests the header is suppressed in as few as 0.05% - 0.22% of cases, and they recommend this method for HTTPS connections.  Since HTTPS connections cannot be tampered with (apart from in some rare internal-network-with-proxy situations), suppression of the Referer header can only be done by the browser, so if a user is having problems due to use of this method, they can simply be instructed to configure their browser differently or use a different browser.
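
A minimal sketch of what strict Referer checking could look like for an HTTPS request is given below.  The function name and the exact matching rule (same scheme and same host, compared case-insensitively) are assumptions for illustration, not the rule used by the patch.

```python
from urllib.parse import urlsplit


def referer_ok(referer, host):
    """Strict check: the Referer must be present, HTTPS, and on the same host."""
    if not referer:
        return False  # a missing header is rejected, hence 'strict'
    try:
        parts = urlsplit(referer)
    except ValueError:
        return False  # a malformed Referer is treated as a failure
    return parts.scheme == "https" and parts.netloc.lower() == host.lower()
```

Under this rule, `referer_ok("https://example.com/form/", "example.com")` is accepted, while a missing header, an `http://` Referer, or a Referer from `attacker.example.com` is rejected.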

== Proposal ==

(initially by Luke Plant)

CSRF protection should be done by the following method:

 * Session independent nonce 
   * with backwards compatibility for the Django 1.0 token to avoid upgrade bumps
 * Additionally, strict Referer header checking for HTTPS only
 * Template tag for inserting the CSRF token 
   * with a backwards compatible !CsrfResponseMiddleware which can be used at the same time as the template tag, to allow people to upgrade without upgrading all their apps.
 * The middleware should be enabled by default.
 * A decorator derived from the middleware is added to all contrib views that need it, so there is protection even if the middleware is turned off.
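
The core of the proposal can be sketched without any Django code: one check that applies the nonce comparison to POST requests and adds strict Referer checking under HTTPS, plus a decorator that wraps a single view with that check.  Everything here is hypothetical: the `Request` class is a stand-in for a real request object, the cookie and field names are assumptions, and the backwards compatibility with the Django 1.0 token is left out.

```python
from dataclasses import dataclass, field
from functools import wraps
import hmac
from urllib.parse import urlsplit


@dataclass
class Request:
    """Hypothetical stand-in for a request object (not Django's HttpRequest)."""
    method: str
    host: str
    is_secure: bool = False
    cookies: dict = field(default_factory=dict)
    post: dict = field(default_factory=dict)
    referer: str = ""


def csrf_check(request):
    """Return a rejection reason, or None if the request is accepted."""
    if request.method != "POST":
        return None
    if request.is_secure:
        # Strict Referer checking, for HTTPS only.
        try:
            parts = urlsplit(request.referer)
        except ValueError:
            return "bad referer"
        if parts.scheme != "https" or parts.netloc.lower() != request.host.lower():
            return "bad referer"
    # Session independent nonce: the cookie and the form token must match.
    cookie = request.cookies.get("csrftoken", "")
    token = request.post.get("csrfmiddlewaretoken", "")
    if not cookie or not hmac.compare_digest(cookie.encode(), token.encode()):
        return "bad token"
    return None


def csrf_protect(view):
    """Apply the same check to one view, independent of any middleware."""
    @wraps(view)
    def wrapped(request, *args, **kwargs):
        reason = csrf_check(request)
        if reason is not None:
            return (403, reason)
        return view(request, *args, **kwargs)
    return wrapped


@csrf_protect
def example_view(request):
    return (200, "ok")
```

Because the decorator reuses the same check as the middleware would, a view wrapped this way stays protected even when the middleware is turned off.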

Compared to Django 1.1:
 * there is no dependence on the session framework, expanding the usefulness of this protection
 * the false positives caused by session cycling are eliminated
 * the vulnerability caused by automatically including the CSRF token in all POST forms (including external targets) can be avoided much more easily, and you are much more secure by default.
 * some cross sub-domain vulnerabilities are opened up - '''regression'''.  This is deemed an acceptable compromise, since there were already cross sub-domain vulnerabilities, and giving sub-domains to untrusted parties seems to be increasingly uncommon as most people buy their own domains.  It also isn't possible to fix cross sub-domain session fixing without changes to browsers, so it is safer simply to say that sub-domains should only be given to trusted parties.
 * for HTTPS connections, adding strict Referer checking closes the other vulnerabilities opened up by the change from 'HMAC of session identifier' to 'session independent nonce' (i.e.  CSRF + MITM under HTTPS)

This proposal is implemented in the lp-csrf_rework branch in http://bitbucket.org/spookylukey/django-trunk-lukeplant/ (with patches regularly copied to #9977).  It includes fixes to all the contrib apps and documentation.

The docs for this branch, which contain upgrade information, are here: http://bitbucket.org/spookylukey/django-trunk-lukeplant/src/tip/docs/ref/contrib/csrf.txt

== Further work ==

 * Could examine use of Origin header for CSRF protection, in addition to this.  
   * Its usefulness will really depend on whether browsers implement it, and how quickly - we won't be able to rely on it for a long time.
   * Note that if we simply compare Origin and Host, we are still vulnerable to DNS rebinding attacks. 
 * General protection against DNS rebinding attacks.  This would require a setting that listed allowable values of Host.  This could be a separate middleware.
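
A Host allow-list of the kind suggested in the last item could look roughly like the following.  The set of allowed hosts and the function name are hypothetical, and the port handling is deliberately simple (it does not cover IPv6 literals).

```python
# Hypothetical allow-list; in practice this would come from a setting.
allowed_hosts = {"example.com", "www.example.com"}


def host_allowed(host_header, allowed=allowed_hosts):
    """Accept the request only if its Host header names a listed host."""
    host = host_header.strip().lower()
    host = host.partition(":")[0]  # drop an optional port (no IPv6 support)
    host = host.rstrip(".")        # tolerate a trailing dot
    return host in allowed
```

A request that reaches the server through a rebound DNS name carries the attacker's host name in its Host header, so it would fail this check.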