Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#14235 closed (fixed)

UnicodeDecodeError in CSRF middleware

Reported by: Jasper Bryant-Greene Owned by: Luke Plant
Component: Forms Version: dev
Severity: Keywords: csrf unicode
Cc: pswartz@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Just updated to SVN head to fix the CSRF security issue and am now getting:

mod_wsgi (pid=15736): Exception occurred processing WSGI script '/home/.../django.wsgi'.
Traceback (most recent call last):
 File "/usr/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 245, in __call__
  response = middleware_method(request, response)
 File "/usr/lib/python2.5/site-packages/django/middleware/csrf.py", line 256, in process_response
  response.content, n = _POST_FORM_RE.subn(add_csrf_field, response.content)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 953: ordinal not in range(128)

Attachments (3)

14235.diff (3.8 KB ) - added by Luke Plant 14 years ago.
Patch. Fixed typo in comment in last patch.
14235.2.diff (5.7 KB ) - added by Luke Plant 14 years ago.
Updates for Karen's comments.
14235.3.diff (6.5 KB ) - added by Luke Plant 14 years ago.
Added condition for empty cookie

Download all attachments as: .zip

Change History (18)

comment:1 by Jasper Bryant-Greene, 14 years ago

Has patch: set
Keywords: csrf unicode added

I don’t know if this is the correct fix, but it worked around the issue for me:

Index: django/middleware/csrf.py
===================================================================
--- django/middleware/csrf.py	(revision 13700)
+++ django/middleware/csrf.py	(working copy)
@@ -247,10 +247,10 @@
                                            itertools.repeat(''))
             def add_csrf_field(match):
                 """Returns the matched <form> tag plus the added <input> element"""
-                return mark_safe(match.group() + "<div style='display:none;'>" + \
+                return mark_safe((match.group() + "<div style='display:none;'>" + \
                 "<input type='hidden' " + idattributes.next() + \
                 " name='csrfmiddlewaretoken' value='" + escape(csrf_token) + \
-                "' /></div>")
+                "' /></div>").encode("utf-8"))
 
             # Modify any POST forms
             response.content, n = _POST_FORM_RE.subn(add_csrf_field, response.content)

comment:2 by antisvin, 14 years ago

I've also been bitten by this bugs - as well as all non-english language users would be after updating to 1.2.2. A more fine-grained patch that only works around changes from the 1.2.2 security patch:

Index: middleware/csrf.py
===================================================================
--- middleware/csrf.py	(revision 13700)
+++ middleware/csrf.py	(working copy)
@@ -249,7 +249,7 @@
                 """Returns the matched <form> tag plus the added <input> element"""
                 return mark_safe(match.group() + "<div style='display:none;'>" + \
                 "<input type='hidden' " + idattributes.next() + \
-                " name='csrfmiddlewaretoken' value='" + escape(csrf_token) + \
+                " name='csrfmiddlewaretoken' value='" + escape(csrf_token).encode('utf-8') + \
                 "' /></div>")

comment:3 by Luke Plant, 14 years ago

Owner: changed from nobody to Luke Plant
Status: newassigned

OK, we've been bitten again by Python's unicode handling:

For the typical case, csrf_token is entirely ASCII, but escape(csrf_token) returns a unicode object (that contains entirely ASCII chars, but it doesn't matter). The result is that the entire expression is silently converted to a unicode object. HttpResponse.content is a property that returns an encoded str, not a unicode object, so the regex substitution method fails as above.

As for the fix, we probably need to be aware of the encoding of the response, which might not be UTF-8. We could use response._charset to find out, but people might not be using Django's HttpResponse object. An alternative is to use encode('ascii', 'ignore'), which should work fine with all the CSRF tokens that Django produces.

Note, I think this is only a problem for people using CsrfResponseMiddleware, which is deprecated. But a fix is still needed urgently.

comment:4 by Karen Tracey, 14 years ago

Triage Stage: UnreviewedAccepted

Perhaps try using response._charset for the encoding and fall back to ascii, ignore only if it does not exist?

comment:5 by Luke Plant, 14 years ago

I thin kit is more robust and more easily tested if we deal with both the XSS issue addressed in [36898] and the unicode issue here if we just sanitise the input cookie to be alphanumeric. This also means that that outgoing cookie and token will match. Patch attached shortly.

by Luke Plant, 14 years ago

Attachment: 14235.diff added

Patch. Fixed typo in comment in last patch.

comment:6 by Luke Plant, 14 years ago

I thought of an advantage of doing it the way my last patch does it: if there is some client side code that is building a form dynamically, and includes a CSRF hidden token by using the CSRF cookie value, they might still be vulnerable to the XSS attack addressed in [13698]. This way, the cookie value is sanitised and they would not be vulnerable. (This is probably an unlikely attack, though).

comment:7 by Paul Swartz (PCF), 14 years ago

Cc: pswartz@… added

comment:8 by Karen Tracey, 14 years ago

I assume the escape still in the csrftoken tag implementation (http://code.djangoproject.com/browser/django/trunk/django/template/defaulttags.py#L46) is still needed?

There's now an unneeded import of escape in the csrf tests.py file.

For consistency with American spellings uses elsewhere throughout the code it should spelled _sanitize_token....

in reply to:  8 comment:9 by Luke Plant, 14 years ago

Replying to kmtracey:

I assume the escape still in the csrftoken tag implementation (http://code.djangoproject.com/browser/django/trunk/django/template/defaulttags.py#L46) is still needed?

Actually, no, although it doesn't harm having it there. For consistency I guess it should be removed.

There's now an unneeded import of escape in the csrf tests.py file.

For consistency with American spellings uses elsewhere throughout the code it should spelled _sanitize_token....

Good catches.

by Luke Plant, 14 years ago

Attachment: 14235.2.diff added

Updates for Karen's comments.

comment:10 by Malcolm Tredinnick, 14 years ago

I can't see any immediate problem here, but we were discussing this over dinner (Karen, myself, Russ, ...) and the potential problem corner that occurred to me is this: if somebody manages to get a corrupted/changed cookie to the point that this sanitization removes *all* characters, will that cause some kind of unpleasant error? I suspect not, but I'm raising it for thought for a few hours (late at night here).

comment:11 by Luke Plant, 14 years ago

For an empty token, the result would currently be that the {% csrf_token %} tag would output nothing, and subsequent form submission would fail.

The scenarios I can think of here, are 1) that a subdomain could have set the CSRF cookie, presumably for malicious purposes, or 2) that some one has just messed around with their own cookie. In the first case, we have a known attack that we can't prevent, but that attack might be dealt with by the owner of the subdomain being sorted out. So in either case it would be nicer if the user wasn't stuck with a dodgy cookie that made them unable to submit forms. So, I'll add a condition to cater for that.

by Luke Plant, 14 years ago

Attachment: 14235.3.diff added

Added condition for empty cookie

comment:12 by Malcolm Tredinnick, 14 years ago

Looks good to me. James is around at the sprints (in Portland) today, as well as myself and Karen, so we can roll out a fix release today (including the packaging problem fix as well).

comment:13 by Malcolm Tredinnick, 14 years ago

Component: UncategorizedForms

comment:14 by Luke Plant, 14 years ago

Resolution: fixed
Status: assignedclosed

(In [13732]) Fixed #14235 - UnicodeDecodeError in CSRF middleware

Thanks to jbg for the report.

This changeset essentially backs out [13698] in favour of a method that
sanitizes the token rather than escaping it.

comment:15 by Luke Plant, 14 years ago

(In [13733]) [1.2.X] Fixed #14235 - UnicodeDecodeError in CSRF middleware

Thanks to jbg for the report.

This changeset essentially backs out [13698] in favour of a method that
sanitizes the token rather than escaping it.

Backport of [13732] from trunk.

Note: See TracTickets for help on using tickets.
Back to Top