Code

Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#14235 closed (fixed)

UnicodeDecodeError in CSRF middleware

Reported by: jbg Owned by: lukeplant
Component: Forms Version: master
Severity: Keywords: csrf unicode
Cc: pswartz@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Just updated to SVN head to fix the CSRF security issue and am now getting:

mod_wsgi (pid=15736): Exception occurred processing WSGI script '/home/.../django.wsgi'.
Traceback (most recent call last):
 File "/usr/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 245, in __call__
  response = middleware_method(request, response)
 File "/usr/lib/python2.5/site-packages/django/middleware/csrf.py", line 256, in process_response
  response.content, n = _POST_FORM_RE.subn(add_csrf_field, response.content)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 953: ordinal not in range(128)

Attachments (3)

14235.diff (3.8 KB) - added by lukeplant 4 years ago.
Patch. Fixed typo in comment in last patch.
14235.2.diff (5.7 KB) - added by lukeplant 4 years ago.
Updates for Karen's comments.
14235.3.diff (6.5 KB) - added by lukeplant 4 years ago.
Added condition for empty cookie

Download all attachments as: .zip

Change History (18)

comment:1 Changed 4 years ago by jbg

  • Has patch set
  • Keywords csrf unicode added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I don’t know if this is the correct fix, but it worked around the issue for me:

Index: django/middleware/csrf.py
===================================================================
--- django/middleware/csrf.py	(revision 13700)
+++ django/middleware/csrf.py	(working copy)
@@ -247,10 +247,10 @@
                                            itertools.repeat(''))
             def add_csrf_field(match):
                 """Returns the matched <form> tag plus the added <input> element"""
-                return mark_safe(match.group() + "<div style='display:none;'>" + \
+                return mark_safe((match.group() + "<div style='display:none;'>" + \
                 "<input type='hidden' " + idattributes.next() + \
                 " name='csrfmiddlewaretoken' value='" + escape(csrf_token) + \
-                "' /></div>")
+                "' /></div>").encode("utf-8"))
 
             # Modify any POST forms
             response.content, n = _POST_FORM_RE.subn(add_csrf_field, response.content)

comment:2 Changed 4 years ago by antisvin

I've also been bitten by this bugs - as well as all non-english language users would be after updating to 1.2.2. A more fine-grained patch that only works around changes from the 1.2.2 security patch:

Index: middleware/csrf.py
===================================================================
--- middleware/csrf.py	(revision 13700)
+++ middleware/csrf.py	(working copy)
@@ -249,7 +249,7 @@
                 """Returns the matched <form> tag plus the added <input> element"""
                 return mark_safe(match.group() + "<div style='display:none;'>" + \
                 "<input type='hidden' " + idattributes.next() + \
-                " name='csrfmiddlewaretoken' value='" + escape(csrf_token) + \
+                " name='csrfmiddlewaretoken' value='" + escape(csrf_token).encode('utf-8') + \
                 "' /></div>")

comment:3 Changed 4 years ago by lukeplant

  • Owner changed from nobody to lukeplant
  • Status changed from new to assigned

OK, we've been bitten again by Python's unicode handling:

For the typical case, csrf_token is entirely ASCII, but escape(csrf_token) returns a unicode object (that contains entirely ASCII chars, but it doesn't matter). The result is that the entire expression is silently converted to a unicode object. HttpResponse.content is a property that returns an encoded str, not a unicode object, so the regex substitution method fails as above.

As for the fix, we probably need to be aware of the encoding of the response, which might not be UTF-8. We could use response._charset to find out, but people might not be using Django's HttpResponse object. An alternative is to use encode('ascii', 'ignore'), which should work fine with all the CSRF tokens that Django produces.

Note, I think this is only a problem for people using CsrfResponseMiddleware, which is deprecated. But a fix is still needed urgently.

comment:4 Changed 4 years ago by kmtracey

  • Triage Stage changed from Unreviewed to Accepted

Perhaps try using response._charset for the encoding and fall back to ascii, ignore only if it does not exist?

comment:5 Changed 4 years ago by lukeplant

I thin kit is more robust and more easily tested if we deal with both the XSS issue addressed in [36898] and the unicode issue here if we just sanitise the input cookie to be alphanumeric. This also means that that outgoing cookie and token will match. Patch attached shortly.

Changed 4 years ago by lukeplant

Patch. Fixed typo in comment in last patch.

comment:6 Changed 4 years ago by lukeplant

I thought of an advantage of doing it the way my last patch does it: if there is some client side code that is building a form dynamically, and includes a CSRF hidden token by using the CSRF cookie value, they might still be vulnerable to the XSS attack addressed in [13698]. This way, the cookie value is sanitised and they would not be vulnerable. (This is probably an unlikely attack, though).

comment:7 Changed 4 years ago by paulswartz

  • Cc pswartz@… added

comment:8 follow-up: Changed 4 years ago by kmtracey

I assume the escape still in the csrftoken tag implementation (http://code.djangoproject.com/browser/django/trunk/django/template/defaulttags.py#L46) is still needed?

There's now an unneeded import of escape in the csrf tests.py file.

For consistency with American spellings uses elsewhere throughout the code it should spelled _sanitize_token....

comment:9 in reply to: ↑ 8 Changed 4 years ago by lukeplant

Replying to kmtracey:

I assume the escape still in the csrftoken tag implementation (http://code.djangoproject.com/browser/django/trunk/django/template/defaulttags.py#L46) is still needed?

Actually, no, although it doesn't harm having it there. For consistency I guess it should be removed.

There's now an unneeded import of escape in the csrf tests.py file.

For consistency with American spellings uses elsewhere throughout the code it should spelled _sanitize_token....

Good catches.

Changed 4 years ago by lukeplant

Updates for Karen's comments.

comment:10 Changed 4 years ago by mtredinnick

I can't see any immediate problem here, but we were discussing this over dinner (Karen, myself, Russ, ...) and the potential problem corner that occurred to me is this: if somebody manages to get a corrupted/changed cookie to the point that this sanitization removes *all* characters, will that cause some kind of unpleasant error? I suspect not, but I'm raising it for thought for a few hours (late at night here).

comment:11 Changed 4 years ago by lukeplant

For an empty token, the result would currently be that the {% csrf_token %} tag would output nothing, and subsequent form submission would fail.

The scenarios I can think of here, are 1) that a subdomain could have set the CSRF cookie, presumably for malicious purposes, or 2) that some one has just messed around with their own cookie. In the first case, we have a known attack that we can't prevent, but that attack might be dealt with by the owner of the subdomain being sorted out. So in either case it would be nicer if the user wasn't stuck with a dodgy cookie that made them unable to submit forms. So, I'll add a condition to cater for that.

Changed 4 years ago by lukeplant

Added condition for empty cookie

comment:12 Changed 4 years ago by mtredinnick

Looks good to me. James is around at the sprints (in Portland) today, as well as myself and Karen, so we can roll out a fix release today (including the packaging problem fix as well).

comment:13 Changed 4 years ago by mtredinnick

  • Component changed from Uncategorized to Forms

comment:14 Changed 4 years ago by lukeplant

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [13732]) Fixed #14235 - UnicodeDecodeError in CSRF middleware

Thanks to jbg for the report.

This changeset essentially backs out [13698] in favour of a method that
sanitizes the token rather than escaping it.

comment:15 Changed 4 years ago by lukeplant

(In [13733]) [1.2.X] Fixed #14235 - UnicodeDecodeError in CSRF middleware

Thanks to jbg for the report.

This changeset essentially backs out [13698] in favour of a method that
sanitizes the token rather than escaping it.

Backport of [13732] from trunk.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.