Opened 8 years ago

Closed 8 years ago

#26971 closed Bug (fixed)

UnicodeDecodeError with non-ASCII string in quoted URL

Reported by: Oleg Blinov
Owned by: nobody
Component: HTTP handling
Version: 1.8
Severity: Normal
Keywords: UnicodeDecodeError UTF-8 windows-1251 URL wsgi
Cc: loic84
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Django raises UnicodeDecodeError if the URL contains non-UTF-8 characters.

https://github.com/django/django/blob/master/django/core/handlers/wsgi.py#L190:

return path_info.decode(UTF_8)

It fails when the URL parameter is not UTF-8-encoded, for example /tag/%E7%E0%EA%EB%E0%E4%EA%E0/ (windows-1251):

GET /tag/%E7%E0%EA%EB%E0%E4%EA%E0/ => generated 0 bytes in 1 msecs (HTTP/1.1 400) 1 headers in 68 bytes (1 switches on core 0)
Bad Request (UnicodeDecodeError)
Traceback (most recent call last):
  File "/home/ubuntu/django/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 167, in __call__
    request = self.request_class(environ)
  File "/home/ubuntu/django/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 80, in __init__
    path_info = get_path_info(environ)
  File "/home/ubuntu/django/lib/python3.4/site-packages/django/core/handlers/wsgi.py", line 197, in get_path_info
    return path_info.decode(UTF_8)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 5: invalid continuation byte
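The same error can be reproduced without a server. A minimal sketch using only the standard library: `urllib.parse.unquote_to_bytes` stands in for the percent-decoding done by the server (an assumption about where that step happens), and the resulting bytes are decoded strictly as UTF-8, as `get_path_info` does. The helper name `decode_path` is hypothetical.

```python
from urllib.parse import unquote_to_bytes

LEGACY = "/tag/%E7%E0%EA%EB%E0%E4%EA%E0/"  # windows-1251 percent-encoding
MODERN = "/tag/%D0%B7%D0%B0%D0%BA%D0%BB%D0%B0%D0%B4%D0%BA%D0%B0"  # UTF-8

def decode_path(quoted: str) -> str:
    """Hypothetical helper: percent-decode to raw bytes, then decode as UTF-8."""
    return unquote_to_bytes(quoted).decode("utf-8")

print(decode_path(MODERN))  # /tag/закладка
try:
    decode_path(LEGACY)
except UnicodeDecodeError as exc:
    # 'utf-8' codec can't decode byte 0xe7 in position 5: invalid continuation byte
    print(exc)
```

Position 5 is the first byte after "/tag/": 0xE7 announces a three-byte UTF-8 sequence, but the next byte, 0xE0, is not a valid continuation byte.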

With a UTF-8 percent-encoded parameter such as /tag/%D0%B7%D0%B0%D0%BA%D0%BB%D0%B0%D0%B4%D0%BA%D0%B0 there is no error, but the old site used windows-1251 encoding and I need to support old links, so I use this dirty hack:

try:
    return path_info.decode(UTF_8)
except UnicodeDecodeError:
    # fall back to the encoding used by the old site's links
    return path_info.decode('windows-1251')
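The workaround can be written as a self-contained function. This is a minimal sketch: the name `decode_legacy_path` is hypothetical, and it takes the already percent-decoded bytes, like `path_info` in the snippet above. Two caveats apply. Any invalid UTF-8, not only windows-1251, will be read as Cyrillic. Also, Python's `cp1251` codec leaves byte 0x98 undefined, so the fallback itself can still raise `UnicodeDecodeError`.

```python
LEGACY_ENCODING = "windows-1251"  # assumption: the encoding of the old site's links

def decode_legacy_path(path_info: bytes) -> str:
    """Decode PATH_INFO bytes as UTF-8, falling back to the legacy encoding."""
    try:
        return path_info.decode("utf-8")
    except UnicodeDecodeError:
        # Only bytes that are not valid UTF-8 reach this branch.
        return path_info.decode(LEGACY_ENCODING)

print(decode_legacy_path(b"/tag/\xe7\xe0\xea\xeb\xe0\xe4\xea\xe0/"))  # /tag/закладка/
```

Trying UTF-8 first keeps current links working, because windows-1251 Cyrillic text is almost never valid UTF-8.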

The problem occurs only with the WSGI handler; manage.py runserver handles non-UTF-8 URLs without errors.

Change History (6)

comment:1 by Claude Paroz, 8 years ago

This was supposed to be fixed by #19508 (which is why runserver doesn't fail).

However, I suspect that in your production deployment the received URI is already percent-decoded higher in the stack (Apache, mod_wsgi, ...), so Django receives /tag/\xe7\xe0\xea\xeb\xe0\xe4\xea\xe0/ instead of /tag/%E7%E0%EA%EB%E0%E4%EA%E0/. In that case, we could try to "repercent" (re-percent-encode) the URI when a UnicodeDecodeError occurs.
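The idea of re-percent-encoding the URI can be sketched like this. Valid UTF-8 is left untouched, and only the byte ranges that fail to decode are replaced with `%XX` escapes, so the result always decodes cleanly and the view receives the original `%E7%E0...` escapes. The helper name `repercent_invalid_utf8` is hypothetical, and this sketch is not taken from the committed patch.

```python
from urllib.parse import quote

def repercent_invalid_utf8(path: bytes) -> bytes:
    """Hypothetical helper: percent-encode every byte range that is not valid UTF-8."""
    while True:
        try:
            path.decode("utf-8")
            return path  # everything left is valid UTF-8
        except UnicodeDecodeError as exc:
            bad = path[exc.start:exc.end]
            # The failing range only ever holds non-ASCII bytes, so quote()
            # with safe="" turns each of them into an uppercase %XX escape.
            escaped = quote(bad, safe="").encode("ascii")
            path = path[:exc.start] + escaped + path[exc.end:]

raw = b"/tag/\xe7\xe0\xea\xeb\xe0\xe4\xea\xe0/"
print(repercent_invalid_utf8(raw).decode("utf-8"))  # /tag/%E7%E0%EA%EB%E0%E4%EA%E0/
```

Each pass replaces at least one non-ASCII byte with ASCII, so the loop always terminates. Mixed input keeps its valid part: `b"/caf\xc3\xa9/\xff"` becomes `/café/%FF`.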

Loïc, could you advise?

comment:2 by Claude Paroz, 8 years ago

Cc: loic84 added

comment:3 by Tim Graham, 8 years ago

Triage Stage: Unreviewed → Accepted

Not sure about the appropriate resolution, but I could reproduce this crash by trying to fetch a URL like /tag/%E7%E0%EA%EB%E0%E4%EA%E0/ using gunicorn as the server.
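The crash can also be exercised at the WSGI level without running gunicorn. Under PEP 3333, the server passes PATH_INFO already percent-decoded, as a native string whose code points are the raw bytes (ISO-8859-1). The application recovers the bytes by re-encoding with ISO-8859-1 and then decodes them as UTF-8. The sketch below is a minimal illustration: `app` is a hypothetical toy handler, not Django's `WSGIHandler`, that answers 400 on a decode error as in the log above. `fetch` calls it directly instead of going through a real server, and `wsgiref.util.setup_testing_defaults` fills in the other required environ keys.

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    """Toy WSGI app (hypothetical): decode PATH_INFO as UTF-8 or answer 400."""
    raw = environ["PATH_INFO"].encode("iso-8859-1")  # undo the PEP 3333 latin-1 mapping
    try:
        path = raw.decode("utf-8")
    except UnicodeDecodeError:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Bad Request (UnicodeDecodeError)"]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [path.encode("utf-8")]

def fetch(raw_path: bytes) -> str:
    """Call `app` with an already percent-decoded path; return the status line."""
    environ = {"PATH_INFO": raw_path.decode("iso-8859-1")}
    setup_testing_defaults(environ)  # keeps our PATH_INFO, adds the other keys
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    app(environ, start_response)
    return statuses[0]

print(fetch(b"/tag/\xe7\xe0\xea\xeb\xe0\xe4\xea\xe0/"))  # 400 Bad Request
print(fetch("/tag/закладка/".encode("utf-8")))  # 200 OK
```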

comment:4 by Claude Paroz, 8 years ago

Has patch: set

Suggested fix in that PR.

comment:5 by Loic Bistuer, 8 years ago

Triage Stage: Accepted → Ready for checkin

comment:6 by Claude Paroz <claude@…>, 8 years ago

Resolution: fixed
Status: new → closed

In 48c34f3:

Fixed #26971 -- Prevented crash with non-UTF-8 incoming PATH_INFO

Thanks Tim Graham and Loïc Bistuer for the reviews.
