Opened 10 years ago

Closed 10 years ago

#20356 closed Bug (fixed)

CommonMiddleware UnicodeDecodeError

Reported by: srusskih
Owned by: nobody
Component: HTTP handling
Version: 1.3
Severity: Normal
Keywords: middleware unicodedecodeerror
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no


Received an error email with the following traceback:

Traceback (most recent call last):

 File "/srv/mydomain/mydomain/django/core/handlers/", line 178, in get_response
 response = middleware_method(request, response)

 File "/srv/mydomain/mydomain/django/middleware/", line 107, in process_response
 % (referer, request.get_full_path(), ua, ip),

UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 35: ordinal not in range(128)

GET:<QueryDict: {}>,
POST:<QueryDict: {}>,
META:{'CSRF_COOKIE': 'db9ed773e630c8234d67e01c3df53ac5',
 'DOCUMENT_ROOT': '/htdocs',
 'HTTP_ACCEPT': '*/*, text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
 'HTTP_ACCEPT_CHARSET': 'windows-1251,utf-8;q=0.7,*;q=0.7',
 'HTTP_ACCEPT_ENCODING': 'identity',
 'HTTP_ACCEPT_LANGUAGE': 'ru-ru,ru;q=0.8,en-us;q=0.5,en;q=0.3',
 'HTTP_CONNECTION': 'close',
 'HTTP_HOST': '',
 'HTTP_REFERER': '\xd0\xbb\xd0\xb8',
 'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0',
 'HTTP_X_REAL_IP': '',
 'PATH': '/usr/local/bin:/usr/bin:/bin',
 'PATH_INFO': u'/c/\u043b\u0438/',
 'PATH_TRANSLATED': '/srv/mydomain/conf/run.wsgi/c/\xd0\xbb\xd0\xb8/',
 'REMOTE_PORT': '59661',
 'REQUEST_URI': '/c/\xd0\xbb\xd0\xb8/',
 'SCRIPT_FILENAME': '/srv/mydomain/conf/run.wsgi',
 'SCRIPT_NAME': u'',
 'SERVER_ADMIN': '[no address given]',
 'SERVER_PORT': '80',
 'SERVER_SIGNATURE': '<address>Apache/2.2.14 (Ubuntu) Server at Port 80</address>\n',
 'SERVER_SOFTWARE': 'Apache/2.2.14 (Ubuntu)',
 'mod_wsgi.application_group': '|',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.listener_port': '8080',
 'mod_wsgi.process_group': 'mydomain',
 'mod_wsgi.reload_mechanism': '1',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (2, 8),
 'wsgi.errors': <mod_wsgi.Log object at 0xacf04920>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0xaf4a31d0>,
 'wsgi.input': <mod_wsgi.Input object at 0xba09b9d0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>

Testcase to reproduce:

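from django.http import HttpResponse
from django.middleware.common import CommonMiddleware
from django.test import RequestFactory, TestCase
from django.test.utils import override_settings  # available from Django 1.4


class TestDjangoMiddlewareUnicodeError(TestCase):

    # The posted method took a `settings` argument that nothing supplies;
    # override_settings applies the same two values for this test only.
    @override_settings(DEBUG=False, SEND_BROKEN_LINK_EMAILS=True)
    def test_unicodedecode_error_for_unicode_characters_in_path(self):
        # Python 2: the path is unicode, the Referer is a UTF-8 byte string.
        request = RequestFactory().get(u'/c/\u043b\u0438/')
        request.META['HTTP_REFERER'] = 'http://testserver/c/\xd0\xbb\xd0\xb8/'
        response = HttpResponse(status=404)

        # Raises UnicodeDecodeError before the fix.
        CommonMiddleware().process_response(request, response)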
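
For reference, the failure can be sketched outside Django. This is a minimal illustration, not the middleware's code: on Python 2, interpolating a byte string and a unicode string into one format string makes the interpreter decode the bytes as ASCII implicitly. The explicit decode below stands in for that step, and the helper name implicit_ascii_decode is hypothetical.

```python
# Referer exactly as it appears in the test case above: UTF-8 bytes, not text.
REFERER = b'http://testserver/c/\xd0\xbb\xd0\xb8/'


def implicit_ascii_decode(raw):
    """Hypothetical stand-in for Python 2's implicit str -> unicode coercion."""
    return raw.decode('ascii')


try:
    implicit_ascii_decode(REFERER)
    error = None
except UnicodeDecodeError as exc:
    error = exc

# The first non-ASCII byte (0xd0) is what the codec rejects.
if error is not None:
    print(error.reason, 'at byte offset', error.start)
```

The byte offset differs from the one in the traceback because the middleware decodes the whole formatted message, not the Referer alone.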

Attachments (1)

20356-1.diff (2.2 KB) - added by Claude Paroz 10 years ago.

Change History (4)

comment:1 Changed 10 years ago by Claude Paroz

Component: Uncategorized → HTTP handling
Has patch: set
Triage Stage: Unreviewed → Accepted

Changed 10 years ago by Claude Paroz

Attachment: 20356-1.diff added

comment:2 Changed 10 years ago by Aymeric Augustin

Triage Stage: Accepted → Ready for checkin

Technically, URLs may be using any encoding, even though modern RFCs require utf-8.

If a site whose URLs are in latin-1 links to a Django site, this problem will occur when attempting to decode the URL as utf-8.

There are several workarounds at this point: errors=replace sounds all right, and displaying the raw value would work too.
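
A minimal sketch of the two workarounds mentioned here, assuming the Referer arrives as raw bytes. The helper names and the latin-1 sample URL are hypothetical, and this is not necessarily what the committed patch does.

```python
def referer_replaced(raw, encoding='utf-8'):
    """Decode leniently: undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(encoding, errors='replace')


def referer_raw(raw):
    """Show the raw value: non-ASCII bytes are rendered as \\xNN escapes."""
    return raw.decode('ascii', errors='backslashreplace')


utf8_referer = b'/c/\xd0\xbb\xd0\xb8/'   # UTF-8 bytes from the report
latin1_referer = b'/caf\xe9/'            # hypothetical latin-1 encoded URL

print(referer_replaced(utf8_referer))    # valid UTF-8, decodes cleanly
print(referer_replaced(latin1_referer))  # invalid UTF-8, replaced rather than raised
print(referer_raw(latin1_referer))       # escaped form of the original bytes
```

Lenient decoding keeps the message readable for well-formed Referers, while the escaped form preserves the exact bytes for debugging.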

comment:3 Changed 10 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: new → closed

In 8fd44b2551b9cca765b216a31306f9c6935f1492:

Fixed #20356 -- Prevented crash when HTTP_REFERER contains non-ascii

Thanks srusskih for the report and Aymeric Augustin for the review.
