#15152 closed Bug (fixed)
Common middleware raises UnicodeDecodeError if receives non-ASCII QUERY_STRING from buggy web server
| Reported by: | Loststylus | Owned by: | Aymeric Augustin |
|---|---|---|---|
| Component: | Core (Other) | Version: | 1.2 |
| Severity: | Normal | Keywords: | common middleware |
| Cc: | | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
So, FlaxCrawler seems to like my site and visits it very often, always getting a 500 error.
Here's the common traceback:
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/django/core/handlers/base.py", line 80, in get_response
response = middleware_method(request)
File "/usr/local/lib/python2.6/dist-packages/django/middleware/common.py", line 79, in process_request
newurl += '?' + request.META['QUERY_STRING']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 3: ordinal not in range(128)
<WSGIRequest
GET:<QueryDict: {u'q': [u'\u0427\u0430\u0439\u043a\u0430']}>,
POST:<QueryDict: {}>,
COOKIES:{},
META:{'CONTENT_LENGTH': '',
'CONTENT_TYPE': '',
'HTTP_ACCEPT': 'text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2',
'HTTP_ACCEPT_ENCODING': 'gzip,defalte',
'HTTP_ACCEPT_LANGUAGE': 'ru,en-us;q=0.7,en;q=0.3',
'HTTP_CACHE_CONTROL': 'no-cache',
'HTTP_CONNECTION': 'close',
'HTTP_HOST': '{sorry, i've got that censored out}',
'HTTP_PRAGMA': 'no-cache',
'HTTP_USER_AGENT': 'FlaxCrawler/1.0',
'PATH_INFO': u'/articles/ajaxsearch',
'QUERY_STRING': 'q=\xd0\xa7\xd0\xb0\xd0\xb9\xd0\xba\xd0\xb0',
'REMOTE_ADDR': '92.241.173.132',
'REQUEST_METHOD': 'GET',
'SCRIPT_NAME': u'',
'SERVER_NAME': '{sorry, i've got that censored out}',
'SERVER_PORT': '80',
'SERVER_PROTOCOL': 'HTTP/1.1',
'wsgi.errors': <flup.server.fcgi_base.TeeOutputStream object at 0x2634850>,
'wsgi.input': <flup.server.fcgi_base.InputStream object at 0x2634610>,
'wsgi.multiprocess': True,
'wsgi.multithread': False,
'wsgi.run_once': False,
'wsgi.url_scheme': 'http',
'wsgi.version': (1, 0)}>
The major problem I see here is that the developer cannot do anything to catch the error :(
Change History (18)
comment:2 by , 15 years ago
comment:3 by , 15 years ago
Oh, thank you for your response, I'll try to catch it via middleware.
I think the WSGIRequest constructor seems like the proper place to check for that.
comment:4 by , 15 years ago
Temporary workaround (this middleware should be added before the common middleware):
from django.http import HttpResponse

class RequestCheckMiddleware(object):
    def process_request(self, request):
        try:
            # Force the same implicit ASCII decode that fails in CommonMiddleware.
            u'%s' % request.META.get('QUERY_STRING', '')
        except UnicodeDecodeError:
            response = HttpResponse()
            response.status_code = 400  # Bad Request
            return response
        return None
comment:5 by , 15 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
Accepted on the basis that we could do something here, but I agree with Luke - we don't want to pay a big price because a handful of servers can't implement the spec correctly.
comment:6 by , 15 years ago
| Summary: | Common middleware raises UnicodeDecodeError if receives specially formatted unicode-like query string → Common middleware raises UnicodeDecodeError if receives non-ASCII QUERY_STRING from buggy web server |
|---|
comment:7 by , 15 years ago
| Severity: | → Normal |
|---|---|
| Type: | → Bug |
comment:13 by , 14 years ago
I got the same error.
My server is Apache with mod_wsgi. The client seems to be Internet Explorer 9.
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/Django-1.3-py2.7.egg/django/core/handlers/base.py", line 89, in get_response
response = middleware_method(request)
File "/usr/local/lib/python2.7/dist-packages/Django-1.3-py2.7.egg/django/middleware/common.py", line 89, in process_request
newurl += '?' + request.META['QUERY_STRING']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 58: ordinal not in range(128)
<WSGIRequest
GET:<QueryDict: {u'datetime': [u'--------------'], u'time_group_id': [u'---'], u'speciality': [u'Psicologia (Dist\xfarbios emocionais e de personalidade)'], u'office': [u'--------------------------------------------'], u'health_insurance': [u'----------']}>,
POST:<QueryDict: {}>,
META:{
'GATEWAY_INTERFACE': 'CGI/1.1',
'HTTP_ACCEPT': 'text/html, application/xhtml+xml, */*',
'HTTP_ACCEPT_LANGUAGE': 'pt-BR',
'HTTP_CONNECTION': 'Keep-Alive',
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)',
'HTTP_VIA': '1.1 SVGWF02',
'QUERY_STRING': 'health_insurance=----------&speciality=Psicologia%20(Dist\xc3\xbarbios%20emocionais%20e%20de%20personalidade)&office=--------------------------------------------&time_group_id=---&datetime=----------------',
'REQUEST_METHOD': 'GET',
'SERVER_PROTOCOL': 'HTTP/1.1',
'SERVER_SOFTWARE': 'Apache',
'mod_wsgi.callable_object': 'application',
'mod_wsgi.handler_script': '',
'mod_wsgi.input_chunked': '0',
'mod_wsgi.listener_host': '',
'mod_wsgi.process_group': '',
'mod_wsgi.request_handler': 'wsgi-script',
'mod_wsgi.script_reloading': '1',
'mod_wsgi.version': (3, 3),
'wsgi.errors': <mod_wsgi.Log object at 0xba62c0c0>,
'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0xba4b2608>,
'wsgi.input': <mod_wsgi.Input object at 0xba41cd90>,
'wsgi.multiprocess': True,
'wsgi.multithread': False,
'wsgi.run_once': False,
'wsgi.url_scheme': 'http',
'wsgi.version': (1, 1)}>
comment:14 by , 14 years ago
I have the same error:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/django/core/handlers/base.py", line 89, in get_response
response = middleware_method(request)
File "/usr/lib/python2.6/site-packages/django/middleware/common.py", line 89, in process_request
newurl += '?' + request.META['QUERY_STRING']
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb0 in position 4: ordinal not in range(128)
Some additional data
HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)', 'HTTP_VIA': '1.0 niiri.kharkov.com (squid/3.0.STABLE7), 1.0 wwwniiri (squid/3.2.0.12)', 'HTTP_X_FORWARDED_FOR': 'unknown, 172.16.0.4, 82.117.230.71', 'PATH_INFO': u'/ru/products/tag/N', 'QUERY_STRING': 'N??\xb0???\xbb??????/fancybox/fancy_loading.png', 'REMOTE_PORT': '27370', 'REQUEST_METHOD': 'GET', 'REQUEST_URI': '/ru/products/tag/N?N??\xb0???\xbb??????/fancybox/fancy_loading.png',
comment:15 by , 14 years ago
Same recurrent error here, just after updating to 1.4:
Traceback (most recent call last):
File "/usr/lib/python2.7/wsgiref/handlers.py", line 85, in run
self.result = application(self.environ, self.start_response)
File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/contrib/staticfiles/handlers.py", line 67, in __call__
return self.application(environ, start_response)
File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/core/handlers/wsgi.py", line 241, in __call__
response = self.get_response(request)
File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/core/handlers/base.py", line 146, in get_response
response = debug.technical_404_response(request, e)
File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/views/debug.py", line 432, in technical_404_response
'reason': smart_str(exception, errors='replace'),
File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/utils/encoding.py", line 116, in smart_str
return str(s)
File "/usr/lib/python2.7/site-packages/Django-1.4-py2.7.egg/django/core/urlresolvers.py", line 185, in __repr__
return smart_str(u'<%s %s %s>' % (self.__class__.__name__, self.name, self.regex.pattern))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
I tried adding a utf8 spec at the top of urlresolver and adding DEFAULT utf8 in settings.py: no effect.
comment:16 by , 13 years ago
| Owner: | changed from to |
|---|
comment:17 by , 13 years ago
Apache + mod_wsgi and the Bingbot is causing this error on one of my servers.
comment:18 by , 13 years ago
This happens because the URL produced by reversing is a unicode string and the query string is a bytestring.
(Interestingly, this bug doesn't exist under Python 3.)
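A minimal sketch of the mismatch described above, written for Python 3 so it can be run today. Python 2 decoded a bytestring as ASCII implicitly when it was concatenated with a unicode string; the explicit `.decode('ascii')` below stands in for that implicit step. The query string is the one from the original report; the variable names and the trailing slash on the path are assumptions made for illustration.

```python
# Unicode URL (assumed shape) and the raw UTF-8 query string from the report.
newurl = u'/articles/ajaxsearch/'
query_string = b'q=\xd0\xa7\xd0\xb0\xd0\xb9\xd0\xba\xd0\xb0'

# In Python 2, '?' + query_string was a bytestring, which then had to be
# decoded as ASCII before it could be appended to the unicode newurl.
suffix = b'?' + query_string

try:
    # Explicit stand-in for Python 2's implicit ASCII decode.
    newurl += suffix.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# error.start is the offset of the first undecodable byte in suffix.
print(error)
```

The failing offset counts the leading `?`, which is why the first non-ASCII byte of `q=...` is reported one position later than its index in QUERY_STRING itself.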
comment:19 by , 13 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
There is no "specially formatted unicode-like" query string here - it is a straightforward UTF-8 encoded string.
The strange thing here is that non-ASCII characters are ending up in META['QUERY_STRING']. With browsers, non-ASCII characters get percent encoded. So the request is simply wrong - this is definitely a bug in the crawler (but that is irrelevant).
The next question is whether this is a bug in the web server, which appears to be flup. Looking at the spec for QUERY_STRING in CGI (http://ken.coar.org/cgi/draft-coar-cgi-v11-03.txt) which is the basis of the WSGI spec (http://www.python.org/dev/peps/pep-0333/#environ-variables), the value of QUERY_STRING should not contain these values.
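For comparison, a small sketch of what a percent-encoded version of the reported query value looks like, using Python 3's standard `urllib.parse`. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote_to_bytes

# Raw UTF-8 bytes as they appeared in META['QUERY_STRING'] in the report.
raw = b'\xd0\xa7\xd0\xb0\xd0\xb9\xd0\xba\xd0\xb0'

# Percent-encode every byte outside the unreserved ASCII set.
encoded = quote(raw)

# Decoding the percent-escapes recovers the original bytes.
roundtrip = unquote_to_bytes(encoded)

print('q=' + encoded)
```

The encoded form contains only ASCII characters, so appending it to a unicode URL cannot trigger the decode error.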
So AFAICS, this is a bug in flup, because it should never be passing on values like these. That doesn't mean we shouldn't fix it in Django to stop 500 errors being produced. The best behaviour would be to return a '400 Malformed request' error if QUERY_STRING has any non-ASCII chars, but we probably don't want to do that in the bit of code that is raising this exception, but somewhere like WSGIRequest.__init__ or BaseHandler.get_response. But this will add overhead to every request, so I'm not sure what to do.
There is a way to catch this at the developer level - install an exception middleware. You could also install a request middleware that checked that no invalid chars were in QUERY_STRING.
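The request-level check suggested above could be sketched as a plain WSGI wrapper, independent of Django. This is an illustration under assumptions, not the fix that was committed: the names `reject_non_ascii_query` and `demo_app` are hypothetical, and the 400 response body is arbitrary.

```python
def reject_non_ascii_query(app):
    """Wrap a WSGI app and answer 400 when QUERY_STRING is not pure ASCII."""
    def wrapper(environ, start_response):
        query = environ.get("QUERY_STRING", "")
        # Python 2 servers pass bytes; Python 3 servers pass latin-1 decoded
        # text. Normalise to text so one check covers both cases.
        if isinstance(query, bytes):
            query = query.decode("latin-1")
        if any(ord(ch) > 127 for ch in query):
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"Malformed query string\n"]
        return app(environ, start_response)
    return wrapper


def demo_app(environ, start_response):
    # Hypothetical inner application, used only to exercise the wrapper.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok\n"]


application = reject_non_ascii_query(demo_app)
```

The check is a single pass over the query string, which is the per-request overhead the comment above is weighing against the benefit of avoiding 500 errors.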