Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#22996 closed Bug (fixed)

UnicodeDecodeError on accessing `request.GET`

Reported by: jorgecarleitao
Owned by: nobody
Component: HTTP handling
Version: 1.6
Severity: Normal
Keywords:
Cc: jorgecarleitao
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Aymeric Augustin)

I'm getting a non-deterministic error while running Django 1.6.5 in production, when I try to access request.GET:

    Traceback (most recent call last):
    
      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
    
      File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
        context = build_costumer_list_context(context, request.GET)
    
      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
        raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')

from a request of the form:

    'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3'

I'm using one middleware, 'django.middleware.locale.LocaleMiddleware', and the query parameter name página is a translated string.

This error occurs on roughly 1 in every 200 page views (estimated), and it seems to occur only on requests with a query string of the form ?página=....

I will gladly help with this, although I'm not familiar with HTTP handling, so I would need some guidance on what the cause could be and where to start looking.

Attachments (2)

22996-1.6.diff (2.1 KB) - added by Claude Paroz 2 years ago.
22996-master.diff (2.2 KB) - added by Claude Paroz 2 years ago.


Change History (17)

comment:1 Changed 2 years ago by Aymeric Augustin

Description: modified (diff)
Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

Could you provide the full stack trace please?

comment:2 Changed 2 years ago by jorgecarleitao

Thanks for formatting it Aymeric.

This is all I have in the email I received with the traceback (I removed two pieces of information that could help identify the user):

    Traceback (most recent call last):

      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)

      File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
        context = build_costumer_list_context(context, request.GET)

      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
        raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1: invalid continuation byte

    <WSGIRequest
    path:/categoria/8696/contratados,
    GET:<could not parse>,
    POST:<QueryDict: {}>,
    COOKIES:{'_ga': 'GA1.2.235397185.1404980740'},
    META:{'DOCUMENT_ROOT': '/usr/local/apache2/htdocs',
     'GATEWAY_INTERFACE': 'CGI/1.1',
     'HTTP_ACCEPT': 'image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*',
     'HTTP_ACCEPT_ENCODING': 'gzip, deflate',
     'HTTP_ACCEPT_LANGUAGE': 'pt',
     'HTTP_CACHE_CONTROL': 'max-age=0',
     'HTTP_CONNECTION': 'close',
     'HTTP_COOKIE': '_ga=GA1.2.235397185.1404980740',
     'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3',
     'HTTP_HOST': 'publicos.pt',
     'HTTP_HTTPS': 'off',
     'HTTP_HTTP_X_FORWARDED_PROTO': 'http',
     'HTTP_USER_AGENT': ------------------------
     'HTTP_VIA': '1.1 cmaGD.cma.local (squid/3.3.8)',
     'HTTP_X_FORWARDED_FOR': ---------------------
     'HTTP_X_FORWARDED_HOST': 'publicos.pt',
     'HTTP_X_FORWARDED_PROTO': 'http',
     'HTTP_X_FORWARDED_SERVER': 'publicos.pt',
     'HTTP_X_FORWARDED_SSL': 'off',
     'PATH_INFO': '/categoria/8696/contratados',
     'PATH_TRANSLATED': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py/categoria/8696/contratados',
     'QUERY_STRING': 'página=3',
     'REMOTE_ADDR': '127.0.0.1',
     'REMOTE_PORT': '33908',
     'REQUEST_METHOD': 'GET',
     'REQUEST_URI': '/categoria/8696/contratados?página=3',
     'SCRIPT_FILENAME': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py',
     'SCRIPT_NAME': '',
     'SERVER_ADDR': '127.0.0.1',
     'SERVER_ADMIN': '[no address given]',
     'SERVER_NAME': 'publicos.pt',
     'SERVER_PORT': '80',
     'SERVER_PROTOCOL': 'HTTP/1.0',
     'SERVER_SIGNATURE': '',
     'SERVER_SOFTWARE': 'Apache/2.2.25 (Unix) mod_wsgi/3.4 Python/3.3.2',
     'mod_wsgi.application_group': 'web306.webfaction.com|',
     'mod_wsgi.callable_object': 'application',
     'mod_wsgi.enable_sendfile': '0',
     'mod_wsgi.handler_script': '',
     'mod_wsgi.input_chunked': '0',
     'mod_wsgi.listener_host': '',
     'mod_wsgi.listener_port': '10392',
     'mod_wsgi.process_group': 'publics',
     'mod_wsgi.queue_start': '1405000563669244',
     'mod_wsgi.request_handler': 'wsgi-script',
     'mod_wsgi.script_reloading': '1',
     'mod_wsgi.version': (3, 4),
     'wsgi.errors': <_io.TextIOWrapper encoding='utf-8'>,
     'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7fc0566d5828>,
     'wsgi.input': <mod_wsgi.Input object at 0x7fc05c0586b0>,
     'wsgi.multiprocess': True,
     'wsgi.multithread': True,
     'wsgi.run_once': False,
     'wsgi.url_scheme': 'http',
     'wsgi.version': (1, 0)}>

comment:3 Changed 2 years ago by Aymeric Augustin

Clearly the exception happens because the query string contains an á encoded as Latin-1 (the byte 0xe1), while Django expects non-ASCII characters in the URL to be encoded in UTF-8.

    >>> b'\xe1'.decode('latin-1')
    'á'

If I understand correctly, you cannot reproduce this reliably? Maybe it's an old and buggy browser? If you have the web server's log, can you look for the 500 error and check the user agent?
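The mechanism described in comment 3 can be reproduced without Django. This sketch assumes the user agent sent 'á' as the single raw byte 0xE1 (not percent-encoded) and that the server follows PEP 3333, which on Python 3 exposes environ values as strings decoded from the original bytes as ISO-8859-1; that yields exactly the `'QUERY_STRING': 'página=3'` value in the META dump above. The helper names `wsgi_native_string` and `django_16_query_string` are hypothetical; the latter only mirrors the `wsgi.py` line from the traceback.

```python
# Assumption: IE sent "página=3" with 'á' as the raw Latin-1/Windows-1252 byte 0xE1.
raw_from_ie = b'p\xe1gina=3'
# Assumption: a UTF-8 compliant browser sends the same query percent-encoded.
raw_compliant = b'p%C3%A1gina=3'


def wsgi_native_string(raw: bytes) -> str:
    """Hypothetical helper: per PEP 3333, on Python 3 the WSGI server
    exposes environ values as str decoded from the bytes as ISO-8859-1."""
    return raw.decode('iso-8859-1')


def django_16_query_string(query_string: str) -> str:
    """Hypothetical helper mirroring the wsgi.py line in the traceback:
    recover the original bytes, then decode them as UTF-8."""
    return query_string.encode('iso-8859-1').decode('utf-8')


# Percent-escapes are plain ASCII, so the compliant request round-trips fine.
print(django_16_query_string(wsgi_native_string(raw_compliant)))  # p%C3%A1gina=3

# The IE request reproduces the reported error: 0xE1 starts a 3-byte UTF-8
# sequence, but the next byte is 'g' (0x67), which is not a continuation byte.
try:
    django_16_query_string(wsgi_native_string(raw_from_ie))
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xe1 in position 1: invalid continuation byte
```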

comment:4 Changed 2 years ago by jorgecarleitao

Since I kept the old error emails, here are some examples of user agents for which this happened:

    'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET CLR 2.0.50727)',
    'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MATMJS)' (4 times)
    'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
    'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)',
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',

For the most recent error, the server's access log shows the first user agent above on the request that returned status 500.

comment:5 Changed 2 years ago by Claude Paroz

This seems to be typical of IE, which mangles URL encoding.
From http://blogs.msdn.com/b/ieinternals/archive/2012/07/13/internet-explorer-and-international-text-encoding-unicode-punycode-ansi-oh-my.aspx:

"URLs in IE may use up to three (!!) different encodings at once: punycode in the hostname, %-escaped UTF-8 for the path, and raw codepaged-ANSI for the query and fragment components. This is clearly a mess, but fixing it to match the IRI specification incurs compatibility costs. (Trust me, we’ve tried!)"

While it's unfortunate, we should at least not crash.
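One way to avoid crashing is to try UTF-8 first and fall back to ISO-8859-1, which maps every byte to a code point and therefore cannot fail. This is a minimal sketch under stated assumptions, not the patch attached to this ticket: `decode_query_string` and `parse_query` are hypothetical helpers, and they assume the raw query-string bytes are available (on Python 3 they can be recovered from `QUERY_STRING` with `.encode('iso-8859-1')`, as the `wsgi.py` line in the traceback does). The fallback is a guess about the client's encoding.

```python
from urllib.parse import parse_qsl


def decode_query_string(raw: bytes, encoding: str = 'utf-8') -> str:
    """Hypothetical helper: decode raw query-string bytes, tolerating
    user agents that send unencoded non-UTF-8 bytes."""
    try:
        # Normal case: percent-encoded ASCII, or raw bytes in `encoding`.
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Fallback guess: ISO-8859-1 never raises. It recovers 'á' (0xE1)
        # from IE's codepaged query strings, although Windows-1252 differs
        # from Latin-1 in the 0x80-0x9F range.
        return raw.decode('iso-8859-1')


def parse_query(raw: bytes, encoding: str = 'utf-8') -> list:
    """Hypothetical helper: return (key, value) pairs; percent-escapes
    are decoded using `encoding`."""
    return parse_qsl(decode_query_string(raw, encoding),
                     keep_blank_values=True, encoding=encoding)


print(parse_query(b'p%C3%A1gina=3'))  # [('página', '3')]
print(parse_query(b'p\xe1gina=3'))    # [('página', '3')]
```

With this approach a misbehaving client gets a best-effort result instead of a 500 error, at the cost of occasionally mis-decoding characters that only exist in the client's codepage.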

comment:6 Changed 2 years ago by Tim Graham

Triage Stage: Unreviewed → Accepted

Changed 2 years ago by Claude Paroz

Attachment: 22996-1.6.diff added

Changed 2 years ago by Claude Paroz

Attachment: 22996-master.diff added

comment:7 Changed 2 years ago by Claude Paroz

Has patch: set

comment:8 Changed 2 years ago by Aymeric Augustin

Yeah, we have no choice but to shove "in the face of ambiguity, refuse to guess" up our asses. Thank you, IE.

Patches look pretty good. Can you add a comment explaining why the results are different on Python 2 and 3 -- if you know why? (I'm pretty sure I've seen that before but I can't remember the reason.) Can you also add a reference to this ticket (#22996) in the test's docstring?

Last edited 2 years ago by Aymeric Augustin

comment:9 Changed 2 years ago by Claude Paroz

Here is the pull request for master: https://github.com/django/django/pull/2910
For the backport to 1.6, I might limit the changes to the Python 3 issue.

comment:10 Changed 2 years ago by Tim Graham

Patch needs improvement: set

According to the PR, there is a failing test on Python 2.

comment:11 Changed 2 years ago by Claude Paroz

Patch needs improvement: unset

Patch updated, including a note in the 1.7 release notes.

comment:12 Changed 2 years ago by Tim Graham

Triage Stage: Accepted → Ready for checkin

comment:13 Changed 2 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: new → closed

In fa02120d360387bebbbe735e86686bb4c7c43db2:

Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.

comment:14 Changed 2 years ago by Claude Paroz <claude@…>

In 72ad014b6aee3e8d996af4646b97228e82fc4cc1:

[1.7.x] Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.
Backport of fa02120d36 from master.

comment:15 Changed 2 years ago by Claude Paroz <claude@…>

In 9f9fdc4b0a33abfe3255302300ea1e3d1c33a3a0:

[1.6.x] Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.
Backport of fa02120d36 from master.
