Opened 14 months ago

Closed 13 months ago

Last modified 13 months ago

#22996 closed Bug (fixed)

UnicodeDecodeError on accessing `request.GET`

Reported by: jorgecarleitao
Owned by: nobody
Component: HTTP handling
Version: 1.6
Severity: Normal
Keywords:
Cc: jorgecarleitao
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by aaugustin)

I'm getting a non-deterministic error while running Django 1.6.5 in production, when I try to access request.GET:

    Traceback (most recent call last):
    
      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
        response = wrapped_callback(request, *callback_args, **callback_kwargs)
    
      File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
        context = build_costumer_list_context(context, request.GET)
    
      File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
        raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')

from a request of the form:

    'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3'

I'm using one middleware, 'django.middleware.locale.LocaleMiddleware', and página is a translation.

This error occurs roughly once every 200 pageviews (estimated), and it seems to occur only on requests whose query string has the form ?página=....

I will gladly help with this, but I'm not familiar with HTTP handling, so I would need some guidance on what the cause could be and where I should start looking.

Attachments (2)

22996-1.6.diff (2.1 KB) - added by claudep 14 months ago.
22996-master.diff (2.2 KB) - added by claudep 14 months ago.

Change History (17)

comment:1 Changed 14 months ago by aaugustin

  • Description modified (diff)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Could you provide the full stack trace please?

comment:2 Changed 14 months ago by jorgecarleitao

Thanks for formatting it, Aymeric.

This is all I have in the email I receive with the traceback (I removed two pieces of information that could help identify the user):

Traceback (most recent call last):

  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/base.py", line 114, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)

  File "/home/jorgecarleitao/webapps/publics/public-contracts/contracts/category_views.py", line 79, in contracted
    context = build_costumer_list_context(context, request.GET)

  File "/home/jorgecarleitao/webapps/publics/lib/python3.3/django/core/handlers/wsgi.py", line 137, in _get_get
    raw_query_string = raw_query_string.encode('iso-8859-1').decode('utf-8')

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 1: invalid continuation byte


<WSGIRequest
path:/categoria/8696/contratados,
GET:<could not parse>,
POST:<QueryDict: {}>,
COOKIES:{'_ga': 'GA1.2.235397185.1404980740'},
META:{'DOCUMENT_ROOT': '/usr/local/apache2/htdocs',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': 'image/gif, image/jpeg, image/pjpeg, image/pjpeg, application/x-shockwave-flash, application/x-ms-application, application/x-ms-xbap, application/vnd.ms-xpsdocument, application/xaml+xml, */*',
 'HTTP_ACCEPT_ENCODING': 'gzip, deflate',
 'HTTP_ACCEPT_LANGUAGE': 'pt',
 'HTTP_CACHE_CONTROL': 'max-age=0',
 'HTTP_CONNECTION': 'close',
 'HTTP_COOKIE': '_ga=GA1.2.235397185.1404980740',
 'HTTP_FORWARDED_REQUEST_URI': '/categoria/8696/contratados?página=3',
 'HTTP_HOST': 'publicos.pt',
 'HTTP_HTTPS': 'off',
 'HTTP_HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_USER_AGENT': ------------------------
 'HTTP_VIA': '1.1 cmaGD.cma.local (squid/3.3.8)',
 'HTTP_X_FORWARDED_FOR': ---------------------
 'HTTP_X_FORWARDED_HOST': 'publicos.pt',
 'HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_X_FORWARDED_SERVER': 'publicos.pt',
 'HTTP_X_FORWARDED_SSL': 'off',
 'PATH_INFO': '/categoria/8696/contratados',
 'PATH_TRANSLATED': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py/categoria/8696/contratados',
 'QUERY_STRING': 'página=3',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': '33908',
 'REQUEST_METHOD': 'GET',
 'REQUEST_URI': '/categoria/8696/contratados?página=3',
 'SCRIPT_FILENAME': '/home/jorgecarleitao/webapps/publics/public-contracts/main/apache/wsgi.py',
 'SCRIPT_NAME': '',
 'SERVER_ADDR': '127.0.0.1',
 'SERVER_ADMIN': '[no address given]',
 'SERVER_NAME': 'publicos.pt',
 'SERVER_PORT': '80',
 'SERVER_PROTOCOL': 'HTTP/1.0',
 'SERVER_SIGNATURE': '',
 'SERVER_SOFTWARE': 'Apache/2.2.25 (Unix) mod_wsgi/3.4 Python/3.3.2',
 'mod_wsgi.application_group': 'web306.webfaction.com|',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.enable_sendfile': '0',
 'mod_wsgi.handler_script': '',
 'mod_wsgi.input_chunked': '0',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.listener_port': '10392',
 'mod_wsgi.process_group': 'publics',
 'mod_wsgi.queue_start': '1405000563669244',
 'mod_wsgi.request_handler': 'wsgi-script',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (3, 4),
 'wsgi.errors': <_io.TextIOWrapper encoding='utf-8'>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7fc0566d5828>,
 'wsgi.input': <mod_wsgi.Input object at 0x7fc05c0586b0>,
 'wsgi.multiprocess': True,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}>

comment:3 Changed 14 months ago by aaugustin

Clearly the exception happens because the query string contains an á encoded as latin-1, while Django expects non-ASCII characters in the URL to be encoded in UTF-8.

>>> b'\xe1'.decode('latin-1')
'á'

If I understand correctly, you cannot reproduce this reliably? Maybe it's an old and buggy browser? If you have the web server's log, can you look for the 500 error and check the user agent?
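
The failure can be reproduced without a browser. The sketch below assumes, per PEP 3333, that a Python 3 WSGI server hands the raw query-string bytes to the application as a str decoded as ISO-8859-1; the variable names are illustrative and not taken from Django.

```python
# Raw bytes as a browser would send them if it neither percent-encodes
# the query string nor uses UTF-8 (illustrative input).
raw_bytes = "página=3".encode("iso-8859-1")   # b'p\xe1gina=3'

# Assumption (PEP 3333): on Python 3 the server exposes those bytes in
# environ['QUERY_STRING'] as a str decoded as ISO-8859-1.
query_string = raw_bytes.decode("iso-8859-1")

# Same round trip as the wsgi.py line shown in the traceback.
try:
    query_string.encode("iso-8859-1").decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

Byte 0xe1 is a valid UTF-8 lead byte for a three-byte sequence, but the byte that follows it here, g (0x67), is not a continuation byte, which matches the "invalid continuation byte" message in the traceback.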

comment:4 Changed 14 months ago by jorgecarleitao

I still had old error emails, so here are some examples of user agents for which this happened:

'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C; .NET CLR 2.0.50727)',
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0; MATMJS)' (4 times)
'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)',
'HTTP_USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727)',
'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko',

For the most recent error, the server's access log shows the first user agent above on the request that returned status 500.

comment:5 Changed 14 months ago by claudep

This seems to be typical of IE, which messes up URL encoding.
From http://blogs.msdn.com/b/ieinternals/archive/2012/07/13/internet-explorer-and-international-text-encoding-unicode-punycode-ansi-oh-my.aspx:

"URLs in IE may use up to three (!!) different encodings at once: punycode in the hostname, %-escaped UTF-8 for the path, and raw codepaged-ANSI for the query and fragment components. This is clearly a mess, but fixing it to match the IRI specification incurs compatibility costs. (Trust me, we’ve tried!)"

While it's unfortunate, we should at least not crash.
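
One way to avoid crashing on such input is to try the expected encoding first and fall back to ISO-8859-1, which can decode any byte sequence. The sketch below illustrates that idea only; `decode_query_string` is a hypothetical helper, not Django API, and it is not necessarily what the attached patches do. Using cp1252 for the "codepaged-ANSI" query is also an assumption (a Western European Windows locale).

```python
def decode_query_string(raw, encoding="utf-8"):
    """Decode query-string bytes without raising on non-UTF-8 input.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this cannot fail.
        return raw.decode("iso-8859-1")


# Query bytes as IE might send them: raw code-page bytes, not %-escaped.
ie_query = "página=3".encode("cp1252")        # b'p\xe1gina=3'

# A well-behaved client sends percent-escaped UTF-8, which is plain ASCII.
well_formed = decode_query_string(b"p%C3%A1gina=3")
misbehaving = decode_query_string(ie_query)

print(well_formed)   # still percent-encoded; unquoting happens later
print(misbehaving)
```

The fallback is a guess: bytes from a different code page would decode to the wrong characters, but the request no longer fails with a 500.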

comment:6 Changed 14 months ago by timo

  • Triage Stage changed from Unreviewed to Accepted

comment:7 Changed 14 months ago by claudep

  • Has patch set

comment:8 Changed 14 months ago by aaugustin

Yeah, we have no choice but shoving "in the face of ambiguity, refuse to guess" up our asses. Thank you, IE.

Patches look pretty good. Can you add a comment explaining why the results are different on Python 2 and 3 -- if you know why? (I'm pretty sure I've seen that before but I can't remember the reason.) Can you also add a reference to this ticket (#22996) in the test's docstring?

comment:9 Changed 14 months ago by claudep

Here is the pull request for master: https://github.com/django/django/pull/2910
For the backport to 1.6, I might limit the changes to the Python 3 issue.

comment:10 Changed 13 months ago by timo

  • Patch needs improvement set

According to the PR, there is a failing test on Python 2.

comment:11 Changed 13 months ago by claudep

  • Patch needs improvement unset

Patch updated, including a note in the 1.7 release notes.

comment:12 Changed 13 months ago by timgraham

  • Triage Stage changed from Accepted to Ready for checkin

comment:13 Changed 13 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In fa02120d360387bebbbe735e86686bb4c7c43db2:

Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.

comment:14 Changed 13 months ago by Claude Paroz <claude@…>

In 72ad014b6aee3e8d996af4646b97228e82fc4cc1:

[1.7.x] Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.
Backport of fa02120d36 from master.

comment:15 Changed 13 months ago by Claude Paroz <claude@…>

In 9f9fdc4b0a33abfe3255302300ea1e3d1c33a3a0:

[1.6.x] Fixed #22996 -- Prevented crash with unencoded query string

Thanks Jorge Carleitao for the report and Aymeric Augustin, Tim Graham
for the reviews.
Backport of fa02120d36 from master.
