Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#19101 closed Bug (fixed)

Non ascii chars in form cause Internal Server Error

Reported by: kristall
Owned by: Aymeric Augustin
Component: Forms
Version: dev
Severity: Release blocker
Keywords: encoding
Cc: kristall
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Claude Paroz)

Using non-ASCII characters in a form causes an Internal Server Error under Python 3.2 (the same code works fine under Python 2.7).
I ran the server with "python3 manage.py runserver".

Internal Server Error: /formtest/
Traceback (most recent call last):
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 110, in get_response
    response = middleware_method(request, callback, callback_args, callback_kwargs)
  File "/usr/local/lib/python3.2/site-packages/django/middleware/csrf.py", line 174, in process_view
    request_csrf_token = request.POST.get('csrfmiddlewaretoken', '')
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 110, in get_response
    response = middleware_method(request, callback, callback_args, callback_kwargs)
  File "/usr/local/lib/python3.2/site-packages/django/middleware/csrf.py", line 174, in process_view
    request_csrf_token = request.POST.get('csrfmiddlewaretoken', '')
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.2/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python3.2/site-packages/django/contrib/staticfiles/handlers.py", line 71, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 236, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 180, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 222, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 69, in technical_500_response
    html = reporter.get_traceback_html()
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 289, in get_traceback_html
    c = Context(self.get_traceback_data())
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 247, in get_traceback_data
    frames = self.get_traceback_frames()
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 398, in get_traceback_frames
    'vars': self.filter.get_traceback_frame_variables(self.request, tb.tb_frame),
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 197, in get_traceback_frame_variables
    value = self.get_request_repr(value)
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 105, in get_request_repr
    return build_request_repr(request, POST_override=self.get_post_parameters(request))
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 154, in get_post_parameters
    return request.POST
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Attachments (6)

forms.py (205 bytes ) - added by kristall 12 years ago.
The used form
views.py (728 bytes ) - added by kristall 12 years ago.
The used view
19101-test.diff (1.4 KB ) - added by Claude Paroz 12 years ago.
Test failing on Python 3
19101-1.diff (3.4 KB ) - added by Claude Paroz 12 years ago.
Fixed non-ascii form data decoding with Python 3
19101-2.diff (1.9 KB ) - added by Claude Paroz 12 years ago.
Updated patch
19101-3.diff (2.5 KB ) - added by Aymeric Augustin 12 years ago.

Change History (19)

by kristall, 12 years ago

Attachment: forms.py added

The used form

by kristall, 12 years ago

Attachment: views.py added

The used view

comment:1 by Claude Paroz, 12 years ago

Description: modified (diff)
Severity: Normal → Release blocker
Triage Stage: Unreviewed → Accepted

Confirmed. We missed that because currently in QueryDictTests we always assume that the input is a real string in Python 3, which is True for GET requests, but not for a POST request where the first argument passed to QueryDict is the still-encoded self.body.
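
As a rough illustration of the difference between the two inputs, the sketch below builds a hypothetical, minimal WSGI environ (the keys follow PEP 3333; the values are made up for this example) and shows that the query string arrives as a native string while the request body is read as bytes. It is not Django's request-handling code.

```python
import io

# Hypothetical form POST body; the values are illustrative only.
raw_body = b"name=caf%C3%A9"

environ = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "page=1",  # native str under PEP 3333
    "CONTENT_TYPE": "application/x-www-form-urlencoded",
    "CONTENT_LENGTH": str(len(raw_body)),
    "wsgi.input": io.BytesIO(raw_body),  # the body is a byte stream
}

query_string = environ["QUERY_STRING"]
body = environ["wsgi.input"].read(int(environ["CONTENT_LENGTH"]))

print(type(query_string).__name__)  # str, as in the GET case
print(type(body).__name__)          # bytes, as in the POST case
```

A test suite that only feeds str values to the parser exercises the first case and never the second.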

by Claude Paroz, 12 years ago

Attachment: 19101-test.diff added

Test failing on Python 3

comment:2 by Claude Paroz, 12 years ago

A possible solution would be to call force_str on self.body when passing it to QueryDict, but not before #5611 has been fixed, because we have to be sure that self.body has an x-www-form-urlencoded content type.

comment:3 by kristall, 12 years ago

Cc: kristall added

by Claude Paroz, 12 years ago

Attachment: 19101-1.diff added

Fixed non-ascii form data decoding with Python 3

comment:4 by Claude Paroz, 12 years ago

Has patch: set

In the above patch, I also fixed #5076, as this was needed for the test with latin-1 encoding. This could be committed separately, though.

comment:5 by Jan Bednařík, 12 years ago

Triage Stage: Accepted → Ready for checkin

Patch reviewed.

comment:6 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin

comment:7 by Aymeric Augustin, 12 years ago

I'd prefer to fork the part that fixes #5076 to that ticket. I left a comment over there.

The force_str when instantiating the QueryDict looks suspect to me. I suppose self.body contains bytes at this point; wouldn't it make sense to decode them with the charset of the request rather than with utf-8?

comment:8 by Claude Paroz, 12 years ago

Now that we know for sure that the content of the request is x-www-form-urlencoded, it should be encoded and composed of only ASCII chars at this point, AFAIK, so decoding it with 'utf-8' or 'ascii' or even any encoding should not make any difference. The charset specified in the request is then used later in QueryDict initialization to decode (in the sense of url decoding) the query string content.
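
A small standard-library sketch of this argument, assuming a body encoded the way a browser would send it with a latin-1 charset (the field name and value are made up). It only covers ASCII-compatible codecs; a codec such as UTF-16 would of course give a different result.

```python
from urllib.parse import parse_qsl, urlencode

# Non-ASCII characters are percent-escaped, so the bytes on the wire are ASCII.
body = urlencode({"name": "café"}, encoding="iso-8859-1").encode("ascii")

# Step 1: bytes -> str. Any ASCII-compatible codec gives the same text here.
text = body.decode("ascii")
same_text = all(
    body.decode(codec) == text for codec in ("ascii", "utf-8", "iso-8859-1")
)

# Step 2: URL decoding. This is where the request charset actually matters.
latin1_pairs = parse_qsl(text, encoding="iso-8859-1")

print(body)          # the ASCII-only wire format
print(same_text)
print(latin1_pairs)
```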

by Claude Paroz, 12 years ago

Attachment: 19101-2.diff added

Updated patch

comment:9 by Claude Paroz, 12 years ago

I've just uploaded a new patch now that #5076 has been committed.

by Aymeric Augustin, 12 years ago

Attachment: 19101-3.diff added

comment:10 by Aymeric Augustin, 12 years ago

We can also fix this in QueryDict.__init__. It's a bit more consistent with the Python 2 code. It makes QueryDict more resilient.
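
The idea can be sketched with the standard library alone. The helper below, parse_form_data, is a hypothetical stand-in for the parsing step and is not the code from the attached patch; decoding with "ascii" is an assumption based on the discussion above that URL-encoded data contains only ASCII characters.

```python
from urllib.parse import parse_qsl


def parse_form_data(query_string, encoding="utf-8"):
    """Parse URL-encoded data given either as str or as bytes.

    Hypothetical helper for illustration, not Django's QueryDict.
    """
    if isinstance(query_string, bytes):
        # Assumption: URL-encoded data is pure ASCII, so this cannot lose data.
        # Non-ASCII bytes would raise UnicodeDecodeError here.
        query_string = query_string.decode("ascii")
    # The request charset is applied while resolving percent-escapes.
    return parse_qsl(query_string or "", keep_blank_values=True, encoding=encoding)


print(parse_form_data(b"name=caf%C3%A9"))  # bytes, as for a POST body
print(parse_form_data("name=caf%C3%A9"))   # str, as for a GET query string
```

Normalizing the input inside the parser means callers do not have to know which of the two types they hold.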

comment:11 by Claude Paroz, 12 years ago

This alternate approach looks good to me. I really think that we should get rid of the errors='replace' behaviour, which only defers potential errors until later in the stack, but this could be addressed as a whole in #18004.
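
To illustrate the concern about errors='replace', here is a sketch that uses the errors parameter of the standard library's parse_qsl rather than Django's own code. The input is deliberately mismatched: latin-1 data parsed as UTF-8.

```python
from urllib.parse import parse_qsl

# "caf%E9" is "café" encoded as ISO-8859-1, parsed here as UTF-8 on purpose.
mismatched = "name=caf%E9"

# With errors="replace" the invalid byte silently becomes U+FFFD, so the
# problem only shows up later, wherever the value is finally used.
lenient = parse_qsl(mismatched, encoding="utf-8", errors="replace")

# With errors="strict" the mismatch is reported at parse time instead.
try:
    parse_qsl(mismatched, encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(lenient)
print(strict_failed)
```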

comment:12 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

Resolution: fixed
Status: new → closed

In b99707bdedbcbf832bd88eaefd488c841110a222:

[1.5.x] Fixed #19101 -- Decoding of non-ASCII POST data on Python 3.

Thanks Claude Paroz.

Backport of 095eca8 from master.

comment:13 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

In 095eca8dd85cb27ed0b22829903df10f19cdab6c:

Fixed #19101 -- Decoding of non-ASCII POST data on Python 3.

Thanks Claude Paroz.
