Opened 11 years ago

Closed 11 years ago

Last modified 11 years ago

#19101 closed Bug (fixed)

Non ascii chars in form cause Internal Server Error

Reported by: kristall
Owned by: Aymeric Augustin
Component: Forms
Version: dev
Severity: Release blocker
Keywords: encoding
Cc: kristall
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Claude Paroz)

Using non-ASCII characters in a form causes an Internal Server Error under Python 3.2 (the same code works fine under Python 2.7).
I ran the development server with "python3 manage.py runserver".

Internal Server Error: /formtest/
Traceback (most recent call last):
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 110, in get_response
    response = middleware_method(request, callback, callback_args, callback_kwargs)
  File "/usr/local/lib/python3.2/site-packages/django/middleware/csrf.py", line 174, in process_view
    request_csrf_token = request.POST.get('csrfmiddlewaretoken', '')
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
Traceback (most recent call last):
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 110, in get_response
    response = middleware_method(request, callback, callback_args, callback_kwargs)
  File "/usr/local/lib/python3.2/site-packages/django/middleware/csrf.py", line 174, in process_view
    request_csrf_token = request.POST.get('csrfmiddlewaretoken', '')
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.2/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python3.2/site-packages/django/contrib/staticfiles/handlers.py", line 71, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 236, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 180, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 222, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 69, in technical_500_response
    html = reporter.get_traceback_html()
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 289, in get_traceback_html
    c = Context(self.get_traceback_data())
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 247, in get_traceback_data
    frames = self.get_traceback_frames()
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 398, in get_traceback_frames
    'vars': self.filter.get_traceback_frame_variables(self.request, tb.tb_frame),
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 197, in get_traceback_frame_variables
    value = self.get_request_repr(value)
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 105, in get_request_repr
    return build_request_repr(request, POST_override=self.get_post_parameters(request))
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 154, in get_post_parameters
    return request.POST
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Attachments (6)

forms.py (205 bytes) - added by kristall 11 years ago.
The used form
views.py (728 bytes) - added by kristall 11 years ago.
The used view
19101-test.diff (1.4 KB) - added by Claude Paroz 11 years ago.
Test failing on Python 3
19101-1.diff (3.4 KB) - added by Claude Paroz 11 years ago.
Fixed non-ascii form data decoding with Python 3
19101-2.diff (1.9 KB) - added by Claude Paroz 11 years ago.
Updated patch
19101-3.diff (2.5 KB) - added by Aymeric Augustin 11 years ago.


Change History (19)

Changed 11 years ago by kristall

Attachment: forms.py added

The used form

Changed 11 years ago by kristall

Attachment: views.py added

The used view

comment:1 Changed 11 years ago by Claude Paroz

Description: modified (diff)
Severity: Normal → Release blocker
Triage Stage: Unreviewed → Accepted

Confirmed. We missed this because QueryDictTests currently always assume that the input is a real (text) string on Python 3. That is true for GET requests, but not for POST requests, where the first argument passed to QueryDict is the still-encoded self.body.

Changed 11 years ago by Claude Paroz

Attachment: 19101-test.diff added

Test failing on Python 3

comment:2 Changed 11 years ago by Claude Paroz

A possible solution would be to call force_str on self.body when passing it to QueryDict, but not before #5611 has been fixed, because we have to be sure that self.body has an x-www-form-urlencoded content type.

comment:3 Changed 11 years ago by kristall

Cc: kristall added

Changed 11 years ago by Claude Paroz

Attachment: 19101-1.diff added

Fixed non-ascii form data decoding with Python 3

comment:4 Changed 11 years ago by Claude Paroz

Has patch: set

In the above patch, I also fixed #5076, as this was needed for the test with latin-1 encoding. This could be committed separately, though.

comment:5 Changed 11 years ago by Jan Bednařík

Triage Stage: Accepted → Ready for checkin

Patch reviewed.

comment:6 Changed 11 years ago by Aymeric Augustin

Owner: changed from nobody to Aymeric Augustin

comment:7 Changed 11 years ago by Aymeric Augustin

I'd prefer to fork the part that fixes #5076 to that ticket. I left a comment over there.

The force_str when instantiating the QueryDict looks suspect to me. I suppose self.body contains bytes at this point; wouldn't it make sense to decode them with the charset of the request rather than with utf-8?

comment:8 Changed 11 years ago by Claude Paroz

Now that we know for sure that the content of the request is x-www-form-urlencoded, it should be URL-encoded and composed of only ASCII characters at this point, AFAIK, so decoding it with 'utf-8', 'ascii', or any other ASCII-compatible encoding should not make any difference. The charset specified in the request is then used later in QueryDict initialization to decode (in the sense of URL decoding) the query string content.
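
A minimal sketch of the point above, using only the standard library rather than Django's QueryDict. The sample body and the latin-1 charset are assumptions chosen for illustration: because a urlencoded body is pure ASCII, decoding the bytes as 'ascii' or 'utf-8' yields the same text, and the request charset only matters for the percent-decoding step.

```python
from urllib.parse import parse_qsl

# Hypothetical POST body: "été" percent-encoded from its latin-1 bytes.
body = b"name=%E9t%E9"

# The raw body is ASCII-only, so either codec produces the same str.
as_ascii = body.decode("ascii")
as_utf8 = body.decode("utf-8")
same_text = as_ascii == as_utf8

# The request charset is applied here, when the %XX escapes are decoded.
pairs = parse_qsl(as_utf8, keep_blank_values=True, encoding="iso-8859-1")
print(same_text, pairs)
```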

Changed 11 years ago by Claude Paroz

Attachment: 19101-2.diff added

Updated patch

comment:9 Changed 11 years ago by Claude Paroz

I've just uploaded a new patch now that #5076 has been committed.

Changed 11 years ago by Aymeric Augustin

Attachment: 19101-3.diff added

comment:10 Changed 11 years ago by Aymeric Augustin

We can also fix this in QueryDict.__init__. It's a bit more consistent with the Python 2 code. It makes QueryDict more resilient.
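
As a rough illustration of this approach (not the committed patch), a parser can normalize its input before handing it to parse_qsl. The function name parse_form_data is hypothetical, and the sketch assumes that any bytes it receives are URL-encoded form data and therefore ASCII.

```python
from urllib.parse import parse_qsl


def parse_form_data(query_string, encoding="utf-8"):
    """Parse URL-encoded data given as str or bytes (hypothetical helper)."""
    if isinstance(query_string, bytes):
        # URL-encoded data is a subset of ASCII, so this decode is lossless.
        query_string = query_string.decode("ascii")
    return parse_qsl(query_string or "", keep_blank_values=True, encoding=encoding)


# Bytes (as in a POST body) and str (as in a GET query string) give the same pairs.
from_bytes = parse_form_data(b"name=%C3%A9t%C3%A9&empty=")
from_text = parse_form_data("name=%C3%A9t%C3%A9&empty=")
print(from_bytes == from_text)
```

Accepting both types in one place means callers do not need to know whether the data came from the request body or the URL.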

comment:11 Changed 11 years ago by Claude Paroz

This alternate approach looks good to me. I really think that we should get rid of the errors='replace' behaviour, which only defers potential errors to later in the stack, but this could be addressed as a whole in #18004.
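
To illustrate what deferring errors means here, the standard library's unquote can stand in for the decoding step (an assumption for illustration, not Django's exact code path). With errors='replace', a byte sequence that is invalid in the chosen encoding silently becomes U+FFFD, whereas errors='strict' raises immediately.

```python
from urllib.parse import unquote

# %E9 is "é" in latin-1 but is not a valid UTF-8 sequence on its own.
replaced = unquote("%E9", encoding="utf-8", errors="replace")

try:
    unquote("%E9", encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    # Strict mode surfaces the charset mismatch at the point of decoding.
    strict_failed = True

print(repr(replaced), strict_failed)
```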

comment:12 Changed 11 years ago by Aymeric Augustin <aymeric.augustin@…>

Resolution: fixed
Status: new → closed

In b99707bdedbcbf832bd88eaefd488c841110a222:

[1.5.x] Fixed #19101 -- Decoding of non-ASCII POST data on Python 3.

Thanks Claude Paroz.

Backport of 095eca8 from master.

comment:13 Changed 11 years ago by Aymeric Augustin <aymeric.augustin@…>

In 095eca8dd85cb27ed0b22829903df10f19cdab6c:

Fixed #19101 -- Decoding of non-ASCII POST data on Python 3.

Thanks Claude Paroz.
