Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#19101 closed Bug (fixed)

Non ascii chars in form cause Internal Server Error

Reported by: kristall Owned by: aaugustin
Component: Forms Version: master
Severity: Release blocker Keywords: encoding
Cc: kristall Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by claudep)

Trying to use non-ASCII chars in a form causes trouble with python3.2 (the same code works fine under python2.7).
I used "python3 manage.py runserver"

Internal Server Error: /formtest/
Traceback (most recent call last):
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 110, in get_response
    response = middleware_method(request, callback, callback_args, callback_kwargs)
  File "/usr/local/lib/python3.2/site-packages/django/middleware/csrf.py", line 174, in process_view
    request_csrf_token = request.POST.get('csrfmiddlewaretoken', '')
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.2/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python3.2/site-packages/django/contrib/staticfiles/handlers.py", line 71, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 236, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 180, in get_response
    response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/base.py", line 222, in handle_uncaught_exception
    return debug.technical_500_response(request, *exc_info)
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 69, in technical_500_response
    html = reporter.get_traceback_html()
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 289, in get_traceback_html
    c = Context(self.get_traceback_data())
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 247, in get_traceback_data
    frames = self.get_traceback_frames()
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 398, in get_traceback_frames
    'vars': self.filter.get_traceback_frame_variables(self.request, tb.tb_frame),
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 197, in get_traceback_frame_variables
    value = self.get_request_repr(value)
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 105, in get_request_repr
    return build_request_repr(request, POST_override=self.get_post_parameters(request))
  File "/usr/local/lib/python3.2/site-packages/django/views/debug.py", line 154, in get_post_parameters
    return request.POST
  File "/usr/local/lib/python3.2/site-packages/django/core/handlers/wsgi.py", line 179, in _get_post
    self._load_post_and_files()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 340, in _load_post_and_files
    self._post, self._files = QueryDict(self.body, encoding=self._encoding), MultiValueDict()
  File "/usr/local/lib/python3.2/site-packages/django/http/__init__.py", line 392, in __init__
    encoding=encoding):
  File "/usr/local/lib/python3.2/urllib/parse.py", line 608, in parse_qsl
    value = _coerce_result(value)
  File "/usr/local/lib/python3.2/urllib/parse.py", line 88, in _encode_result
    return obj.encode(encoding, errors)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Attachments (6)

forms.py (205 bytes) - added by kristall 3 years ago.
The used form
views.py (728 bytes) - added by kristall 3 years ago.
The used view
19101-test.diff (1.4 KB) - added by claudep 3 years ago.
Test failing on Python 3
19101-1.diff (3.4 KB) - added by claudep 3 years ago.
Fixed non-ascii form data decoding with Python 3
19101-2.diff (1.9 KB) - added by claudep 3 years ago.
Updated patch
19101-3.diff (2.5 KB) - added by aaugustin 3 years ago.

Change History (19)

Changed 3 years ago by kristall

The used form

Changed 3 years ago by kristall

The used view

comment:1 Changed 3 years ago by claudep

  • Description modified (diff)
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Severity changed from Normal to Release blocker
  • Triage Stage changed from Unreviewed to Accepted

Confirmed. We missed that because currently in QueryDictTests we always assume that the input is a real string (str) in Python 3, which is true for GET requests, but not for a POST request, where the first argument passed to QueryDict is the still-encoded self.body (bytes).
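
For illustration, here is a minimal sketch of the two input types, using only the standard library rather than Django's QueryDict. The environ dictionary and its values are made up for this example. The assumption is the usual WSGI behaviour on Python 3: QUERY_STRING is a native string, while the body read from wsgi.input is bytes.

```python
import io
from urllib.parse import parse_qsl

# Hypothetical WSGI environ for a POST to /formtest/?page=1 (values invented).
environ = {
    "QUERY_STRING": "page=1",
    "wsgi.input": io.BytesIO(b"name=%C3%A4%C3%B6%C3%BC"),
}

query_string = environ["QUERY_STRING"]  # str: what the GET code path parses
body = environ["wsgi.input"].read()     # bytes: what the POST code path parses

# Parsing the str input is the case the existing tests covered.
get_pairs = parse_qsl(query_string, keep_blank_values=True)

print(type(query_string).__name__, type(body).__name__, get_pairs)
```

The bytes value is the one that reached QueryDict unconverted in the traceback above.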

Changed 3 years ago by claudep

Test failing on Python 3

comment:2 Changed 3 years ago by claudep

A possible solution would be to call force_str on self.body when passing it to QueryDict, but not before #5611 has been fixed, because we have to be sure that self.body has an x-www-form-urlencoded content type.

comment:3 Changed 3 years ago by kristall

  • Cc kristall added

Changed 3 years ago by claudep

Fixed non-ascii form data decoding with Python 3

comment:4 Changed 3 years ago by claudep

  • Has patch set

In the above patch, I also fixed #5076, as this was needed for the test with latin-1 encoding. This could be committed separately, though.

comment:5 Changed 3 years ago by Architekt

  • Triage Stage changed from Accepted to Ready for checkin

Patch reviewed.

comment:6 Changed 3 years ago by aaugustin

  • Owner changed from nobody to aaugustin

comment:7 Changed 3 years ago by aaugustin

I'd prefer to fork the part that fixes #5076 to that ticket. I left a comment over there.

The force_str when instantiating the QueryDict looks suspect to me. I suppose self.body contains bytes at this point; wouldn't it make sense to decode them with the charset of the request rather than with utf-8?

comment:8 Changed 3 years ago by claudep

Now that we know for sure that the content of the request is x-www-form-urlencoded, it should be encoded and composed of only ASCII chars at this point, AFAIK, so decoding it with 'utf-8' or 'ascii' or even any encoding should not make any difference. The charset specified in the request is then used later in QueryDict initialization to decode (in the sense of url decoding) the query string content.
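
The point about the two decoding steps can be sketched with the standard library alone. This is an illustration under assumptions, not the patch itself: the form value and the latin-1 charset are invented, and "any encoding" is narrowed here to ASCII-compatible codecs.

```python
from urllib.parse import parse_qsl, urlencode

# Roughly what a client would send for name=äöü with a latin-1 form charset.
encoded = urlencode({"name": "äöü"}, encoding="latin-1")
body = encoded.encode("ascii")  # works: percent-encoding leaves only ASCII

# Step 1: bytes -> str. Any ASCII-compatible codec yields the same text.
as_ascii = body.decode("ascii")
as_utf8 = body.decode("utf-8")
as_latin1 = body.decode("latin-1")

# Step 2: URL decoding. This is where the request charset actually matters.
pairs = parse_qsl(as_utf8, keep_blank_values=True, encoding="latin-1")

print(encoded, pairs)
```

Only the second step depends on the charset declared by the request.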

Changed 3 years ago by claudep

Updated patch

comment:9 Changed 3 years ago by claudep

I've just uploaded a new patch now that #5076 has been committed.

Changed 3 years ago by aaugustin

comment:10 Changed 3 years ago by aaugustin

We can also fix this in QueryDict.__init__. It's a bit more consistent with the Python 2 code. It makes QueryDict more resilient.
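
A rough sketch of that idea, written as a standalone function instead of QueryDict.__init__. The name parse_form_data is hypothetical and the body is an assumption about the general approach, not a copy of the attached patch.

```python
from urllib.parse import parse_qsl


def parse_form_data(query_string, encoding="utf-8"):
    """Hypothetical helper: accept bytes or str, return (key, value) pairs."""
    if isinstance(query_string, bytes):
        # URL-encoded data is a subset of ASCII, so this conversion to str
        # does not depend on the request charset.
        query_string = query_string.decode("ascii")
    # The request charset is applied here, while percent-decoding.
    return parse_qsl(query_string or "", keep_blank_values=True, encoding=encoding)


print(parse_form_data(b"name=%C3%A4%C3%B6%C3%BC"))
print(parse_form_data("name=%C3%A4%C3%B6%C3%BC"))
```

Handling both types in one place means callers passing self.body (bytes) and callers passing a query string (str) go through the same code.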

comment:11 Changed 3 years ago by claudep

This alternate approach looks good to me. I really think that we should get rid of the errors='replace' behaviour, which only defers potential errors to later in the stack, but this could be addressed as a whole in #18004.
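
To make the errors='replace' concern concrete, here is a small sketch using urllib.parse.parse_qsl, whose default is also errors='replace'. Whether this is the exact code path meant in #18004 is an assumption; the query string is invented.

```python
from urllib.parse import parse_qsl

# latin-1 percent-encoding of "äöü", deliberately parsed with the wrong charset.
qs = "name=%E4%F6%FC"

# Default errors='replace': no exception, but the value is silently mangled.
replaced = parse_qsl(qs, encoding="utf-8")

# errors='strict': the charset mismatch surfaces immediately.
try:
    parse_qsl(qs, encoding="utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(replaced, strict_failed)
```

With 'replace', the bad data travels on and any failure shows up far from its cause; with 'strict', it fails at the parsing step.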

comment:12 Changed 3 years ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from new to closed

In b99707bdedbcbf832bd88eaefd488c841110a222:

[1.5.x] Fixed #19101 -- Decoding of non-ASCII POST data on Python 3.

Thanks Claude Paroz.

Backport of 095eca8 from master.

comment:13 Changed 3 years ago by Aymeric Augustin <aymeric.augustin@…>

In 095eca8dd85cb27ed0b22829903df10f19cdab6c:

Fixed #19101 -- Decoding of non-ASCII POST data on Python 3.

Thanks Claude Paroz.
