Opened 7 years ago
Last modified 7 years ago
#29323 closed Bug
HTTPRequest QueryDict wrongly decodes binary encoded values in python2 — at Initial Version
Reported by: | Thomas Riccardi | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Claude Paroz | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
application/x-www-form-urlencoded bodies allow binary values: just percent-escape (most of) the bytes.
This doesn't work with Django on Python 2.
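For reference, a minimal sketch of the percent-escaping round trip, written for Python 3 and using only the standard library (no Django). The variable names are illustrative and not taken from the ticket.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Arbitrary binary payload, including bytes that are not valid UTF-8.
raw = b"\x00\x7f\x80\xff"

# Percent-escape every byte that is not URL-safe.
escaped = quote_from_bytes(raw)

# Build a form-urlencoded body with a single field named "foo".
body = "foo=" + escaped

# Split off the value again and undo the escaping, staying in bytes.
_, _, value = body.partition("=")
decoded = unquote_to_bytes(value)
```

Here `decoded` is a bytes object equal to `raw`, which is the kind of value the report expects to get back from the form body.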
In Python 2, QueryDict returns a unicode string for values, which makes it impossible to represent binary data.
Worse: it returns a unicode string whose code points are the raw expected bytes (caused by the lax behavior of str.decode('iso-8859-1')), so it fails silently.
Example:
q = QueryDict('foo=%00%7f%80%ff')
q['foo']
# actual:
# u'\x00\x7f\x80\xff'
# expected:
# '\x00\x7f\x80\xff'
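The silent failure can be illustrated without Django. The following is a sketch in Python 3, where the bytes/text distinction is explicit; it assumes nothing beyond the built-in codecs. ISO-8859-1 maps every byte to the code point with the same number, so decoding never raises, while a stricter codec such as UTF-8 rejects the same input.

```python
raw = b"\x00\x7f\x80\xff"

# ISO-8859-1 accepts every possible byte value, so this never fails.
text = raw.decode("iso-8859-1")

# Each resulting code point equals the original byte value.
code_points = [ord(ch) for ch in text]

# The result is text, not bytes, even though it "looks" like the input.
is_text = isinstance(text, str)

# A strict codec reports the problem instead of hiding it.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# The original bytes can still be recovered by re-encoding.
recovered = text.encode("iso-8859-1")
```

Because the decode step cannot fail, the caller gets no signal that binary data was turned into text; the bytes are only recoverable by knowing to re-encode with the same codec.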
Relevant code:
https://github.com/django/django/blob/f89b11b879b83aa505dc8231da5f06ca4b1b062e/django/http/request.py#L397-L401
This was introduced by https://github.com/django/django/commit/fa02120d360387bebbbe735e86686bb4c7c43db2 while trying to accept broken user agents that send non-ASCII as urlencoded raw values:
- the Python 3 version seems fine: it tries to decode before calling the query string parser (from urllib)
- the Python 2 version made a mistake: it decodes after calling the query string parser (and, strangely, only on the value, not the key)
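The two orderings described above can be sketched in Python 3 as follows. This is not Django's code: `parse_to_bytes` is a hypothetical stand-in for a Python 2 style parser that takes and returns byte strings, and the ISO-8859-1 codec is assumed from the description.

```python
from urllib.parse import parse_qsl, unquote_to_bytes


def parse_to_bytes(query):
    """Hypothetical bytes-in, bytes-out parser (stand-in for Python 2 behavior)."""
    pairs = []
    for field in query.split(b"&"):
        key, _, value = field.partition(b"=")
        pairs.append((
            unquote_to_bytes(key.replace(b"+", b" ")),
            unquote_to_bytes(value.replace(b"+", b" ")),
        ))
    return pairs


query = b"foo=%00%7f%80%ff"

# Ordering 1: decode the whole query string first, then parse it as text.
decoded_first = parse_qsl(query.decode("iso-8859-1"), encoding="iso-8859-1")

# Ordering 2: parse first. At this point the value is still the raw bytes.
parsed_first = parse_to_bytes(query)

# Decoding only the value afterwards leaves the key as bytes and turns
# the binary value into text.
value_only = [(key, value.decode("iso-8859-1")) for key, value in parsed_first]
```

In the second ordering the intermediate `parsed_first` holds exactly the expected binary value; it is the later value-only decode that converts it to text and produces the key/value type mismatch.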
I'm not sure how to fix this, as there are many locations where the key and value are re-encoded (appendlist does it too).
I have not (yet) tested with Python 3.