Opened 12 years ago

Closed 12 years ago

#19075 closed Bug (fixed)

Query parameters are not decoded properly

Reported by: Ivan Virabyan Owned by: Aymeric Augustin
Component: HTTP handling Version: dev
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

def index(request):
     q = request.GET['q']
     print q, repr(q)

When I make a query like this /?q=%D1%82%D0%B5%D1%81%D1%82
I get: тест u'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
However the right value must be (and it is in Django 1.4): тест u'\u0442\u0435\u0441\u0442'

Attachments (4)

queryparams.diff (563 bytes ) - added by Ivan Virabyan 12 years ago.
prevent convertion of query string to unicode
19075-1.diff (1.6 KB ) - added by Claude Paroz 12 years ago.
Properly decode query string
19075-2.diff (3.1 KB ) - added by Claude Paroz 12 years ago.
Ivan's patch + test
0001-Removed-custom-WSGIRequestHandler.get_environ.patch (4.8 KB ) - added by Claude Paroz 12 years ago.
Removed custom WSGIRequestHandler.get_environ

Download all attachments as: .zip

Change History (21)

by Ivan Virabyan, 12 years ago

Attachment: queryparams.diff added

prevent convertion of query string to unicode

by Claude Paroz, 12 years ago

Attachment: 19075-1.diff added

Properly decode query string

comment:2 by Claude Paroz, 12 years ago

Needs tests: unset

I think I found the culprit code.

comment:3 by Ivan Virabyan, 12 years ago

you have to pass unicode string to QueryDict in tests to emulate real situation.

comment:4 by Claude Paroz, 12 years ago

That's what I am doing. See the unicode_literals import at the start of the test file.

comment:5 by Claude Paroz, 12 years ago

#19081 was a duplicate

comment:6 by Aymeric Augustin, 12 years ago

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

As explained in #19081 the problem is in django.core.servers.basehttp, where some operations on environ['QUERY_STRING'] trigger an implicit conversion to unicode, since I enabled unicode_literals.

Version 0, edited 12 years ago by Aymeric Augustin (next)

comment:7 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin
Patch needs improvement: set

in reply to:  6 comment:8 by Claude Paroz, 12 years ago

Replying to aaugustin:

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

Maybe, maybe not :-)

Here's what's happening with path and querystring in 1.4/Py2:

path querystring
1.4 full path as a bytestring, separated on '?' as two bytestrings in basehttp.py
[basehttp.py] unquote(path) => utf-8 encoded bytestring [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_unicode(path) => unicode path http/init.py parse_qsl() => utf-8 encoded bytestring, then force_unicode() on the result.

Currently in master/Python3, which seems to be correct:

path querystring
master/Py3 full path as a string, separated on '?' as two *strings* in basehttp.py
[basehttp.py] unquote(path) => unicode string [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_text(path), basically a noop http/init.py parse_qsl() => unquote and decode qs to obtain unicode qs

Now the question remains about what to do in master/Py2, where currently both path_info and query_string handling is broken.
One approach would be to let the Python 2 behaviour mostly untouched (the str() or removing unicode_literals approach?), and the other would be to mimic Python 3 (unicode everywhere) and trying to fix the stack to be able to handle those inputs (what I tried, albeit incomplete).

comment:9 by Aymeric Augustin, 12 years ago

I'm very strongly in favor of not touching the historical behavior on Python 2.

I'm sure that some people access environ['QUERY_STRING'] directly, and environ is expected to contain native strings.

by Claude Paroz, 12 years ago

Attachment: 19075-2.diff added

Ivan's patch + test

by Claude Paroz, 12 years ago

Removed custom WSGIRequestHandler.get_environ

comment:10 by Claude Paroz, 12 years ago

Patch needs improvement: unset

I found an even better solution: remove our customized WSGIRequestHandler.get_environ version, which as far as I can see is not differing any longer with the Python upstream version.

comment:11 by Claude Paroz <claude@…>, 12 years ago

Resolution: fixed
Status: newclosed

In 681550ca6d5ff3e591a769643e37fd0b50f29c91:

Removed custom WSGIRequestHandler.get_environ

We probably historically customized it for good reasons, but
currently, the differences with upstream Python are not
significant any longer.
Also fixes #19075 for which a test has been added.

comment:12 by Karen Tracey, 12 years ago

Resolution: fixed
Status: closedreopened

The fix here has re-introduced #2494. Trying to use the dev server on a private network (in my particular case, host machine to client VM) is unbearably slow. Every GET (which includes all static files) is incurring a multi-second delay as the dev server tries a gethostbyaddr() on addrs like 192.168.1.105. A workaround is to put the 192.168.1.105 address in /etc/hosts. Possibly documenting that workaround is all that needs to be done to "fix" this? But I think that would only be OK if this code path is dev-server-only, and that's not something I'm sure about. If we've introduced a reverse DNS lookup on the client IP address into a production code path of every request, that's probably not OK.

(I know this is a fundamentally different problem than the subject of this ticket, but opted to re-open this one since it is the fix for this that has caused the problem, and we need to resolve this new problem without re-introducing the original problem of this ticket, and that seems easiest if it is this ticket that is open rather than an entirely new one.)

comment:13 by Claude Paroz, 12 years ago

AFAIK, WSGIRequestHandler is only used by the dev server, not when deployed in a production environment. But tell me if I'm wrong!

Then what about adding this method in WSGIRequestHandler:

    def address_string(self):
        # Short-circuit parent method to not call socket.getfqdn
        return self.client_address[0]

Karen, could you test this in your environment?

comment:14 by Karen Tracey, 12 years ago

Yeah, I think you are right that this request handler is only used by dev server. In a quick (since I'm actually trying to work on something else!) grep check I saw fastcgi was importing something from basehttp also, but on closer inspection that doesn't seem to involve pulling in WSGIRequestHandler.

At any rate the itty-bitty override of address_string does work to avoid the reverse DNS call, so that seems like a good way to fix it.

comment:15 by Claude Paroz <claude@…>, 12 years ago

In 3e98d98b69e67f2f72055e4b3204d0486eaeff50:

Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

comment:16 by Claude Paroz <claude@…>, 12 years ago

In e51a9c0c948855ed8fe7e1a694e17727af604926:

[1.5.x] Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

Backport of 3e98d98b6 from master.

comment:17 by Claude Paroz, 12 years ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.
Back to Top