Opened 12 years ago

Closed 12 years ago

#19075 closed Bug (fixed)

Query parameters are not decoded properly

Reported by: Ivan Virabyan Owned by: Aymeric Augustin
Component: HTTP handling Version: dev
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

def index(request):
     q = request.GET['q']
     print q, repr(q)

When I make a query like this /?q=%D1%82%D0%B5%D1%81%D1%82
I get: тест u'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
However the right value must be (and it is in Django 1.4): тест u'\u0442\u0435\u0441\u0442'

Attachments (4)

queryparams.diff (563 bytes ) - added by Ivan Virabyan 12 years ago.
prevent convertion of query string to unicode
19075-1.diff (1.6 KB ) - added by Claude Paroz 12 years ago.
Properly decode query string
19075-2.diff (3.1 KB ) - added by Claude Paroz 12 years ago.
Ivan's patch + test
0001-Removed-custom-WSGIRequestHandler.get_environ.patch (4.8 KB ) - added by Claude Paroz 12 years ago.
Removed custom WSGIRequestHandler.get_environ

Download all attachments as: .zip

Change History (21)

by Ivan Virabyan, 12 years ago

Attachment: queryparams.diff added

prevent convertion of query string to unicode

by Claude Paroz, 12 years ago

Attachment: 19075-1.diff added

Properly decode query string

comment:2 by Claude Paroz, 12 years ago

Needs tests: unset

I think I found the culprit code.

comment:3 by Ivan Virabyan, 12 years ago

you have to pass unicode string to QueryDict in tests to emulate real situation.

comment:4 by Claude Paroz, 12 years ago

That's what I am doing. See the unicode_literals import at the start of the test file.

comment:5 by Claude Paroz, 12 years ago

#19081 was a duplicate

comment:6 by Aymeric Augustin, 12 years ago

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

As explained in #19081 the problem is in django.core.servers.basehttp, where some operations on environ['QUERY_STRING'] trigger an implicit conversion to unicode, since I enabled unicode_literals.

The patch provided by the reporter is closer to the correct solution. However there are many more strings is this module that became unicode with unicode_literals, with unknown consequences.

We have several options available now, by order of increasing effort:

  • roll back unicode_literals, and fix whatever needs fixing for Python 3,
  • liberally sprinkle str('..') in the module, and hope we didn't miss one,
  • as suggested in #19081, compare the current code with the 2.6 and 2.7 stdlib, and switch to the stdlib wherever possible, subclassing if necessary. This should make the issue go away.
Last edited 12 years ago by Aymeric Augustin (previous) (diff)

comment:7 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin
Patch needs improvement: set

in reply to:  6 comment:8 by Claude Paroz, 12 years ago

Replying to aaugustin:

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

Maybe, maybe not :-)

Here's what's happening with path and querystring in 1.4/Py2:

path querystring
1.4 full path as a bytestring, separated on '?' as two bytestrings in basehttp.py
[basehttp.py] unquote(path) => utf-8 encoded bytestring [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_unicode(path) => unicode path http/init.py parse_qsl() => utf-8 encoded bytestring, then force_unicode() on the result.

Currently in master/Python3, which seems to be correct:

path querystring
master/Py3 full path as a string, separated on '?' as two *strings* in basehttp.py
[basehttp.py] unquote(path) => unicode string [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_text(path), basically a noop http/init.py parse_qsl() => unquote and decode qs to obtain unicode qs

Now the question remains about what to do in master/Py2, where currently both path_info and query_string handling is broken.
One approach would be to let the Python 2 behaviour mostly untouched (the str() or removing unicode_literals approach?), and the other would be to mimic Python 3 (unicode everywhere) and trying to fix the stack to be able to handle those inputs (what I tried, albeit incomplete).

comment:9 by Aymeric Augustin, 12 years ago

I'm very strongly in favor of not touching the historical behavior on Python 2.

I'm sure that some people access environ['QUERY_STRING'] directly, and environ is expected to contain native strings.

by Claude Paroz, 12 years ago

Attachment: 19075-2.diff added

Ivan's patch + test

by Claude Paroz, 12 years ago

Removed custom WSGIRequestHandler.get_environ

comment:10 by Claude Paroz, 12 years ago

Patch needs improvement: unset

I found an even better solution: remove our customized WSGIRequestHandler.get_environ version, which as far as I can see is not differing any longer with the Python upstream version.

comment:11 by Claude Paroz <claude@…>, 12 years ago

Resolution: fixed
Status: newclosed

In 681550ca6d5ff3e591a769643e37fd0b50f29c91:

Removed custom WSGIRequestHandler.get_environ

We probably historically customized it for good reasons, but
currently, the differences with upstream Python are not
significant any longer.
Also fixes #19075 for which a test has been added.

comment:12 by Karen Tracey, 12 years ago

Resolution: fixed
Status: closedreopened

The fix here has re-introduced #2494. Trying to use the dev server on a private network (in my particular case, host machine to client VM) is unbearably slow. Every GET (which includes all static files) is incurring a multi-second delay as the dev server tries a gethostbyaddr() on addrs like 192.168.1.105. A workaround is to put the 192.168.1.105 address in /etc/hosts. Possibly documenting that workaround is all that needs to be done to "fix" this? But I think that would only be OK if this code path is dev-server-only, and that's not something I'm sure about. If we've introduced a reverse DNS lookup on the client IP address into a production code path of every request, that's probably not OK.

(I know this is a fundamentally different problem than the subject of this ticket, but opted to re-open this one since it is the fix for this that has caused the problem, and we need to resolve this new problem without re-introducing the original problem of this ticket, and that seems easiest if it is this ticket that is open rather than an entirely new one.)

comment:13 by Claude Paroz, 12 years ago

AFAIK, WSGIRequestHandler is only used by the dev server, not when deployed in a production environment. But tell me if I'm wrong!

Then what about adding this method in WSGIRequestHandler:

    def address_string(self):
        # Short-circuit parent method to not call socket.getfqdn
        return self.client_address[0]

Karen, could you test this in your environment?

comment:14 by Karen Tracey, 12 years ago

Yeah, I think you are right that this request handler is only used by dev server. In a quick (since I'm actually trying to work on something else!) grep check I saw fastcgi was importing something from basehttp also, but on closer inspection that doesn't seem to involve pulling in WSGIRequestHandler.

At any rate the itty-bitty override of address_string does work to avoid the reverse DNS call, so that seems like a good way to fix it.

comment:15 by Claude Paroz <claude@…>, 12 years ago

In 3e98d98b69e67f2f72055e4b3204d0486eaeff50:

Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

comment:16 by Claude Paroz <claude@…>, 12 years ago

In e51a9c0c948855ed8fe7e1a694e17727af604926:

[1.5.x] Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

Backport of 3e98d98b6 from master.

comment:17 by Claude Paroz, 12 years ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.
Back to Top