Opened 12 years ago
Closed 12 years ago
#19075 closed Bug (fixed)
Query parameters are not decoded properly
Reported by: | Ivan Virabyan | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Release blocker | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
def index(request): q = request.GET['q'] print q, repr(q)
When I make a query like this /?q=%D1%82%D0%B5%D1%81%D1%82
I get: ÑеÑÑ u'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
However the right value must be (and it is in Django 1.4): тест u'\u0442\u0435\u0441\u0442'
Attachments (4)
Change History (21)
by , 12 years ago
Attachment: | queryparams.diff added |
---|
comment:1 by , 12 years ago
Needs tests: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 12 years ago
you have to pass unicode string to QueryDict in tests to emulate real situation.
comment:4 by , 12 years ago
That's what I am doing. See the unicode_literals import at the start of the test file.
follow-up: 8 comment:6 by , 12 years ago
This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.
As explained in #19081 the problem is in django.core.servers.basehttp
, where some operations on environ['QUERY_STRING']
trigger an implicit conversion to unicode, since I enabled unicode_literals
.
The patch provided by the reporter is closer to the correct solution. However there are many more strings is this module that became unicode with unicode_literals
, with unknown consequences.
We have several options available now, by order of increasing effort:
- roll back
unicode_literals
, and fix whatever needs fixing for Python 3, - liberally sprinkle
str('..')
in the module, and hope we didn't miss one, - as suggested in #19081, compare the current code with the 2.6 and 2.7 stdlib, and switch to the stdlib wherever possible, subclassing if necessary. This should make the issue go away.
comment:7 by , 12 years ago
Owner: | changed from | to
---|---|
Patch needs improvement: | set |
comment:8 by , 12 years ago
Replying to aaugustin:
This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.
Maybe, maybe not :-)
Here's what's happening with path and querystring in 1.4/Py2:
path | querystring | |
---|---|---|
1.4 | full path as a bytestring, separated on '?' as two bytestrings in basehttp.py | |
[basehttp.py] unquote(path) => utf-8 encoded bytestring | [basehttp.py] untouched | |
[wsgi.py:WSGIRequest] force_unicode(path) => unicode path | http/init.py parse_qsl() => utf-8 encoded bytestring, then force_unicode() on the result. |
Currently in master/Python3, which seems to be correct:
path | querystring | |
---|---|---|
master/Py3 | full path as a string, separated on '?' as two *strings* in basehttp.py | |
[basehttp.py] unquote(path) => unicode string | [basehttp.py] untouched | |
[wsgi.py:WSGIRequest] force_text(path), basically a noop | http/init.py parse_qsl() => unquote and decode qs to obtain unicode qs |
Now the question remains about what to do in master/Py2, where currently both path_info and query_string handling is broken.
One approach would be to let the Python 2 behaviour mostly untouched (the str() or removing unicode_literals approach?), and the other would be to mimic Python 3 (unicode everywhere) and trying to fix the stack to be able to handle those inputs (what I tried, albeit incomplete).
comment:9 by , 12 years ago
I'm very strongly in favor of not touching the historical behavior on Python 2.
I'm sure that some people access environ['QUERY_STRING']
directly, and environ is expected to contain native strings.
by , 12 years ago
Attachment: | 0001-Removed-custom-WSGIRequestHandler.get_environ.patch added |
---|
Removed custom WSGIRequestHandler.get_environ
comment:10 by , 12 years ago
Patch needs improvement: | unset |
---|
I found an even better solution: remove our customized WSGIRequestHandler.get_environ version, which as far as I can see is not differing any longer with the Python upstream version.
comment:11 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:12 by , 12 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
The fix here has re-introduced #2494. Trying to use the dev server on a private network (in my particular case, host machine to client VM) is unbearably slow. Every GET (which includes all static files) is incurring a multi-second delay as the dev server tries a gethostbyaddr() on addrs like 192.168.1.105. A workaround is to put the 192.168.1.105 address in /etc/hosts. Possibly documenting that workaround is all that needs to be done to "fix" this? But I think that would only be OK if this code path is dev-server-only, and that's not something I'm sure about. If we've introduced a reverse DNS lookup on the client IP address into a production code path of every request, that's probably not OK.
(I know this is a fundamentally different problem than the subject of this ticket, but opted to re-open this one since it is the fix for this that has caused the problem, and we need to resolve this new problem without re-introducing the original problem of this ticket, and that seems easiest if it is this ticket that is open rather than an entirely new one.)
comment:13 by , 12 years ago
AFAIK, WSGIRequestHandler
is only used by the dev server, not when deployed in a production environment. But tell me if I'm wrong!
Then what about adding this method in WSGIRequestHandler
:
def address_string(self): # Short-circuit parent method to not call socket.getfqdn return self.client_address[0]
Karen, could you test this in your environment?
comment:14 by , 12 years ago
Yeah, I think you are right that this request handler is only used by dev server. In a quick (since I'm actually trying to work on something else!) grep check I saw fastcgi was importing something from basehttp also, but on closer inspection that doesn't seem to involve pulling in WSGIRequestHandler.
At any rate the itty-bitty override of address_string
does work to avoid the reverse DNS call, so that seems like a good way to fix it.
comment:17 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
prevent convertion of query string to unicode