Code

#19075 closed Bug (fixed)

Query parameters are not decoded properly

Reported by: ivan_virabyan Owned by: aaugustin
Component: HTTP handling Version: master
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

def index(request):
     q = request.GET['q']
     print q, repr(q)

When I make a query like this /?q=%D1%82%D0%B5%D1%81%D1%82
I get: тест u'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
However the right value must be (and it is in Django 1.4): тест u'\u0442\u0435\u0441\u0442'

Attachments (4)

queryparams.diff (563 bytes) - added by ivan_virabyan 19 months ago.
prevent convertion of query string to unicode
19075-1.diff (1.6 KB) - added by claudep 19 months ago.
Properly decode query string
19075-2.diff (3.1 KB) - added by claudep 18 months ago.
Ivan's patch + test
0001-Removed-custom-WSGIRequestHandler.get_environ.patch (4.8 KB) - added by claudep 18 months ago.
Removed custom WSGIRequestHandler.get_environ

Download all attachments as: .zip

Change History (21)

Changed 19 months ago by ivan_virabyan

prevent convertion of query string to unicode

comment:1 Changed 19 months ago by claudep

  • Needs documentation unset
  • Needs tests set
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Changed 19 months ago by claudep

Properly decode query string

comment:2 Changed 19 months ago by claudep

  • Needs tests unset

I think I found the culprit code.

comment:3 Changed 19 months ago by ivan_virabyan

you have to pass unicode string to QueryDict in tests to emulate real situation.

comment:4 Changed 19 months ago by claudep

That's what I am doing. See the unicode_literals import at the start of the test file.

comment:5 Changed 19 months ago by claudep

#19081 was a duplicate

comment:6 follow-up: Changed 19 months ago by aaugustin

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

As explained in #19081 the problem is in django.core.servers.basehttp, where some operations on environ['QUERY_STRING'] trigger an implicit conversion to unicode, since I enabled unicode_literals.

Version 0, edited 19 months ago by aaugustin (next)

comment:7 Changed 19 months ago by aaugustin

  • Owner changed from nobody to aaugustin
  • Patch needs improvement set

comment:8 in reply to: ↑ 6 Changed 18 months ago by claudep

Replying to aaugustin:

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

Maybe, maybe not :-)

Here's what's happening with path and querystring in 1.4/Py2:

path querystring
1.4 full path as a bytestring, separated on '?' as two bytestrings in basehttp.py
[basehttp.py] unquote(path) => utf-8 encoded bytestring [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_unicode(path) => unicode path http/init.py parse_qsl() => utf-8 encoded bytestring, then force_unicode() on the result.

Currently in master/Python3, which seems to be correct:

path querystring
master/Py3 full path as a string, separated on '?' as two *strings* in basehttp.py
[basehttp.py] unquote(path) => unicode string [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_text(path), basically a noop http/init.py parse_qsl() => unquote and decode qs to obtain unicode qs

Now the question remains about what to do in master/Py2, where currently both path_info and query_string handling is broken.
One approach would be to let the Python 2 behaviour mostly untouched (the str() or removing unicode_literals approach?), and the other would be to mimic Python 3 (unicode everywhere) and trying to fix the stack to be able to handle those inputs (what I tried, albeit incomplete).

comment:9 Changed 18 months ago by aaugustin

I'm very strongly in favor of not touching the historical behavior on Python 2.

I'm sure that some people access environ['QUERY_STRING'] directly, and environ is expected to contain native strings.

Changed 18 months ago by claudep

Ivan's patch + test

Changed 18 months ago by claudep

Removed custom WSGIRequestHandler.get_environ

comment:10 Changed 18 months ago by claudep

  • Patch needs improvement unset

I found an even better solution: remove our customized WSGIRequestHandler.get_environ version, which as far as I can see is not differing any longer with the Python upstream version.

comment:11 Changed 18 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 681550ca6d5ff3e591a769643e37fd0b50f29c91:

Removed custom WSGIRequestHandler.get_environ

We probably historically customized it for good reasons, but
currently, the differences with upstream Python are not
significant any longer.
Also fixes #19075 for which a test has been added.

comment:12 Changed 18 months ago by kmtracey

  • Resolution fixed deleted
  • Status changed from closed to reopened

The fix here has re-introduced #2494. Trying to use the dev server on a private network (in my particular case, host machine to client VM) is unbearably slow. Every GET (which includes all static files) is incurring a multi-second delay as the dev server tries a gethostbyaddr() on addrs like 192.168.1.105. A workaround is to put the 192.168.1.105 address in /etc/hosts. Possibly documenting that workaround is all that needs to be done to "fix" this? But I think that would only be OK if this code path is dev-server-only, and that's not something I'm sure about. If we've introduced a reverse DNS lookup on the client IP address into a production code path of every request, that's probably not OK.

(I know this is a fundamentally different problem than the subject of this ticket, but opted to re-open this one since it is the fix for this that has caused the problem, and we need to resolve this new problem without re-introducing the original problem of this ticket, and that seems easiest if it is this ticket that is open rather than an entirely new one.)

comment:13 Changed 18 months ago by claudep

AFAIK, WSGIRequestHandler is only used by the dev server, not when deployed in a production environment. But tell me if I'm wrong!

Then what about adding this method in WSGIRequestHandler:

    def address_string(self):
        # Short-circuit parent method to not call socket.getfqdn
        return self.client_address[0]

Karen, could you test this in your environment?

comment:14 Changed 18 months ago by kmtracey

Yeah, I think you are right that this request handler is only used by dev server. In a quick (since I'm actually trying to work on something else!) grep check I saw fastcgi was importing something from basehttp also, but on closer inspection that doesn't seem to involve pulling in WSGIRequestHandler.

At any rate the itty-bitty override of address_string does work to avoid the reverse DNS call, so that seems like a good way to fix it.

comment:15 Changed 18 months ago by Claude Paroz <claude@…>

In 3e98d98b69e67f2f72055e4b3204d0486eaeff50:

Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

comment:16 Changed 18 months ago by Claude Paroz <claude@…>

In e51a9c0c948855ed8fe7e1a694e17727af604926:

[1.5.x] Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

Backport of 3e98d98b6 from master.

comment:17 Changed 18 months ago by claudep

  • Resolution set to fixed
  • Status changed from reopened to closed

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.