Code

#19075 closed Bug (fixed)

Query parameters are not decoded properly

Reported by: ivan_virabyan Owned by: aaugustin
Component: HTTP handling Version: master
Severity: Release blocker Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

def index(request):
     q = request.GET['q']
     print q, repr(q)

When I make a query like this /?q=%D1%82%D0%B5%D1%81%D1%82
I get: тест u'\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
However the right value must be (and it is in Django 1.4): тест u'\u0442\u0435\u0441\u0442'

Attachments (4)

queryparams.diff (563 bytes) - added by ivan_virabyan 21 months ago.
prevent convertion of query string to unicode
19075-1.diff (1.6 KB) - added by claudep 21 months ago.
Properly decode query string
19075-2.diff (3.1 KB) - added by claudep 21 months ago.
Ivan's patch + test
0001-Removed-custom-WSGIRequestHandler.get_environ.patch (4.8 KB) - added by claudep 21 months ago.
Removed custom WSGIRequestHandler.get_environ

Download all attachments as: .zip

Change History (21)

Changed 21 months ago by ivan_virabyan

prevent convertion of query string to unicode

comment:1 Changed 21 months ago by claudep

  • Needs documentation unset
  • Needs tests set
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

Changed 21 months ago by claudep

Properly decode query string

comment:2 Changed 21 months ago by claudep

  • Needs tests unset

I think I found the culprit code.

comment:3 Changed 21 months ago by ivan_virabyan

you have to pass unicode string to QueryDict in tests to emulate real situation.

comment:4 Changed 21 months ago by claudep

That's what I am doing. See the unicode_literals import at the start of the test file.

comment:5 Changed 21 months ago by claudep

#19081 was a duplicate

comment:6 follow-up: Changed 21 months ago by aaugustin

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

As explained in #19081 the problem is in django.core.servers.basehttp, where some operations on environ['QUERY_STRING'] trigger an implicit conversion to unicode, since I enabled unicode_literals.

The patch provided by the reporter is closer to the correct solution. However there are many more strings is this module that became unicode with unicode_literals, with unknown consequences.

We have several options available now, by order of increasing effort:

  • roll back unicode_literals, and fix whatever needs fixing for Python 3,
  • liberally sprinkle str('..') in the module, and hope we didn't miss one,
  • as suggested in #19081, compare the current code with the 2.6 and 2.7 stdlib, and switch to the stdlib wherever possible, subclassing if necessary. This should make the issue go away.
Last edited 21 months ago by aaugustin (previous) (diff)

comment:7 Changed 21 months ago by aaugustin

  • Owner changed from nobody to aaugustin
  • Patch needs improvement set

comment:8 in reply to: ↑ 6 Changed 21 months ago by claudep

Replying to aaugustin:

This patch is fixing the problem at the wrong level — it's addressing one symptom rather than the root cause.

Maybe, maybe not :-)

Here's what's happening with path and querystring in 1.4/Py2:

path querystring
1.4 full path as a bytestring, separated on '?' as two bytestrings in basehttp.py
[basehttp.py] unquote(path) => utf-8 encoded bytestring [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_unicode(path) => unicode path http/init.py parse_qsl() => utf-8 encoded bytestring, then force_unicode() on the result.

Currently in master/Python3, which seems to be correct:

path querystring
master/Py3 full path as a string, separated on '?' as two *strings* in basehttp.py
[basehttp.py] unquote(path) => unicode string [basehttp.py] untouched
[wsgi.py:WSGIRequest] force_text(path), basically a noop http/init.py parse_qsl() => unquote and decode qs to obtain unicode qs

Now the question remains about what to do in master/Py2, where currently both path_info and query_string handling is broken.
One approach would be to let the Python 2 behaviour mostly untouched (the str() or removing unicode_literals approach?), and the other would be to mimic Python 3 (unicode everywhere) and trying to fix the stack to be able to handle those inputs (what I tried, albeit incomplete).

comment:9 Changed 21 months ago by aaugustin

I'm very strongly in favor of not touching the historical behavior on Python 2.

I'm sure that some people access environ['QUERY_STRING'] directly, and environ is expected to contain native strings.

Changed 21 months ago by claudep

Ivan's patch + test

Changed 21 months ago by claudep

Removed custom WSGIRequestHandler.get_environ

comment:10 Changed 21 months ago by claudep

  • Patch needs improvement unset

I found an even better solution: remove our customized WSGIRequestHandler.get_environ version, which as far as I can see is not differing any longer with the Python upstream version.

comment:11 Changed 21 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 681550ca6d5ff3e591a769643e37fd0b50f29c91:

Removed custom WSGIRequestHandler.get_environ

We probably historically customized it for good reasons, but
currently, the differences with upstream Python are not
significant any longer.
Also fixes #19075 for which a test has been added.

comment:12 Changed 21 months ago by kmtracey

  • Resolution fixed deleted
  • Status changed from closed to reopened

The fix here has re-introduced #2494. Trying to use the dev server on a private network (in my particular case, host machine to client VM) is unbearably slow. Every GET (which includes all static files) is incurring a multi-second delay as the dev server tries a gethostbyaddr() on addrs like 192.168.1.105. A workaround is to put the 192.168.1.105 address in /etc/hosts. Possibly documenting that workaround is all that needs to be done to "fix" this? But I think that would only be OK if this code path is dev-server-only, and that's not something I'm sure about. If we've introduced a reverse DNS lookup on the client IP address into a production code path of every request, that's probably not OK.

(I know this is a fundamentally different problem than the subject of this ticket, but opted to re-open this one since it is the fix for this that has caused the problem, and we need to resolve this new problem without re-introducing the original problem of this ticket, and that seems easiest if it is this ticket that is open rather than an entirely new one.)

comment:13 Changed 21 months ago by claudep

AFAIK, WSGIRequestHandler is only used by the dev server, not when deployed in a production environment. But tell me if I'm wrong!

Then what about adding this method in WSGIRequestHandler:

    def address_string(self):
        # Short-circuit parent method to not call socket.getfqdn
        return self.client_address[0]

Karen, could you test this in your environment?

comment:14 Changed 20 months ago by kmtracey

Yeah, I think you are right that this request handler is only used by dev server. In a quick (since I'm actually trying to work on something else!) grep check I saw fastcgi was importing something from basehttp also, but on closer inspection that doesn't seem to involve pulling in WSGIRequestHandler.

At any rate the itty-bitty override of address_string does work to avoid the reverse DNS call, so that seems like a good way to fix it.

comment:15 Changed 20 months ago by Claude Paroz <claude@…>

In 3e98d98b69e67f2f72055e4b3204d0486eaeff50:

Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

comment:16 Changed 20 months ago by Claude Paroz <claude@…>

In e51a9c0c948855ed8fe7e1a694e17727af604926:

[1.5.x] Prevented host resolution when running dev server

Refs #19075, #2494.
Thanks Karen Tracey for spotting the issue.

Backport of 3e98d98b6 from master.

comment:17 Changed 20 months ago by claudep

  • Resolution set to fixed
  • Status changed from reopened to closed

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.