Opened 16 years ago

Closed 16 years ago

Last modified 12 years ago

#7379 closed (fixed)

middleware redirects in common.py don't pass the same URL parameters correctly.

Reported by: John Matthew <john@…> Owned by: nobody
Component: HTTP handling Version: dev
Severity: Keywords: url redirect
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ramiro Morales)

line 83 of django/middleware/common.py should be:

    newurl += '?' + request.META['QUERY_STRING']

instead of:

    newurl += '?' + request.GET.urlencode()

That will ensure that the query parameters included in the redirect url are
identical to what was included in the original url.

Change History (8)

comment:1 by John Matthew <john@…>, 16 years ago

The full google group conversation that created this ticket

http://groups.google.com/group/django-users/browse_thread/thread/fc47edb1b9f8ec8f#

comment:2 by Ramiro Morales, 16 years ago

Description: modified (diff)

comment:3 by Karen Tracey <kmtracey@…>, 16 years ago

milestone: 1.0
Triage Stage: UnreviewedAccepted

In case it isn't clear what the problem is: the bug occurs when an app that uses non-utf8 encoded data (such as a bittorrent tracker, apparently) in a query parameter has one of its urls arrive without a required trailing slash. The middleware that appends the slash is attempting to use request.GET.urlencode() to restore the query string to its original percent-encoded state, but that doesn't work. The request.GET QueryDict was created under an assumption that the data was utf8 encoded, but given that it was not, the resulting unicode data in request.GET is likely full of unicode replacement characters for each invalid set of bytes in the original input. So the redirect that gets sent does have a trailing slash, but does not contain the same query string as the original request.

There's no way to 'undo' the incorrect decoding of what's in request.GET, but the original percent-encoded string is available in request.META['QueryString'], so that is what should be used here, I believe.

Does it need a test?

comment:4 by Malcolm Tredinnick, 16 years ago

In case it's useful, there is a way to get the bytes that were part of the request: set request.encoding to iso-8859-1 or latin1. It's a Python "feature" that the latin1 codec is actually an identity mapping, even for bytes that aren't part of latin1/ISO-8859-1. So you'll then get back a unicode type object whose contents are the original bytes.

comment:5 by Karen Tracey <kmtracey@…>, 16 years ago

Ah, I did not realize that about the latin1 codec in Python. So alternatively the middleware could set request.encoding so that request.GET QueryDict gets (re?)built with no mangling before doing the urlencode(). Seems more straightforward to just go ahead and use request.META['QueryString'] though? Or is there some case where that won't work?

comment:6 by Malcolm Tredinnick, 16 years ago

The only advantage of using request.GET that I can see is that it will always be present. In theory, QUERY_STRING should always be present, too, but I notice that in the different handlers (modpython and wsgi), we construct the GET dictionary via different methods, which makes me wonder why. However, if the raw input is always available, then using that will avoid any decoding/encoding round trip changes (ordering, normalisation, whatever may happen). So either way is probably sufficient.

comment:7 by Jacob, 16 years ago

Resolution: fixed
Status: newclosed

(In [8635]) Fixed #7379: fixed a subtle corner case involving URL encoding in CommonMiddleware

comment:8 by Jacob, 12 years ago

milestone: 1.0

Milestone 1.0 deleted

Note: See TracTickets for help on using tickets.
Back to Top