#7379 closed (fixed)
middleware redirects in common.py don't pass the same URL parameters correctly.
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Keywords: | url redirect | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
line 83 of django/middleware/common.py should be:
newurl += '?' + request.META['QUERY_STRING']
instead of:
newurl += '?' + request.GET.urlencode()
That will ensure that the query parameters included in the redirect url are
identical to what was included in the original url.
Change History (8)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
Description: | modified (diff) |
---|
comment:3 by , 16 years ago
milestone: | → 1.0 |
---|---|
Triage Stage: | Unreviewed → Accepted |
In case it isn't clear what the problem is: the bug occurs when an app that uses non-utf8 encoded data (such as a bittorrent tracker, apparently) in a query parameter has one of its urls arrive without a required trailing slash. The middleware that appends the slash is attempting to use request.GET.urlencode() to restore the query string to its original percent-encoded state, but that doesn't work. The request.GET QueryDict was created under an assumption that the data was utf8 encoded, but given that it was not, the resulting unicode data in request.GET is likely full of unicode replacement characters for each invalid set of bytes in the original input. So the redirect that gets sent does have a trailing slash, but does not contain the same query string as the original request.
There's no way to 'undo' the incorrect decoding of what's in request.GET, but the original percent-encoded string is available in request.META['QueryString']
, so that is what should be used here, I believe.
Does it need a test?
comment:4 by , 16 years ago
In case it's useful, there is a way to get the bytes that were part of the request: set request.encoding
to iso-8859-1
or latin1
. It's a Python "feature" that the latin1 codec is actually an identity mapping, even for bytes that aren't part of latin1/ISO-8859-1. So you'll then get back a unicode type object whose contents are the original bytes.
comment:5 by , 16 years ago
Ah, I did not realize that about the latin1 codec in Python. So alternatively the middleware could set request.encoding so that request.GET QueryDict gets (re?)built with no mangling before doing the urlencode(). Seems more straightforward to just go ahead and use request.META['QueryString']
though? Or is there some case where that won't work?
comment:6 by , 16 years ago
The only advantage of using request.GET
that I can see is that it will always be present. In theory, QUERY_STRING
should always be present, too, but I notice that in the different handlers (modpython and wsgi), we construct the GET
dictionary via different methods, which makes me wonder why. However, if the raw input is always available, then using that will avoid any decoding/encoding round trip changes (ordering, normalisation, whatever may happen). So either way is probably sufficient.
comment:7 by , 16 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
The full google group conversation that created this ticket
http://groups.google.com/group/django-users/browse_thread/thread/fc47edb1b9f8ec8f#