Django

Code

Ticket #7379 (closed: fixed)

Opened 6 months ago

Last modified 3 months ago

middleware redirects in common.py don't pass the same URL parameters correctly.

Reported by: John Matthew <john@compunique.com> Assigned to: nobody
Milestone: 1.0 Component: HTTP handling
Version: SVN Keywords: url redirect
Cc: Triage Stage: Accepted
Has patch: 0 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description (Last modified by ramiro)

line 83 of django/middleware/common.py should be:

    newurl += '?' + request.META['QUERY_STRING']

instead of:

    newurl += '?' + request.GET.urlencode()

That will ensure that the query parameters included in the redirect url are identical to what was included in the original url.

Attachments

Change History

06/05/08 17:22:44 changed by John Matthew <john@compunique.com>

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

The full google group conversation that created this ticket

http://groups.google.com/group/django-users/browse_thread/thread/fc47edb1b9f8ec8f#

06/16/08 10:27:28 changed by ramiro

  • description changed.

08/07/08 20:17:02 changed by Karen Tracey <kmtracey@gmail.com>

  • stage changed from Unreviewed to Accepted.
  • milestone set to 1.0.

In case it isn't clear what the problem is: the bug occurs when an app that uses non-utf8 encoded data (such as a bittorrent tracker, apparently) in a query parameter has one of its urls arrive without a required trailing slash. The middleware that appends the slash is attempting to use request.GET.urlencode() to restore the query string to its original percent-encoded state, but that doesn't work. The request.GET QueryDict was created under an assumption that the data was utf8 encoded, but given that it was not, the resulting unicode data in request.GET is likely full of unicode replacement characters for each invalid set of bytes in the original input. So the redirect that gets sent does have a trailing slash, but does not contain the same query string as the original request.

There's no way to 'undo' the incorrect decoding of what's in request.GET, but the original percent-encoded string is available in request.META['QueryString'], so that is what should be used here, I believe.

Does it need a test?

08/07/08 20:32:29 changed by mtredinnick

In case it's useful, there is a way to get the bytes that were part of the request: set request.encoding to iso-8859-1 or latin1. It's a Python "feature" that the latin1 codec is actually an identity mapping, even for bytes that aren't part of latin1/ISO-8859-1. So you'll then get back a unicode type object whose contents are the original bytes.

08/07/08 20:44:19 changed by Karen Tracey <kmtracey@gmail.com>

Ah, I did not realize that about the latin1 codec in Python. So alternatively the middleware could set request.encoding so that request.GET QueryDict gets (re?)built with no mangling before doing the urlencode(). Seems more straightforward to just go ahead and use request.META['QueryString'] though? Or is there some case where that won't work?

08/07/08 20:51:35 changed by mtredinnick

The only advantage of using request.GET that I can see is that it will always be present. In theory, QUERY_STRING should always be present, too, but I notice that in the different handlers (modpython and wsgi), we construct the GET dictionary via different methods, which makes me wonder why. However, if the raw input is always available, then using that will avoid any decoding/encoding round trip changes (ordering, normalisation, whatever may happen). So either way is probably sufficient.

08/27/08 15:30:01 changed by jacob

  • status changed from new to closed.
  • resolution set to fixed.

(In [8635]) Fixed #7379: fixed a subtle corner case involving URL encoding in CommonMiddleware


Add/Change #7379 (middleware redirects in common.py don't pass the same URL parameters correctly.)




Change Properties
Action