#18499 closed Bug (invalid)
Malformed query string becomes part of request.path
Reported by: | mkai | Owned by: | nobody |
---|---|---|---|
Component: | Core (URLs) | Version: | 1.4 |
Severity: | Normal | Keywords: | query string, query_string, malformed |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've been seeing some requests from crawlers to URLs like this lately:
http://site.com/page/&a=y&b=y
DJango parses this into
request.path == "/page/&a=y&b=y" request.META["QUERY_STRING"] == "".
What I'd like to see is for e. g. CommonMiddleware to fix this malformed query string by replacing the first '&' with an '?'; so that
request.path == '/page/' request.META["QUERY_STRING"] == "a=y&b=y".
This caused some 404s for my site, so I made the following piece of middleware to work around it:
class MalformedQueryStringMiddleware(object): def process_request(self, request): if '&' in request.path: # '&' left over from a malformed query string url = '%s&%s' % (request.path.replace('&', '?', 1), request.META['QUERY_STRING']) return HttpResponsePermanentRedirect(url)
Do you think Django should do that by itself?
Change History (3)
comment:1 by , 12 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 12 years ago
Shouldn't your middleware's if line ignore path that actually do have "?"
if '&' in request.path and '?' not in request.path:
comment:3 by , 12 years ago
I understand, I didn't know that it wasn't spec'd in HTTP.
As for the anonymous commenter, I don't think checking for '?' is necessary because that would be the start of a regular query string which would have already been parsed and stripped from the request path by Django. Or am I missing something?
Here's the current version of the code that doesn't assume a query string to exist.
class MalformedQueryStringMiddleware(object): """ Detects a malformed query string and redirects to a sanitized version of the requested URL. Put it before CommonMiddleware in MIDDLEWARE_CLASSES. Example: redirect http://site.com/&a=x&b=y to http://site.com/?a=x&b=y """ def process_request(self, request): if '&' in request.path: # '&' left over from a malformed query string url = request.path.replace('&', '?', 1) if request.META.get('QUERY_STRING', ''): # append regular query string url += '&' + request.META['QUERY_STRING'] return HttpResponsePermanentRedirect(url)
Django shouldn't perform a "magic fix" that isn't mandated by the standards, because others may very well consider it a "magic break".
Using your own middleware is a perfectly reasonable approach here.
You may want to check where the broken URL appears, too (with Google's link: query).