Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#18499 closed Bug (invalid)

Malformed query string becomes part of request.path

Reported by: mkai Owned by: nobody
Component: Core (URLs) Version: 1.4
Severity: Normal Keywords: query string, query_string, malformed
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I've been seeing some requests from crawlers to URLs like this lately:

http://site.com/page/&a=y&b=y

DJango parses this into

request.path == "/page/&a=y&b=y"
request.META["QUERY_STRING"] == "".

What I'd like to see is for e. g. CommonMiddleware to fix this malformed query string by replacing the first '&' with an '?'; so that

request.path == '/page/' 
request.META["QUERY_STRING"] == "a=y&b=y".

This caused some 404s for my site, so I made the following piece of middleware to work around it:

class MalformedQueryStringMiddleware(object):
    def process_request(self, request):
        if '&' in request.path:  # '&' left over from a malformed query string
            url = '%s&%s' % (request.path.replace('&', '?', 1),
                             request.META['QUERY_STRING'])
            return HttpResponsePermanentRedirect(url)

Do you think Django should do that by itself?

Change History (3)

comment:1 by Aymeric Augustin, 12 years ago

Resolution: invalid
Status: newclosed

Django shouldn't perform a "magic fix" that isn't mandated by the standards, because others may very well consider it a "magic break".

Using your own middleware is a perfectly reasonable approach here.

You may want to check where the broken URL appears, too (with Google's link: query).

comment:2 by anonymous, 12 years ago

Shouldn't your middleware's if line ignore path that actually do have "?"

    if '&' in request.path and '?' not in request.path:

comment:3 by mkai, 12 years ago

I understand, I didn't know that it wasn't spec'd in HTTP.

As for the anonymous commenter, I don't think checking for '?' is necessary because that would be the start of a regular query string which would have already been parsed and stripped from the request path by Django. Or am I missing something?

Here's the current version of the code that doesn't assume a query string to exist.

class MalformedQueryStringMiddleware(object):
    """
    Detects a malformed query string and redirects to a sanitized version of 
    the requested URL.

    Put it before CommonMiddleware in MIDDLEWARE_CLASSES.

    Example: redirect http://site.com/&a=x&b=y to http://site.com/?a=x&b=y

    """
    def process_request(self, request):
        if '&' in request.path:  # '&' left over from a malformed query string
            url = request.path.replace('&', '?', 1)
            if request.META.get('QUERY_STRING', ''):
                # append regular query string
                url += '&' + request.META['QUERY_STRING']
            return HttpResponsePermanentRedirect(url)
Note: See TracTickets for help on using tickets.
Back to Top