Opened 13 years ago

Closed 13 years ago

Last modified 9 years ago

#15718 closed Bug (wontfix)

Django unquotes URLs and is unable to distinguish %2F from /

Reported by: Fedor Tyurin
Owned by: nobody
Component: Core (Other)
Version: 1.2
Severity: Normal
Keywords: urls, url resolver, unquote, %2F
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I've found that basehttp.py contains the line

env['PATH_INFO'] = urllib.unquote(path)

It replaces all URL-escaped characters with the characters they encode. As a result, you cannot properly handle URLs containing quoted characters in your urls.py. For example, the URL http://example.com/blah%2Fblah%2Fblah/ will be matched by the regexp /(\w+)/(\w+)/(\w+)/$
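A minimal Python 3 sketch of the information loss described above. It assumes `urllib.parse.unquote` behaves like the Python 2 `urllib.unquote` call in basehttp.py; the pattern is the one from the report, written as a raw string.

```python
import re
from urllib.parse import unquote

# Two different request paths: one with encoded slashes, one with real ones.
quoted = "/blah%2Fblah%2Fblah/"
plain = "/blah/blah/blah/"

# Decoding the path (as basehttp.py does for PATH_INFO) makes them identical.
decoded = unquote(quoted)
print(decoded == plain)  # True

# The decoded path now matches a three-segment pattern, even though the
# client sent a single segment containing two encoded slashes.
pattern = re.compile(r"/(\w+)/(\w+)/(\w+)/$")
match = pattern.search(decoded)
print(match.groups())  # ('blah', 'blah', 'blah')
```

Matching the same pattern against the still-quoted path finds nothing, because `\w` does not match `%`.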

Under Apache with mod_wsgi this seems to lead to an even more interesting problem: when %2F is present in a URL, the request is not handled by Django at all and the user gets a 404 error directly from Apache. Try http://www.djangoproject.com/%2F

Change History (9)

comment:1 by Fedor Tyurin, 13 years ago

After investigating, I've found that the second issue (the 404 error directly from Apache) is not related to Django and can be avoided by adding "AllowEncodedSlashes On" to the Apache config. Unfortunately, Apache then replaces %2F with / itself, so the behavior is exactly the same as in the simple HTTP server provided by Django. In Apache 2.2.18 (which is not released yet, I guess), AllowEncodedSlashes accepts the value NoDecode. With NoDecode, such URLs are accepted, but encoded slashes are left in their encoded state instead of being decoded. Meanwhile, I'm using the workaround

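        # REQUEST_URI (Apache-specific, not in the WSGI spec) keeps %2F encoded,
        # unlike PATH_INFO, which the server has already unquoted.
        request_uri = force_unicode(environ.get('REQUEST_URI', u'/'))
        if u'?' in request_uri:
            path_info, query = request_uri.split('?', 1)
        else:
            path_info, query = request_uri, ''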

instead of the original

        path_info = force_unicode(environ.get('PATH_INFO', u'/'))

in core/handlers/wsgi.py
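The same workaround can be sketched as a standalone Python 3 helper, outside Django. This is an illustration only: `raw_path_and_query` is a hypothetical name, it assumes the server (as Apache/mod_wsgi typically does) supplies an undecoded REQUEST_URI, and it assumes the application is mounted at the site root, since REQUEST_URI would otherwise also contain the SCRIPT_NAME prefix.

```python
def raw_path_and_query(environ):
    """Return (path, query), preferring the undecoded REQUEST_URI.

    REQUEST_URI is supplied by Apache/mod_wsgi; it is not part of the WSGI
    spec and is typically not percent-decoded, so %2F stays encoded.
    Falls back to the already-decoded PATH_INFO and QUERY_STRING.
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return environ.get("PATH_INFO", "/"), environ.get("QUERY_STRING", "")
    # partition() splits on the first '?' only; query is '' when there is none.
    path, _, query = request_uri.partition("?")
    return path, query


# Hypothetical environ for a request to /A%2FB/C/?page=2 at the site root.
environ = {
    "REQUEST_URI": "/A%2FB/C/?page=2",
    "PATH_INFO": "/A/B/C/",
    "QUERY_STRING": "page=2",
}
print(raw_path_and_query(environ))  # ('/A%2FB/C/', 'page=2')
```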

comment:2 by Luke Plant, 13 years ago

Type: Bug

comment:3 by Luke Plant, 13 years ago

Severity: Normal

comment:4 by Jacob, 13 years ago

Resolution: wontfix
Status: new → closed

I'm fairly sure this is present in the dev server specifically because it mimics Apache's behavior, as you've discovered. Changing it would mean that the dev server behaves differently from production servers.

In fact, poking further, this is more or less enshrined in the WSGI spec: it's expected that you'll need to re-quote the path if you need to reconstruct the originally requested URL.
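The re-quoting mentioned here can be sketched with a Python 3 adaptation of the URL reconstruction recipe from PEP 3333 (the function name `reconstruct_url` is illustrative). It also shows the limitation at the heart of this ticket: `quote()` treats `/` as safe by default, so re-quoting a decoded PATH_INFO cannot bring back a `%2F` that the client originally sent.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI environ (adapted from PEP 3333)."""
    scheme = environ["wsgi.url_scheme"]
    url = scheme + "://"
    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]
    else:
        url += environ["SERVER_NAME"]
        default_port = "443" if scheme == "https" else "80"
        if environ["SERVER_PORT"] != default_port:
            url += ":" + environ["SERVER_PORT"]
    # quote() leaves '/' unescaped by default, so a decoded %2F comes back as '/'.
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


# PATH_INFO as a server would deliver it for /A%2FB/C/ (already decoded),
# which is exactly what it would deliver for /A/B/C/ as well.
environ = {
    "wsgi.url_scheme": "http",
    "HTTP_HOST": "example.com",
    "SCRIPT_NAME": "",
    "PATH_INFO": "/A/B/C/",
    "QUERY_STRING": "",
}
print(reconstruct_url(environ))  # http://example.com/A/B/C/
```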

Further, this would be a devastatingly difficult-to-debug backwards-incompatible change.

Given all that, I'm marking this wontfix: there's simply no real upside to making this change.

comment:5 by anonymous, 13 years ago

Easy pickings: unset

I don't agree that there is no upside. Currently the URL http://example.com/A%2fB/C/ matches the pattern ^([^/]+)/([^/]+)/([^/]+)/$ instead of the expected ^([^/]+)/([^/]+)/$

This restricts how URL patterns can be used.
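A small Python 3 sketch of the mismatch, using the two patterns from this comment. The path is written without its leading slash, mirroring how urls.py patterns are written; matching the raw path and then decoding each capture is shown only as an illustration of the alternative being argued for, not as anything Django does.

```python
import re
from urllib.parse import unquote

two_segments = re.compile(r"^([^/]+)/([^/]+)/$")
three_segments = re.compile(r"^([^/]+)/([^/]+)/([^/]+)/$")

# Path as sent by the client, without the leading slash.
raw = "A%2fB/C/"
decoded = unquote(raw)  # 'A/B/C/' (%2f and %2F both decode to '/')

# Against the decoded path, the three-segment pattern wins...
print(three_segments.match(decoded).groups())  # ('A', 'B', 'C')
print(two_segments.match(decoded))  # None

# ...whereas matching the raw path gives the expected two segments,
# each of which can then be decoded on its own.
groups = [unquote(g) for g in two_segments.match(raw).groups()]
print(groups)  # ['A/B', 'C']
```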

comment:6 by Grégory Starck, 9 years ago

UI/UX: unset

I've run into the exact same issue :/

The main problem I see is that, as far as I understand it, Django compares the URL in its URL-decoded form against each possible regex pattern. That explains the problems we are encountering with an encoded '/' (%2F) in the URL. I could be wrong, though, because I haven't checked the Django code.
If I'm not wrong about this:

Wouldn't it be possible to tell Django to compare some URL regex patterns against the original URL in its non-decoded form?

regards,

gst.

in reply to:  6 comment:7 by Grégory Starck, 9 years ago

Replying to gst:

Wouldn't it be possible to tell Django to compare some URL regex patterns against the original URL in its non-decoded form?

That would be a feature request. What if I try to make a patch for it? Would it at least have a chance of being reviewed?

comment:8 by Claude Paroz, 9 years ago

Any patch with tests is worth a review. But of course, we cannot promise it will be accepted.

in reply to:  6 comment:9 by Grégory Starck, 9 years ago

Replying to gst:

I've run into the exact same issue :/

The main problem I see is that, as far as I understand it, Django compares the URL in its URL-decoded form against each possible regex pattern. That explains the problems we are encountering with an encoded '/' (%2F) in the URL. I could be wrong, though, because I haven't checked the Django code.

Another possible workaround is to URL-encode the parts of the URL you want to reach twice (so that '/' is compared as '%2F' against all the URL regex patterns, which avoids the problem) and then decode them once in the view.
It does seem rather unusual, though.
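The double-encoding workaround can be sketched in Python 3 as follows. `encode_segment_twice` is a hypothetical helper, and the single `unquote()` call stands in for the decoding the server performs before URL resolution; the final `unquote()` is the one decode the view would do.

```python
import re
from urllib.parse import quote, unquote


def encode_segment_twice(value):
    """Hypothetical helper: percent-encode a segment twice, escaping '/' too."""
    return quote(quote(value, safe=""), safe="")


segment = encode_segment_twice("A/B")  # 'A%252FB'
url_path = "/" + segment + "/C/"  # '/A%252FB/C/'

# The server decodes once, so URL matching sees '%2F' rather than '/'.
path_info = unquote(url_path)  # '/A%2FB/C/'

# The two-segment pattern now matches; the view decodes each capture once more.
match = re.match(r"^/([^/]+)/([^/]+)/$", path_info)
first, second = (unquote(g) for g in match.groups())
print(first, second)  # A/B C
```

The cost is that every link to such a view has to be built with the extra encoding step, and every view has to remember the extra decode.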
