#15718 closed Bug (wontfix)
Django unquotes URLs and is not able to distinguish %2F and /
Reported by: | Fedor Tyurin | Owned by: | nobody |
---|---|---|---|
Component: | Core (Other) | Version: | 1.2 |
Severity: | Normal | Keywords: | urls, url resolver, unquote, %2F |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I've found that in basehttp.py there is a line
env['PATH_INFO'] = urllib.unquote(path)
It replaces all URL-escaped symbols with the original symbols. As a result, you cannot properly handle URLs that contain quoted symbols in your urls.py. For example, the URL http://example.com/blah%2Fblah%2Fblah/ will be matched by the regexp /(\w+)/(\w+)/(\w+)/$
Under Apache with mod_wsgi this seems to lead to an even more interesting problem. When %2F is present in the URL, the request is not handled by Django and the user gets a 404 error directly from Apache. Try http://www.djangoproject.com/%2F
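A minimal sketch of the effect described above, assuming Python 3 (where urllib.unquote lives at urllib.parse.unquote). The variable names are illustrative only and are not taken from basehttp.py:

```python
import re
from urllib.parse import unquote  # Python 3 location of urllib.unquote

raw_path = "/blah%2Fblah%2Fblah/"   # path as sent by the client
path_info = unquote(raw_path)        # what ends up in env['PATH_INFO']

pattern = re.compile(r"/(\w+)/(\w+)/(\w+)/$")

print(path_info)                            # /blah/blah/blah/
print(pattern.match(path_info).groups())    # ('blah', 'blah', 'blah')
print(pattern.match(raw_path))              # None: '%' is not matched by \w
```

Once the path has been unquoted, nothing in the string says which slashes were originally written as %2F.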
Change History (9)
comment:1 by , 14 years ago
comment:2 by , 14 years ago
Type: | → Bug |
---|
comment:3 by , 14 years ago
Severity: | → Normal |
---|
comment:4 by , 14 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I'm fairly sure that this is present in the dev server specifically because it mimics Apache's behavior -- as you've discovered. Changing this would mean that the dev server would behave differently than production servers.
In fact, poking further, this is more or less enshrined by the WSGI spec -- it's expected that you'll need to re-quote the path if you need to construct the originally given URL.
Further, this would be a devastatingly difficult-to-debug backwards-incompatible change.
Given all that, I'm marking this wontfix: there's simply no real upside to making this change.
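For illustration, here is a rough sketch of the re-quoting step mentioned above, assuming Python 3 and a hand-built environ dict. The helper name rebuild_path is hypothetical; it follows the general shape of the URL reconstruction recipe in the WSGI spec, which quotes SCRIPT_NAME and PATH_INFO:

```python
from urllib.parse import quote, unquote

def rebuild_path(environ):
    """Re-quote SCRIPT_NAME and PATH_INFO to approximate the requested path."""
    return quote(environ.get("SCRIPT_NAME", "")) + quote(environ.get("PATH_INFO", ""))

# Simulate a server that unquotes the request path before storing it.
environ = {"SCRIPT_NAME": "", "PATH_INFO": unquote("/A%2FB/C%20D/")}

rebuilt = rebuild_path(environ)
print(environ["PATH_INFO"])  # /A/B/C D/
print(rebuilt)               # /A/B/C%20D/
```

The escaped space survives the round trip, but the escaped slash does not, because quote() treats "/" as safe by default. Re-quoting therefore gives a valid URL, though not necessarily the exact one the client sent.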
comment:5 by , 14 years ago
Easy pickings: | unset |
---|
I don't agree that there is no upside. Currently the URL http://example.com/A%2fB/C/ will match the pattern ^([^/]+)/([^/]+)/([^/]+)/$
instead of the expected ^([^/]+)/([^/]+)/$
This restricts usage of URL patterns.
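The difference can be sketched with the two patterns above, assuming Python 3. This only illustrates matching against the decoded versus the non-decoded path; it is not how Django's resolver is implemented, and the leading slash is left off so the paths fit the patterns as written:

```python
import re
from urllib.parse import unquote

three_parts = re.compile(r"^([^/]+)/([^/]+)/([^/]+)/$")
two_parts = re.compile(r"^([^/]+)/([^/]+)/$")

raw = "A%2fB/C/"         # path as sent by the client
decoded = unquote(raw)   # "A/B/C/", the form the patterns are matched against

# Current behavior: the decoded path has three segments.
print(three_parts.match(decoded).groups())  # ('A', 'B', 'C')
print(two_parts.match(decoded))             # None

# Expected behavior: match the raw path, then decode each captured group.
groups = tuple(unquote(g) for g in two_parts.match(raw).groups())
print(groups)                               # ('A/B', 'C')
```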
follow-ups: 7 9 comment:6 by , 10 years ago
UI/UX: | unset |
---|
I've run into the exact same issue :/
The main problem I see is that, as far as I understand, Django compares the URL in its URL-decoded form against each possible regex pattern. Hence the problems we are encountering with a URL-encoded '/' (%2F). Though I could be wrong, because I haven't checked the Django code.
If I'm not wrong about this:
Wouldn't it be possible to tell Django to compare some URL regex patterns against the original URL value in its non-decoded form?
regards,
gst.
comment:7 by , 10 years ago
Replying to gst:
Wouldn't it be possible to tell Django to compare some URL regex patterns against the original URL value in its non-decoded form?
That would be a feature request. What if I try to make a patch for it? Would it have a chance of at least being reviewed?
comment:8 by , 10 years ago
Any patch with tests is worth a review. But of course, we cannot promise it will be accepted.
comment:9 by , 10 years ago
Replying to gst:
I've run into the exact same issue :/
The main problem I see is that, as far as I understand, Django compares the URL in its URL-decoded form against each possible regex pattern. Hence the problems we are encountering with a URL-encoded '/' (%2F). Though I could be wrong, because I haven't checked the Django code.
The other possible workaround is to URL-encode twice the parts of the URL that you want to reach (so that '/' appears as '%2F' when the path is compared against the URL regex patterns, which avoids the problem) and then to decode them once in the view.
Though it seems rather special.
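A small sketch of that double-encoding workaround, assuming Python 3. The helper name encode_segment is hypothetical, and the unquote() call on url_path stands in for the single decoding pass the server performs:

```python
import re
from urllib.parse import quote, unquote

def encode_segment(value):
    """Quote a path segment twice so one server-side decode leaves %2F intact."""
    return quote(quote(value, safe=""), safe="")

# Client side: build the URL path with a doubly encoded first segment.
url_path = encode_segment("A/B") + "/C/"
print(url_path)    # A%252FB/C/

# Server side: one decoding pass, then the URL pattern is applied.
path_info = unquote(url_path)
print(path_info)   # A%2FB/C/

match = re.match(r"^([^/]+)/([^/]+)/$", path_info)
first, second = match.groups()

# View side: decode the captured value once more to recover the slash.
print(unquote(first), second)  # A/B C
```

The cost is that every link to such a view has to be built with the double encoding, and the view has to remember the extra decode.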
After investigation I've found that the second issue (the 404 error directly from Apache) is not related to Django and can be avoided by adding "AllowEncodedSlashes On" to the Apache config. Unfortunately Apache replaces %2f with / itself, so the behavior is exactly the same as in the simple HTTP server provided by Django. In Apache 2.2.18 (which is not released yet, I guess), AllowEncodedSlashes accepts the value NoDecode. With NoDecode, such URLs are accepted, but encoded slashes are left in their encoded state instead of being decoded. Meanwhile I'm using the workaround
instead of original
in core/handlers/wsgi.py