
Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#15718 closed Bug (wontfix)

Django unquotes urls and not able to distinguish %2F and /

Reported by: fed239
Owned by: nobody
Component: Core (Other)
Version: 1.2
Severity: Normal
Keywords: urls, url resolver, unquote, %2F
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX:

Description

I've found that basehttp.py contains this line:

env['PATH_INFO'] = urllib.unquote(path)

It replaces every URL-escaped character with the original character. As a result, you cannot properly handle URLs containing quoted characters in your urls.py. For example, the URL http://example.com/blah%2Fblah%2Fblah/ will be matched by the regexp /(\w+)/(\w+)/(\w+)/$

Under Apache with mod_wsgi this seems to lead to an even more interesting problem. When %2F is present in the URL, the request is not handled by Django and the user gets a 404 error directly from Apache. Try http://www.djangoproject.com/%2F
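The effect can be reproduced outside Django. This is a minimal sketch, assuming Python 3 (where `urllib.unquote` is available as `urllib.parse.unquote`); the pattern is given a leading `^` anchor purely for illustration:

```python
import re
from urllib.parse import unquote  # Python 3 location of urllib.unquote

raw_path = "/blah%2Fblah%2Fblah/"

# Same operation as the quoted line: decode every percent-escape.
path_info = unquote(raw_path)

pattern = re.compile(r"^/(\w+)/(\w+)/(\w+)/$")

# After decoding, the encoded slashes are indistinguishable from real ones,
# so the single-segment URL matches a three-segment pattern.
match_decoded = pattern.match(path_info)

# The raw path does not match, because "%" is not a word character.
match_raw = pattern.match(raw_path)

print(path_info)
print(match_decoded.groups() if match_decoded else None)
print(match_raw)
```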

Attachments (0)

Change History (5)

comment:1 Changed 3 years ago by fed239

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

After investigation I've found that the second issue (the 404 error directly from Apache) is not related to Django and can be avoided by adding "AllowEncodedSlashes On" to the Apache config. Unfortunately, Apache then replaces %2f with / itself, so the behavior is exactly the same as in the simple HTTP server provided by Django. In Apache 2.2.18 (which is not released yet, I guess), AllowEncodedSlashes accepts the value NoDecode. With NoDecode, such URLs are accepted and encoded slashes are left in their encoded state instead of being decoded. Meanwhile I'm using this workaround

        request_uri = force_unicode(environ.get('REQUEST_URI', u'/'))
        if u'?' in request_uri:
            path_info, query = request_uri.split(u'?', 1)
        else:
            path_info, query = request_uri, u''

instead of the original

        path_info = force_unicode(environ.get('PATH_INFO', u'/'))

in core/handlers/wsgi.py
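The idea behind this workaround can be sketched on its own, outside Django. The helper name `raw_path_segments` is hypothetical, and the sketch assumes Python 3 and a server that sets `REQUEST_URI` (it is not a key required by the WSGI spec). Splitting on `/` before decoding keeps an encoded slash inside its segment:

```python
from urllib.parse import unquote


def raw_path_segments(environ):
    """Split the undecoded request URI into decoded path segments.

    Hypothetical helper: relies on the server-provided REQUEST_URI key,
    which holds the path as the client sent it, plus any query string.
    """
    request_uri = environ.get("REQUEST_URI", "/")
    # partition() returns an empty query when no "?" is present.
    raw_path, _, query = request_uri.partition("?")
    # Split first, decode second, so %2F stays within one segment.
    segments = [unquote(part) for part in raw_path.split("/") if part]
    return segments, query


print(raw_path_segments({"REQUEST_URI": "/A%2fB/C/?page=2"}))
```

The trade-off is that URL patterns would then have to be matched against the raw path or the segment list rather than against `PATH_INFO`.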

comment:2 Changed 3 years ago by lukeplant

  • Type set to Bug

comment:3 Changed 3 years ago by lukeplant

  • Severity set to Normal

comment:4 Changed 3 years ago by jacob

  • Resolution set to wontfix
  • Status changed from new to closed

I'm fairly sure that this is present in the dev server specifically because it mimics Apache's behavior -- as you've discovered. Changing this would mean that the dev server would behave differently than production servers.

In fact, poking further, this is more or less enshrined by the WSGI spec -- it's expected that you'll need to re-quote the path if you need to construct the originally given URL.
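To illustrate what re-quoting the path gives back, here is a small sketch assuming Python 3's `urllib.parse.quote` with its default `safe='/'`; the helper name `roundtrip` is made up for this example. Most escapes survive the round trip, but an encoded slash comes back as a plain slash:

```python
from urllib.parse import quote, unquote


def roundtrip(raw_path):
    """Decode a path the way the server does, then re-quote it."""
    path_info = unquote(raw_path)  # what ends up in PATH_INFO
    return quote(path_info)        # default safe='/' leaves slashes alone


# An encoded space is restored by re-quoting.
space_result = roundtrip("/my%20file/")

# An encoded slash is not: after decoding it is an ordinary "/".
slash_result = roundtrip("/A%2fB/C/")

print(space_result)
print(slash_result)
```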

Further, this would be a devastatingly difficult-to-debug backwards-incompatible change.

Given all that, I'm marking this wontfix: there's simply no real upside to making this change.

comment:5 Changed 3 years ago by anonymous

  • Easy pickings unset

I don't agree that there is no upside. Currently the URL http://example.com/A%2fB/C/ matches the pattern ^([^/]+)/([^/]+)/([^/]+)/$ instead of the expected ^([^/]+)/([^/]+)/$

This restricts what URL patterns can express.
