Opened 15 years ago

Closed 9 years ago

#12464 closed Bug (duplicate)

Empty PATH_INFO causes blank SCRIPT_NAME

Reported by: chrisw Owned by: Greg Wogan-Browne
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Graham.Dumpleton@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Ramiro Morales)

Take the following Apache config snippet:

RewriteEngine On
WSGIScriptAlias /project /django/project/bin/django.wsgi

This results in mod_rewrite setting the SCRIPT_URL environment variable.

The code at http://code.djangoproject.com/browser/django/tags/releases/1.1.1/django/core/handlers/base.py#L204:

    script_url = environ.get('SCRIPT_URL', u'')
    if not script_url:
        script_url = environ.get('REDIRECT_URL', u'')
    if script_url:
        return force_unicode(script_url[:-len(environ.get('PATH_INFO', ''))])
    return force_unicode(environ.get('SCRIPT_NAME', u''))

...behaves incorrectly when, in the above example, /project is requested from Apache.

The problem is when PATH_INFO is empty, as it is in this case. Python
has no notion of positive or negative zero (;-)) so we end up returning
script_url[:0]. This results in request.META['SCRIPT_NAME']being set to '', which in turn results in urls being generated incorrectly.

A workaround for this is to add the following rewrite rule before !WSGIScriptAlias:

RewriteRule ^/project$ /project/ [R]

This bug is potentially related to #9435, but this is ticket describes a real world problem that I believe has bitten more people than just me...

A solution for this could well just be to change the section to:

    if script_url:
        path_info = environ.get('PATH_INFO', '')
        if path_info:
            script_url = script_url[:-len(path_info)]
        return force_unicode(script_url)
    return force_unicode(environ.get('SCRIPT_NAME', u''))

...but I have no idea where to go about writing a test.

Attachments (1)

12464.patch (2.2 KB ) - added by Greg Wogan-Browne 14 years ago.
Patch against revision 13959

Download all attachments as: .zip

Change History (18)

comment:1 by Ramiro Morales, 15 years ago

Description: modified (diff)

(reformatted description, please use the Preview button before submitting a ticket)

comment:2 by Russell Keith-Magee, 15 years ago

Triage Stage: UnreviewedAccepted

comment:3 by Greg Wogan-Browne, 14 years ago

Has patch: set
milestone: 1.3
Owner: changed from nobody to Greg Wogan-Browne
Status: newassigned
Triage Stage: AcceptedReady for checkin
Version: 1.1SVN

I was bitten by this today so I wrote up a quick patch (including a regression test).

comment:4 by Thejaswi Puthraya, 14 years ago

Component: UncategorizedHTTP handling

by Greg Wogan-Browne, 14 years ago

Attachment: 12464.patch added

Patch against revision 13959

comment:5 by Greg Wogan-Browne, 14 years ago

Triage Stage: Ready for checkinAccepted

Updated patch to use unittest rather than doctests, re-factored the test slightly (to test directly against get_script_name).

comment:6 by Graham Dumpleton, 14 years ago

For the record, SCRIPT_URL only appears to get set when the RewriteEngine directive is at top level scope in VirtualHost (or possibly server as well). If the RewriteEngine directive is within a Directory block, thus:

  WSGIScriptAlias /project /django/project/bin/django.wsgi

  <Directory django/project/bin>
  ...
  RewriteEngine On
  <Directory>

it doesn't seem to be set.

Thus, another work around is if rewrites being done can be done in Directory context instead, then shift the RewriteEngine directive into the Directory context along with the rewrite rules.

comment:7 by David Chandek-Stark, 14 years ago

What needs to be done here to move the stage back to "ready for checkin"?

Now, it appears that the currently available Django solution to this issue is to set FORCE_SCRIPT_NAME, e.g.:

FORCE_SCRIPT_NAME = '/project'

Nevertheless, although the origin of this problem (i.e., inconsistent setting of SCRIPT_URL) lies outside of Django, it seems to me basic sanity to avoid whacking a string by slicing it with [:-0].

comment:8 by Graham Dumpleton, 14 years ago

Anyone know what server sets REDIRECT_URL. The Apache/mod_rewrite module doesn't. It sets SCRIPT_URL and REDIRECT_SCRIPT_URL. So, don't know if REDIRECT_URL is for a different server such as lighttpd, nginx or cherokee, or whether the code as been wrong as far as what variable it is checking for.

It also may be arguable that Apache/mod_wsgi itself should use SCRIPT_URL and REDIRECT_SCRIPT_URL to automatically rewrite SCRIPT_NAME back to the original value when rewrites occur. This had been considered albeit would have used internal Apache C request structures to derive it rather than relying of the SetEnv variables set by mod_rewrite internally. In doing that, would be necessary to clear SCRIPT_URL and REDIRECT_SCRIPT_URL though so something like Django would try and then fix it up a second time and potentially muck up the fixup.

If such a fixup were added to Apache/mod_wsgi, would be on by default, but able to be turned off.

comment:9 by Graham Dumpleton, 14 years ago

The REDIRECT_URL variable is actually setup for CGI environment by Apache and not my mod_rewrite. Because mod_wsgi gets Apache to build the CGI environment but passes it as part of WSGI environment, that is where it is coming. FASTCGI and SCGI etc would similarly inherit REDIRECT_URL in that way.

Thus, although the code in Django doing this fixups is documented as being related to mod_rewrite, it is actually using information from other sources besides that.

comment:10 by Graham Dumpleton, 14 years ago

The code as it is technically will also only work if there is at most one internal redirect undertaken within Apache. In the case where there is more than one, Apache will prefix environment variables with 'REDIRECT_' at each iteration. This is where 'REDIRECT_SCRIPT_URL' comes from that mod_rewrite code internally checks for. Thus a robust implementation, to get hold of the very original URL should find the variant of SCRIPT_URL with the longest sequence of 'REDIRECT_' prefixes. Ie., 'REDIRECT_REDIRECT_SCRIPT_URL' takes precedence over 'REDIRECT_SCRIPT_URL' takes precedence of 'SCRIPT_URL'.

A further case which presents a problem and where the code may not work is when an internal redirect is performed, it is possible that PATH_INFO from the parent isn't simply passed through but is modified in the process. This could result in the final PATH_INFO not even matching what is present in tail of 'SCRIPT_URL' and so simply dropping off 'SCRIPT_URL' the number of characters present in 'PATH_INFO' may give incorrect results.

Take for example the convoluted example:

WSGIScriptAlias /myapp /Library/WebServer/Sites/hello-1/htdocs/environ.wsgi

RewriteEngine On
RewriteRule ^/welcome(/.*)$ /myapp/about/$1 [QSA,PT,L]

The variables which go through to the app are:

PATH_INFO: '/about/index'
QUERY_STRING: ''
REQUEST_URI: '/welcome/index'
SCRIPT_FILENAME: '/Library/WebServer/Sites/hello-1/htdocs/environ.wsgi'
SCRIPT_NAME: '/myapp'
SCRIPT_URI: 'http://hello-1.example.com/welcome/index'
SCRIPT_URL: '/welcome/index'

For code:

    script_url = environ.get('SCRIPT_URL', u'')
    if not script_url:
        script_url = environ.get('REDIRECT_URL', u'')
    if script_url:
        return force_unicode(script_url[:-len(environ.get('PATH_INFO', ''))])

Pass this through the current algorithm and you get:

>>> '/welcome/index'[:-len('/about/index')]
'/w'

which isn't right.

It is because of this latter issue that mod_wsgi hasn't in the past tried to do fixups to SCRIPT_NAME on internal redirects in Apache, as it can't know exactly what the nature of any URL manipulations are. This is why manual fixups in the WSGi script file have been suggested, where people can tailor it the specifics of the sort of redirect performed. Thus any automatic fixup in Apache/mod_wsgi seems unlikely.

comment:11 by Graham Dumpleton, 14 years ago

Cc: Graham.Dumpleton@… added

comment:12 by Matt McClanahan, 14 years ago

milestone: 1.3
Severity: Normal
Type: Bug

comment:13 by patchhammer, 14 years ago

Easy pickings: unset
Patch needs improvement: set

12464.patch fails to apply cleanly on to trunk

comment:14 by Jannis Leidel, 13 years ago

Resolution: needsinfo
Status: assignedclosed
UI/UX: unset

I honestly couldn't follow this ticket's discussion since it seemed to have diverged into a Apache support thread. Please provide more information of that Django does wrong here.

comment:15 by Karen Tracey, 13 years ago

Resolution: needsinfo
Status: closedreopened

I think there is enough information in the original description to identify one thing Django is doing wrong. True, the description is lacking what "results in urls being generated incorrectly" means. Background can be found in this django-users thread: http://groups.google.com/group/django-users/browse_thread/thread/ce1436670a3c55d5/39a1f95356c1dd5c, which starts off with the url reversal problem that led to this ticket being opened. I think it would be worthwhile to fix that specific case, which looks like it could be hit pretty easily.

The most recent follow-ups seem to be pointing out additional cases where this particular area of the code (even after fixing the problem identified in this ticket) is is going to produce incorrect results:

  • If there have been multiple internal redirects done by Apache, then Django won't be looking for the right env var to find the info, since Apache apparently continues to prepend REDIRECT_ onto SCRIPT_URL where Django simply looks at "REDIRCT_URL", which is apparently not even a var set by the rewrite engine, but rather something set for CGI environment. Net I get from this is that Django isn't really looking at the right env var for getting this information, but it apparently works at least for the simple case of one internal redirect.
  • If the Apache rewrite setup is actually transforming the path, this area of code will produce incorrect results. Net seems to be it's possible to set up rewrites in a way such that it's going to be impossible for Django to properly reverse URLs: there's just not enough information left in what we get to re-construct the url to reach the "root" of the project. It sounds to me like this problem is not fixable within Django, but perhaps could be addressed via documentation of the limitation and instructions on what to set in the wsgi file to allow Django to properly reverse URLs.

I think it would be worthwhile to solve the first problem with this ticket. The others identified seem to me to be less likely to be hit in the normal case, but could be addressed be separate individual ticket(s?) that perhaps simply describe the limitations of the code here and give instructions for how to set things in the wsgi file to overcome those limitations.

comment:16 by Aymeric Augustin, 12 years ago

Status: reopenednew

comment:17 by Marten Kenbeek, 9 years ago

Resolution: duplicate
Status: newclosed
Note: See TracTickets for help on using tickets.
Back to Top