Opened 18 years ago

Closed 16 years ago

Last modified 16 years ago

#3414 closed (fixed)

middleware/common.py and SCGI bug - string index out of range (caused by missing PATH_INFO)

Reported by: Piotr Maliński <riklaunim@…> Owned by: nobody
Component: Core (Other) Version: dev
Severity: Keywords:
Cc: real.human@…, richard.davies@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I wanted to use Cherokee with SCGI to test my site, but I get this exception when trying to view the main page (/) in the browser:

Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/flup-0.5-py2.4.egg/flup/server/scgi_base.py", line 185, in run
  File "/usr/lib64/python2.4/site-packages/flup-0.5-py2.4.egg/flup/server/scgi_base.py", line 456, in handler
  File "/usr/lib64/python2.4/site-packages/django/core/handlers/wsgi.py", line 148, in __call__
    response = self.get_response(request.path, request)
  File "/usr/lib64/python2.4/site-packages/django/core/handlers/base.py", line 59, in get_response
    response = middleware_method(request)
  File "/usr/lib64/python2.4/site-packages/django/middleware/common.py", line 40, in process_request
    if settings.APPEND_SLASH and (old_url[1][-1] != '/') and ('.' not in old_url[1].split('/')[-1]):
IndexError: string index out of range

Django + SCGI + Cherokee worked for me some time ago without any problems. Now on 0.9.5.1 it throws this exception.
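The failing condition in `process_request` indexes the last character of the request path. An empty path raises `IndexError`, and that is what the handler sees when the server passes an empty PATH_INFO (comment 5 reports this for Cherokee). The sketch below reproduces the failure. The helper names are hypothetical and are not Django's actual code. The guarded variant treats an empty path as needing no redirect. That choice is an assumption, not the fix that was eventually committed.

```python
def needs_slash_original(path):
    # Mirrors the failing expression from middleware/common.py:
    # path[-1] raises IndexError when path is the empty string.
    return (path[-1] != '/') and ('.' not in path.split('/')[-1])


def needs_slash(path):
    # Hypothetical guarded version: an empty path is treated as
    # "no redirect needed", and endswith() is safe on empty strings.
    if not path:
        return False
    return not path.endswith('/') and '.' not in path.split('/')[-1]
```

`needs_slash('/blog')` is `True`, while `needs_slash('/robots.txt')` and `needs_slash('')` are both `False`. Calling `needs_slash_original('')` raises the same `IndexError` shown in the traceback.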

Attachments (4)

wsgi.patch (478 bytes) - added by Jordi Funollet <jordi.f@…> 17 years ago.
wsgi_path_from_many_params.diff (790 bytes) - added by Jordi Funollet <jordi.f@…> 17 years ago.
Patch against [5823]
path_info_wsgi.diff (563 bytes) - added by Gabriel Sean Farrell <gsf@…> 17 years ago.
wsgi_path_from_many_params_2.diff.txt (1.0 KB) - added by Richard Davies <richard.davies@…> 16 years ago.
Update to wsgi_path_from_many_params.diff which better handles QUERY_STRING


Change History (28)

comment:1 by James Bennett, 18 years ago

I've seen something similar under FastCGI; it only happens when APPEND_SLASH is true, and it seems to be something to do with PATH_INFO not being passed in properly. I've had this come up and so have some other folks I've talked to, so there's definitely an issue, I'm just not sure where it is.

comment:2 by Simon G. <dev@…>, 18 years ago

Triage Stage: Unreviewed → Accepted

comment:3 by James Bennett, 18 years ago

#3928 was a duplicate.

comment:4 by anonymous, 18 years ago

Happens on Litespeed 3.0 too. For me, APPEND_SLASH has no effect either way, True or False.

comment:5 by brosner <brosner@…>, 17 years ago

I think the issue here is that Django does not allow an empty value for PATH_INFO. I have run into this problem and found that Cherokee passes an empty value for PATH_INFO. According to the CGI/1.1 specification (RFC 3875, section 4.1.5, http://www.ietf.org/rfc/rfc3875), PATH_INFO can have an empty value. I am not sure how this will affect the usage of APPEND_SLASH.

comment:6 by anonymous, 17 years ago

Same with CGI; see ticket:2407.

comment:7 by Kelvin Nicholson <kelvin@…>, 17 years ago

Version: 0.95 → SVN

I can confirm that this happens on the latest build (r1857) of lighttpd. Luckily, however, setting APPEND_SLASH = False in settings.py took care of the error.

Flup: SVN (May 28, 2007)
Django: SVN (May 5th-ish, 2007)

comment:8 by Jordi Funollet <jordi.f@…>, 17 years ago

About the bug:

  • Doesn't occur in my development environment:
    • Ubuntu Feisty 7.04 (Linux 2.6.20-15)
    • Lighttpd-1.4.13 + Fastcgi
    • Flup revision 2126
    • Django revision 4463
  • Occurs in my production environment :-( (Textdrive hosting):
    • FreeBSD 6.2-STABLE
    • Lighttpd-1.4.13 + Fastcgi
    • Flup revision 2126
    • Django revision 4463
  • Value of APPEND_SLASH doesn't change anything.
  • The patch for wsgi.py shown on ticket:2407 doesn't solve this.

About the patch: it fixes the issue for me, but I don't have experience with the CGI/1.1 specification, so it isn't intended to be authoritative.

Not tested with django-trunk.

by Jordi Funollet <jordi.f@…>, 17 years ago

Attachment: wsgi.patch added

comment:9 by Jordi Funollet <jordi.f@…>, 17 years ago

Has patch: set

comment:10 by Tai Lee, 17 years ago

Cc: real.human@… added

Same problem here using fcgi with lighttpd 1.4.13, Django r5518, flup r2311. APPEND_SLASH = False "fixes" it.

comment:11 by Brian Rosner <brosner@…>, 17 years ago

Resolution: fixed
Status: new → closed

This problem should now be fixed by r5688; it was also reported in #4484. If I am wrong and this has not fixed the problem, please let everyone know.

by Jordi Funollet <jordi.f@…>, 17 years ago

Attachment: wsgi_path_from_many_params.diff added

Patch against [5823]

comment:12 by Jordi Funollet <jordi.f@…>, 17 years ago

Resolution: fixed
Status: closed → reopened

Still reproducible. The quick-and-dirty patch attachment:wsgi_path_from_many_params.diff works for me.

comment:13 by James Bennett, 17 years ago

#3762 was a duplicate, and has more information.

by Gabriel Sean Farrell <gsf@…>, 17 years ago

Attachment: path_info_wsgi.diff added

in reply to:  12 comment:14 by Gabriel Sean Farrell <gsf@…>, 17 years ago

Replying to Jordi Funollet <jordi.f@ati.es>:

Still reproducible. The quick-and-dirty patch attachment:wsgi_path_from_many_params.diff works for me.

That patch removes the path section of the base url from REDIRECT_URL and REQUEST_URI. So, for example, say my project is at http://example.com/my_project/. If I go to http://example.com/my_project/admin/, I get redirected to http://example.com/admin/. If I go to a page in my project that doesn't exist, let's say http://example.com/my_project/bob/, the "Request URL" on the "Page not found" screen is http://example.com/bob/.

As noted at #3762 by michael@…, the hacks to the WSGI script from http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango make it all work as it should. Seeing that, I made an even simpler patch: attachment:path_info_wsgi.diff.

comment:15 by gsf@…, 16 years ago

I believe this is related to #285, but all of the hubbub there suggests to me that my one-line patch probably isn't the solution to everyone's problem. Still working for me, though!

by Richard Davies <richard.davies@…>, 16 years ago

Attachment: wsgi_path_from_many_params_2.diff.txt added

Update to wsgi_path_from_many_params.diff which better handles QUERY_STRING

comment:16 by Richard Davies <richard.davies@…>, 16 years ago

Cc: richard.davies@… added

I run Django with Lighttpd, using the error-handler-404 mechanism borrowed from the standard Rails config for this web server (http://github.com/rails/rails/tree/master/railties/configs/lighttpd.conf)

When run in this manner, Lighttpd does not set PATH_INFO (http://trac.lighttpd.net/trac/wiki/FrequentlyAskedQuestions#Whatkindofenvironmentdoesserver.error-handler-404setup), so I have been using Jordi's attachment:wsgi_path_from_many_params.diff, which worked well at first for me, to take self.path from REQUEST_URI in the absence of PATH_INFO.

More recently, when using query strings, I note that REQUEST_URI includes the query string (e.g. "/script/?foo=bar"), whereas self.path should not. I am therefore posting an update to Jordi's patch which correctly strips out the query string from REQUEST_URI before setting self.path. When used in this mode, Lighttpd also does not set QUERY_STRING itself, so I also take the opportunity to set QUERY_STRING based on REQUEST_URI if it is not already present.
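The sketch below shows the fallback this comment describes. It is written as a hypothetical standalone function, `fill_path_from_request_uri`, and is not the attached patch. When PATH_INFO is absent, it uses the part of REQUEST_URI before `?` as the path. It fills QUERY_STRING from the rest only if the server did not already set it. The sketch makes two assumptions: the application is mounted at the server root (SCRIPT_NAME is empty), and REQUEST_URI holds the raw, percent-encoded request target.

```python
from urllib.parse import unquote


def fill_path_from_request_uri(environ):
    """Hypothetical helper: fill PATH_INFO (and QUERY_STRING) from
    REQUEST_URI when the server did not provide PATH_INFO.

    Modifies and returns the environ dict. Assumes SCRIPT_NAME is empty.
    """
    if 'PATH_INFO' in environ or 'REQUEST_URI' not in environ:
        return environ
    # REQUEST_URI is the raw request target, e.g. "/script/?foo=bar".
    path, _sep, query = environ['REQUEST_URI'].partition('?')
    # PATH_INFO is percent-decoded; PEP 3333 represents the decoded bytes
    # as a latin-1 native string, hence encoding='latin-1'.
    environ['PATH_INFO'] = unquote(path, encoding='latin-1')
    # Only fill QUERY_STRING if the server did not set it itself.
    if 'QUERY_STRING' not in environ:
        environ['QUERY_STRING'] = query
    return environ
```

For example, `{'REQUEST_URI': '/script/?foo=bar'}` gets PATH_INFO `'/script/'` and QUERY_STRING `'foo=bar'`. An environ that already has PATH_INFO is left untouched.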

comment:17 by Malcolm Tredinnick, 16 years ago

The setup being espoused in comment 16 looks like a terrible way to set up a webserver. I'm not sure we really want to pollute the main code with anything extra just to handle that case. It's using a 404 error path to try and do normal (non-error) handling. The Django docs already explain how to use lighttpd with fastcgi without needing to corrupt an error handler that is intended for an entirely different purpose.

Fortunately, it won't be impossible to work this way, since you can always subclass the WSGI handler and write your own handler for this situation which is even further from a proper WSGI environment than Django normally expects. But I doubt that I'm going to include this in core right at the moment, since it's not an approach we should be encouraging and it's a lot of extra poking into environment variables to work around something (and we would have to maintain it forever).

comment:18 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: reopened → closed

(In [8015]) Changed/fixed the way Django handles SCRIPT_NAME and PATH_INFO (or
equivalents). Basically, URL resolving will only use the PATH_INFO and the
SCRIPT_NAME will be prepended by reverse() automatically. Allows for more
portable development and installation. Also exposes SCRIPT_NAME in the
HttpRequest instance.

There are a number of cases where things don't work completely transparently,
so mod_python and fastcgi users should read the relevant docs.

Fixed #285, #1516, #3414.
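The split this changeset describes can be illustrated with two hypothetical helpers. The names `resolve_path` and `reverse_url` and the `URLS` table are made up for this sketch and are not Django's API. Matching looks only at PATH_INFO, and building a URL prepends SCRIPT_NAME. This is also what keeps the `/my_project` prefix from comment 14 intact.

```python
# Hypothetical URL table: name -> path relative to the application root.
URLS = {'home': '/', 'admin-index': '/admin/'}


def resolve_path(path_info):
    """Match only PATH_INFO against the table; SCRIPT_NAME is ignored."""
    for name, path in URLS.items():
        if path == path_info:
            return name
    return None


def reverse_url(name, script_name):
    """Build the full URL path by prepending SCRIPT_NAME to the app path."""
    return script_name.rstrip('/') + URLS[name]
```

Suppose the project is deployed with SCRIPT_NAME `'/my_project'` and the request has PATH_INFO `'/admin/'`. Then `resolve_path('/admin/')` returns `'admin-index'`, and `reverse_url('admin-index', '/my_project')` returns `'/my_project/admin/'`. A redirect built this way stays under the project prefix.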

comment:19 by Richard Davies <richard.davies@…>, 16 years ago

Let me quickly explain the logic of the setup that I mentioned in comment 16 (note that this copies the _standard_ way to set up Rails with Lighttpd, it isn't something that I invented myself!).

Compared to the Django lighttpd config at http://www.djangoproject.com/documentation/fastcgi/#lighttpd-setup , the directories are "inside out".

That config has MEDIA_ROOT/MEDIA_URL as a subdirectory of the site on the webserver, and uses url rewriting to handle media such as favicon.ico, robots.txt, etc. which are expected in the top level or otherwise outside the media subdirectory.

The Rails-style config has MEDIA_ROOT/MEDIA_URL as the root-level directory of the site on the webserver, and then connects the error-handler-404 to Django. This means that if a file exists, it gets served, whilst if it does not, the URL falls through for Django to handle. So favicon.ico etc. can just be put directly in their right place.

comment:20 by Richard Davies <richard.davies@…>, 16 years ago

Resolution: fixed
Status: closed → reopened
Summary: middleware/common.py and SCGI bug - string index out of range → middleware/common.py and SCGI bug - string index out of range (caused by missing PATH_INFO)

Regardless of the rights or wrongs of the Lighttpd approach in my comment 16, the target of this ticket is to find a solution for cases where PATH_INFO is not set in the incoming environment (first identified in comment 5 in the case of Cherokee). [8015] fixes #285, but still assumes that PATH_INFO is correctly set in the incoming environment, so cannot close this ticket.

comment:21 by James Bennett, 16 years ago

Resolution: fixed
Status: reopened → closed

Having looked a bit at the fix for #285 and done a bit of research, I believe the correct conclusion here is:

"Doctor, it hurts when I configure my server this way!"

"Well, then, don't configure your server that way."

If you set things up in such a way that PATH_INFO is not available to Django or has been mangled prior to handing off to Django, I don't think Django can help you much, really; we can't magically reconstruct information that wasn't given to us in the first place.

comment:22 by Richard Davies <richard.davies@…>, 16 years ago

A note for anyone following this thread, or experiencing this problem - Flup 1.0.1 has been released, and now attempts to generate PATH_INFO if it is missing. This means that these problems are now handled before reaching Django.

comment:23 by pje, 16 years ago

Note: the WSGI spec allows PATH_INFO to be empty or missing; specifically:

"This may be an empty string, if the request URL targets the application root and does NOT have a trailing slash." (emph. added)

And WSGI servers are allowed to omit PATH_INFO (and various other variables) if they are an empty string.

IIUC, this means that [8015] doesn't correctly handle the case where someone goes to "foo.com/django" (no trailing '/'), because it wrongly assumes that a missing PATH_INFO is a '/'. Per the WSGI spec, a missing PATH_INFO is in fact an empty string. That means that relative URLs at the root of a Django site would not work correctly under servers that omit an empty PATH_INFO.

Whether the OP issue here is a configuration problem is irrelevant to this piece: it is perfectly legal for a WSGI server to omit PATH_INFO if it's an empty string, and its omission means that it's an EMPTY string, not a '/'.

Conversely, if a WSGI server is omitting PATH_INFO when PATH_INFO should be a "/" (i.e. the URL was "foo.com/django/" with a trailing "/"), then that server is seriously broken and should be fixed. (But I'm not seeing anything here that suggests this is actually the case.)

Either way, however, the code that's defaulting a missing PATH_INFO to "/" appears to be quite wrong: either creating a bug or masking one somewhere else.
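The distinction drawn here can be sketched by reconstructing the request path from SCRIPT_NAME and PATH_INFO in two ways. Both helper names are hypothetical. The first follows the WSGI spec, where a missing PATH_INFO is the empty string. The second defaults it to `/`, which is the behaviour this comment objects to.

```python
def request_path(environ):
    """Per the WSGI spec: a missing PATH_INFO means the empty string."""
    return environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', '')


def request_path_defaulting_to_slash(environ):
    """The questioned behaviour: a missing PATH_INFO is treated as '/'."""
    return environ.get('SCRIPT_NAME', '') + environ.get('PATH_INFO', '/')
```

For `{'SCRIPT_NAME': '/django'}`, the first returns `'/django'` and the second returns `'/django/'`. That difference matters for relative links. `urllib.parse.urljoin('http://foo.com/django', 'page/')` gives `'http://foo.com/page/'`, while joining against `'http://foo.com/django/'` gives `'http://foo.com/django/page/'`.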

comment:24 by Malcolm Tredinnick, 16 years ago

Posting to a closed ticket is a good way to make sure a comment gets overlooked. Fortunately, in this case I saw it go by, so I've opened #9435 to make sure any inconsistencies are tidied up.
