Ticket #3414 (closed: fixed)

Opened 2 years ago

Last modified 8 months ago

middleware/common.py and SCGI bug - string index out of range (caused by missing PATH_INFO)

Reported by: Piotr Maliński <riklaunim@gmail.com>
Assigned to: nobody
Milestone:
Component: Core framework
Version: SVN
Keywords:
Cc: real.human@mrmachine.net, richard.davies@elastichosts.com
Triage Stage: Accepted
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 0

Description

I wanted to use Cherokee with SCGI to test my site but I get this exception when trying to view it in the browser (/ main page):

Traceback (most recent call last):
  File "/usr/lib64/python2.4/site-packages/flup-0.5-py2.4.egg/flup/server/scgi_base.py", line 185, in run
  File "/usr/lib64/python2.4/site-packages/flup-0.5-py2.4.egg/flup/server/scgi_base.py", line 456, in handler
  File "/usr/lib64/python2.4/site-packages/django/core/handlers/wsgi.py", line 148, in __call__
    response = self.get_response(request.path, request)
  File "/usr/lib64/python2.4/site-packages/django/core/handlers/base.py", line 59, in get_response
    response = middleware_method(request)
  File "/usr/lib64/python2.4/site-packages/django/middleware/common.py", line 40, in process_request
    if settings.APPEND_SLASH and (old_url[1][-1] != '/') and ('.' not in old_url[1].split('/')[-1]):
IndexError: string index out of range

Django + SCGI + Cherokee worked for me some time ago without any problems. Now on 0.9.5.1 it throws this exception.
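
The failing line in the traceback indexes the last character of the request path, which raises IndexError when the path is an empty string. The following is a minimal sketch of that failure and of one way to guard against it. The function names are hypothetical and the guarded version is not the fix that was eventually committed; it only illustrates why an empty path triggers the exception.

```python
def needs_slash_unsafe(path):
    # Same shape as the failing check: path[-1] raises on an empty string.
    return path[-1] != '/' and '.' not in path.split('/')[-1]


def needs_slash_safe(path):
    # endswith() tolerates an empty string, so an empty path no longer crashes.
    return not path.endswith('/') and '.' not in path.split('/')[-1]


try:
    needs_slash_unsafe('')
except IndexError as exc:
    print('IndexError:', exc)

print(needs_slash_safe(''))              # True
print(needs_slash_safe('/admin'))        # True
print(needs_slash_safe('/admin/'))       # False
print(needs_slash_safe('/favicon.ico'))  # False
```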

Attachments

wsgi.patch (478 bytes) - added by Jordi Funollet <jordi.f@ati.es> on 05/29/07 17:14:56.
wsgi_path_from_many_params.diff (0.8 kB) - added by Jordi Funollet <jordi.f@ati.es> on 08/08/07 17:48:34.
Patch against [5823]
path_info_wsgi.diff (0.5 kB) - added by Gabriel Sean Farrell <gsf@breaksalot.com> on 03/14/08 18:01:20.
wsgi_path_from_many_params_2.diff.txt (1.0 kB) - added by Richard Davies <richard.davies@elastichosts.com> on 07/20/08 09:26:44.
Update to wsgi_path_from_many_params.diff which better handles QUERY_STRING

Change History

02/01/07 11:47:28 changed by ubernostrum

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

I've seen something similar under FastCGI; it only happens when APPEND_SLASH is true, and it seems to be something to do with PATH_INFO not being passed in properly. I've had this come up and so have some other folks I've talked to, so there's definitely an issue, I'm just not sure where it is.

03/07/07 23:45:10 changed by Simon G. <dev@simon.net.nz>

  • stage changed from Unreviewed to Accepted.

04/04/07 20:37:17 changed by ubernostrum

#3928 was a duplicate.

04/05/07 04:37:51 changed by anonymous

Happens on Litespeed 3.0 too. For me the APPEND_SLASH has no effect either way, True or False.

05/03/07 12:05:29 changed by brosner <brosner@gmail.com>

I think the issue here is that Django is not allowing an empty value for PATH_INFO. I have run into this problem and found out that Cherokee is passing in an empty value for PATH_INFO. According to the CGI/1.1 specification (RFC 3875, section 4.1.5, http://www.ietf.org/rfc/rfc3875), PATH_INFO can have an empty value. I am not sure how this will affect the usage of APPEND_SLASH.

05/04/07 02:37:19 changed by anonymous

Same with CGI, see: ticket:2407

05/28/07 06:47:30 changed by Kelvin Nicholson <kelvin@kelvinism.com>

  • version changed from 0.95 to SVN.

I can confirm that this happens on the latest build (r1857) of lighttpd. Luckily, however, setting APPEND_SLASH = False in settings.py took care of the error.

Flup: SVN (May 28, 2007) Django: SVN (May 5th-ish, 2007)

05/29/07 17:13:39 changed by Jordi Funollet <jordi.f@ati.es>

About the bug:

  • Doesn't occur on my development environment:
    • Ubuntu Feisty 7.04 (Linux 2.6.20-15)
    • Lighttpd-1.4.13 + Fastcgi
    • Flup revision 2126
    • Django revision 4463
  • Occurs on my production :-( environment: (Textdrive hosting)
    • FreeBSD 6.2-STABLE
    • Lighttpd-1.4.13 + Fastcgi
    • Flup revision 2126
    • Django revision 4463
  • Value of APPEND_SLASH doesn't change anything.
  • The patch for wsgi.py shown on ticket:2407 doesn't solve this.

About the patch: it fixes the issue for me, but I don't have experience with the CGI/1.1 specification, so it isn't intended to be authoritative.

Not tested with django-trunk.

05/29/07 17:14:56 changed by Jordi Funollet <jordi.f@ati.es>

  • attachment wsgi.patch added.

05/29/07 17:15:49 changed by Jordi Funollet <jordi.f@ati.es>

  • has_patch set to 1.

07/02/07 09:10:35 changed by mrmachine

  • cc set to real.human@mrmachine.net.

Same problem here using FastCGI with lighttpd 1.4.13, Django r5518, flup r2311. Setting APPEND_SLASH = False "fixes" it.

07/13/07 17:31:31 changed by Brian Rosner <brosner@gmail.com>

  • status changed from new to closed.
  • resolution set to fixed.

This problem should now be fixed with r5688; it was also reported as #4484. If I am incorrect and this has not fixed the problem, please let everyone know.

08/08/07 17:48:34 changed by Jordi Funollet <jordi.f@ati.es>

  • attachment wsgi_path_from_many_params.diff added.

Patch against [5823]

(follow-up: ↓ 14 ) 08/08/07 17:52:54 changed by Jordi Funollet <jordi.f@ati.es>

  • status changed from closed to reopened.
  • resolution deleted.

Still reproducible. The quick-and-dirty patch attachment:wsgi_path_from_many_params.diff works for me.

09/16/07 13:36:34 changed by ubernostrum

#3762 was a duplicate, and has more information.

03/14/08 18:01:20 changed by Gabriel Sean Farrell <gsf@breaksalot.com>

  • attachment path_info_wsgi.diff added.

(in reply to: ↑ 12 ) 03/14/08 18:05:26 changed by Gabriel Sean Farrell <gsf@breaksalot.com>

Replying to Jordi Funollet <jordi.f@ati.es>:

> Still reproducible. The quick-and-dirty patch attachment:wsgi_path_from_many_params.diff works for me.

That patch removes the path section of the base url from REDIRECT_URL and REQUEST_URI. So, for example, say my project is at http://example.com/my_project/. If I go to http://example.com/my_project/admin/, I get redirected to http://example.com/admin/. If I go to a page in my project that doesn't exist, let's say http://example.com/my_project/bob/, the "Request URL" on the "Page not found" screen is http://example.com/bob/.

As noted at #3762 by michael@lofiart.com, the hacks to the WSGI script from http://code.google.com/p/modwsgi/wiki/IntegrationWithDjango make it all work as it should. Seeing that, I made an even simpler patch: attachment:path_info_wsgi.diff.

06/19/08 18:11:56 changed by gsf@breaksalot.org

I believe this is related to #285, but all of the hubbub there suggests to me that my one-line patch probably isn't the solution to everyone's problem. Still working for me, though!

07/20/08 09:26:44 changed by Richard Davies <richard.davies@elastichosts.com>

  • attachment wsgi_path_from_many_params_2.diff.txt added.

Update to wsgi_path_from_many_params.diff which better handles QUERY_STRING

07/20/08 09:48:35 changed by Richard Davies <richard.davies@elastichosts.com>

  • cc changed from real.human@mrmachine.net to real.human@mrmachine.net, richard.davies@elastichosts.com.

I run Django with Lighttpd, using the error-handler-404 mechanism borrowed from the standard Rails config for this web server (http://github.com/rails/rails/tree/master/railties/configs/lighttpd.conf)

When run in this manner, Lighttpd does not set PATH_INFO (http://trac.lighttpd.net/trac/wiki/FrequentlyAskedQuestions#Whatkindofenvironmentdoesserver.error-handler-404setup), so I have been using Jordi's attachment:wsgi_path_from_many_params.diff, which worked well at first for me, to take self.path from REQUEST_URI in the absence of PATH_INFO.

More recently, when using query strings, I note that REQUEST_URI includes the query string (e.g. "/script/?foo=bar"), whereas self.path should not. I am therefore posting an update to Jordi's patch which correctly strips out the query string from REQUEST_URI before setting self.path. When used in this mode, Lighttpd also does not set QUERY_STRING itself, so I also take the opportunity to set QUERY_STRING based on REQUEST_URI if it is not already present.
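
The fallback described above can be sketched roughly as follows. This is not the attached patch: the function name path_and_query is hypothetical, and the sketch assumes REQUEST_URI holds the raw request target with the query string still attached.

```python
def path_and_query(environ):
    """Return (path, query_string) for a CGI/WSGI-style environ dict."""
    request_uri = environ.get('REQUEST_URI', '')
    # Split off the query string so it never leaks into the path.
    uri_path, _, uri_query = request_uri.partition('?')
    # Use the server-supplied values when present; otherwise fall back
    # to the pieces recovered from REQUEST_URI.
    path = environ.get('PATH_INFO', uri_path)
    query = environ.get('QUERY_STRING', uri_query)
    return path, query


print(path_and_query({'REQUEST_URI': '/script/?foo=bar'}))
# ('/script/', 'foo=bar')
print(path_and_query({'PATH_INFO': '/a/', 'QUERY_STRING': 'x=1',
                      'REQUEST_URI': '/b/?y=2'}))
# ('/a/', 'x=1')
```

Note that the sketch ignores any mount-point prefix in REQUEST_URI, which is the weakness Gabriel Sean Farrell reported for the earlier patch.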

07/20/08 15:30:39 changed by mtredinnick

The setup being espoused in comment 16 looks like a terrible way to set up a webserver. I'm not sure we really want to pollute the main code with anything extra just to handle that case. It's using a 404 error path to try and do normal (non-error) handling. The Django docs already explain how to use lighttpd with fastcgi without needing to corrupt an error handler that is intended for an entirely different purpose.

Fortunately, it won't be impossible to work this way, since you can always subclass the WSGI handler and write your own handler for this situation which is even further from a proper WSGI environment than Django normally expects. But I doubt that I'm going to include this in core right at the moment, since it's not an approach we should be encouraging and it's a lot of extra poking into environment variables to work around something (and we would have to maintain it forever).
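
One way to keep such a workaround out of core is to apply it outside Django. The comment above suggests subclassing the WSGI handler; the sketch below swaps in a plain WSGI wrapper instead, so that it does not depend on Django internals. The class FillPathInfo and the demo application are hypothetical, and the wrapper assumes REQUEST_URI is available and carries no mount-point prefix.

```python
class FillPathInfo:
    """Hypothetical WSGI wrapper: supplies PATH_INFO when the server omitted it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if 'PATH_INFO' not in environ:
            # Assumption: REQUEST_URI is the raw request target, query included.
            environ['PATH_INFO'] = environ.get('REQUEST_URI', '').split('?', 1)[0]
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    # Stand-in for the real application; echoes the path it was given.
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [environ['PATH_INFO'].encode('utf-8')]


body = FillPathInfo(demo_app)({'REQUEST_URI': '/bob/?x=1'},
                              lambda status, headers: None)
print(body)  # [b'/bob/']
```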

07/21/08 02:57:10 changed by mtredinnick

  • status changed from reopened to closed.
  • resolution set to fixed.

(In [8015]) Changed/fixed the way Django handles SCRIPT_NAME and PATH_INFO (or equivalents). Basically, URL resolving will only use the PATH_INFO and the SCRIPT_NAME will be prepended by reverse() automatically. Allows for more portable development and installation. Also exposes SCRIPT_NAME in the HttpRequest instance.

There are a number of cases where things don't work completely transparently, so mod_python and fastcgi users should read the relevant docs.

Fixed #285, #1516, #3414.
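
The division of labour described in this changeset can be illustrated with a small sketch. The helper names split_request and build_url are hypothetical and are not Django's API; they only show the idea that resolving looks at PATH_INFO alone while generated URLs get the SCRIPT_NAME prefix back.

```python
def split_request(environ):
    # SCRIPT_NAME is the mount point; PATH_INFO is what gets resolved.
    script_name = environ.get('SCRIPT_NAME', '')
    path_info = environ.get('PATH_INFO', '')
    return script_name, path_info


def build_url(script_name, app_path):
    # What a reverse()-style helper would do: prepend the mount point.
    return script_name.rstrip('/') + app_path


environ = {'SCRIPT_NAME': '/my_project', 'PATH_INFO': '/admin/'}
script_name, path_info = split_request(environ)
print(path_info)                          # /admin/
print(build_url(script_name, '/admin/'))  # /my_project/admin/
print(build_url('', '/admin/'))           # /admin/
```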

07/21/08 03:18:23 changed by Richard Davies <richard.davies@elastichosts.com>

Let me quickly explain the logic of the setup that I mentioned in comment 16 (note that this copies the _standard_ way to set up Rails with Lighttpd, it isn't something that I invented myself!).

Compared to the Django lighttpd config at http://www.djangoproject.com/documentation/fastcgi/#lighttpd-setup , the directories are "inside out".

That config has MEDIA_ROOT/MEDIA_URL as a subdirectory of the site on the webserver, and uses url rewriting to handle media such as favicon.ico, robots.txt, etc. which are expected in the top level or otherwise outside the media subdirectory.

The Rails-style config has MEDIA_ROOT/MEDIA_URL as the root-level directory of the site on the webserver, and then connects the error-handler-404 to Django. This means that if a file is there it gets served, whilst if it is not the URL falls through for Django to handle. So favicon.ico, etc. can just be put directly in their right place.

07/21/08 03:37:43 changed by Richard Davies <richard.davies@elastichosts.com>

  • status changed from closed to reopened.
  • resolution deleted.
  • summary changed from middleware/common.py and SCGI bug - string index out of range to middleware/common.py and SCGI bug - string index out of range (caused by missing PATH_INFO).

Regardless of the rights or wrongs of the Lighttpd approach in my comment 16, the target of this ticket is to find a solution for cases where PATH_INFO is not set in the incoming environment (first identified in comment 5 in the case of Cherokee). [8015] fixes #285, but still assumes that PATH_INFO is correctly set in the incoming environment, so cannot close this ticket.

07/21/08 03:46:55 changed by ubernostrum

  • status changed from reopened to closed.
  • resolution set to fixed.

Having looked a bit at the fix for #285 and done a bit of research, I believe the correct conclusion here is:

"Doctor, it hurts when I configure my server this way!"

"Well, then, don't configure your server that way."

If you set things up in such a way that PATH_INFO is not available to Django or has been mangled prior to handing off to Django, I don't think Django can help you much, really; we can't magically reconstruct information that wasn't given to us in the first place.

07/28/08 08:37:18 changed by Richard Davies <richard.davies@elastichosts.com>

A note for anyone following this thread, or experiencing this problem - Flup 1.0.1 has been released, and now attempts to generate PATH_INFO if it is missing. This means that these problems are now handled before reaching Django.

10/23/08 12:22:40 changed by pje

Note: the WSGI spec allows PATH_INFO to be empty or missing; specifically:

"This may be an empty string, if the request URL targets the application root and does NOT have a trailing slash." (emph. added)

And WSGI servers are allowed to omit PATH_INFO (and various other variables) if they are an empty string.

IIUC, this means that [8015] doesn't correctly handle the case where someone goes to "foo.com/django" (no trailing '/'), because it wrongly assumes that a missing PATH_INFO is a '/'. Per the WSGI spec, a missing PATH_INFO is in fact an empty string. That means that relative URLs at the root of a Django site would not work correctly under servers that omit an empty PATH_INFO.

Whether the OP issue here is a configuration problem is irrelevant to this piece: it is perfectly legal for a WSGI server to omit PATH_INFO if it's an empty string, and its omission means that it's an EMPTY string, not a '/'.

Conversely, if a WSGI server is omitting PATH_INFO when PATH_INFO should be a "/" (i.e. the URL was "foo.com/django/" with a trailing "/"), then that server is seriously broken and should be fixed. (But I'm not seeing anything here that suggests this is actually the case.)

Either way, however, the code that's defaulting a missing PATH_INFO to "/" appears to be quite wrong: either creating a bug or masking one somewhere else.
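
The practical difference between the two defaults shows up in relative URL resolution. The sketch below is illustrative only: the helper names are hypothetical, and the environ dict stands in for a server that omits an empty PATH_INFO for a request to "foo.com/django".

```python
from urllib.parse import urljoin


def path_info(environ):
    # Per the WSGI wording quoted above, an omitted PATH_INFO means ''.
    return environ.get('PATH_INFO', '')


def request_url(environ):
    return ('http://' + environ['HTTP_HOST']
            + environ.get('SCRIPT_NAME', '') + path_info(environ))


environ = {'HTTP_HOST': 'foo.com', 'SCRIPT_NAME': '/django'}
url = request_url(environ)
print(url)                          # http://foo.com/django
# A relative link resolves differently depending on the trailing slash:
print(urljoin(url, 'page/'))        # http://foo.com/page/
print(urljoin(url + '/', 'page/'))  # http://foo.com/django/page/
```

Defaulting the missing value to "/" makes the application behave as if the second URL had been requested, while the browser resolves relative links against the first.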

10/23/08 22:46:34 changed by mtredinnick

Posting to a closed ticket is a good way to make sure a comment gets overlooked. Fortunately, in this case I saw it go by, so I've opened #9435 to make sure any inconsistencies are tidied up.

