Opened 18 years ago

Closed 11 years ago

#2131 closed New feature (wontfix)

HttpResponseSendFile for serving static files handler-specific sendfile mechanism

Reported by: ymasuda[at]ethercube.com Owned by: nobody
Component: HTTP handling Version: dev
Severity: Normal Keywords:
Cc: Graham.Dumpleton@…, jdunck@…, django-ticket-2131@…, andreas@…, andrehcampos@…, gregoire@…, gabor@…, ville@…, hackeron@…, exelib@…, redalastor@…, dane.springmeyer@…, Jannis Leidel, dan@…, danny.adair@…, Florian Apolloner, Jari Pennanen, hv@…, bronger@…, David Larlet, mmitar@…, jivan@…, darkrho@…, onecreativenerd@…, milos.urbanek@…, andre.cruz@…, sfllaw@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

This patch modifies modpython handler and WSGI (basehttp) handler to add HttpResponseSendFile.
The HttpResponseSendFile allows static files to be served using handler-specific sendfile mechanism.
It uses response.sendfile() on mod_python, FileWrapper on wsgi.

Attachments (13)

http-response-send-file.diff (3.0 KB ) - added by ymasuda[at]ethercube.com 18 years ago.
diff implements HttpResponseSendFile
http-response-send-file.2.diff (2.9 KB ) - added by ymasuda[at]ethercube.com 18 years ago.
fixed typos above, sorry
django-ticket-2131-http-response-send-file.2-django1.0.2final.diff (3.4 KB ) - added by mizatservercave 16 years ago.
updated for Django 1.0.2
django-ticket-2131-http-response-send-file.2-django1.0.2final-v1.1.diff (3.4 KB ) - added by mizatservercave 16 years ago.
Supersedes previous patches for Django 1.0.2 final.
django-ticket-2131-http-response-send-file.2-django1.0.2final-v1.2_test.diff (1.9 KB ) - added by mizatservercave 16 years ago.
PARTIAL patch (FOR TESTING ONLY) to verify HttpResponseSendFile works without specific handler support
django-ticket-2131-http-response-send-file.2-django1.0.2final-v1.2.diff (3.6 KB ) - added by mizatservercave 16 years ago.
Updated patch. Adds support for 'response.content'
httpresponsesendfile-no-default-content.diff (3.7 KB ) - added by mrts 16 years ago.
Disallow default HttpResponse behaviour (content etc)
httpresponsesendfile-no-default-content_bypass-middleware.diff (7.0 KB ) - added by mrts 16 years ago.
Bypass middleware
httpresponsesendfile-no-default-content_bypass-middleware.1.diff (7.0 KB ) - added by mrts 16 years ago.
Fix method naming typo
httpresponsesendfile-no-default-content_bypass-middleware.2.diff (7.0 KB ) - added by mrts 16 years ago.
Doh, previous diff didn't contain the fix, second try.
httpresponsesendfile-no-default-content_bypass-middleware.3.diff (7.0 KB ) - added by mrts 16 years ago.
Fix yet another copy-paste omission.
httpresponsesendfile-no-default-content_bypass-middleware_with-header_with-docs-and-tests.diff (12.3 KB ) - added by mrts 16 years ago.
Added the requested header, as well as docs and tests.
ticket2131.diff (13.6 KB ) - added by milosu 13 years ago.
updated patch compatible with Django-1.4b1 including test case

Download all attachments as: .zip

Change History (113)

by ymasuda[at]ethercube.com, 18 years ago

diff implements HttpResponseSendFile

comment:1 by anonymous, 18 years ago

Summary: HttpResponseSendFile for serving static files handler-specific sendfile mechanism[patch] HttpResponseSendFile for serving static files handler-specific sendfile mechanism

by ymasuda[at]ethercube.com, 18 years ago

fixed typos above, sorry

comment:2 by Adrian Holovaty, 18 years ago

Resolution: wontfix
Status: newclosed

Django isn't meant to serve static files, so I'm marking this as a wontfix.

comment:3 by alex@…, 18 years ago

This patch could be very useful in a situations with a password protected files to a specific users. Typically I am talking about e-commerce site which sells PDF books. I am currently returning data via standard HttpResponse. It works well for small files,
but it would probably be a problem for bigger files.

comment:4 by Stefan (stelund on gmail), 17 years ago

I've implemented a similar workaround to restrict photo images accesses. I think its ridiculous not to implement this feature to aid performance for views.static.serve() as well. HttpResponse is implemented as an iterator so it fits perfectly to read the file chunk by chunk. The performance increase i've seen is great and i'm happy not to need another tcp-server running on my box.

in reply to:  2 ; comment:5 by johann@…, 17 years ago

Replying to adrian:

Django isn't meant to serve static files, so I'm marking this as a wontfix.

But this feature would be useful to reduce memory usage with large dynamically generated content.

In my application, I generate large (30MB) ZIP files in Django and send them through response.write(). The ZIP files are just collections of PNG files, so it doesn't make sense to store them on disk permanently.

Unfortunately, using response.write() means that the whole ZIP file must be loaded into memory. It would be so cool if I could create the file on disk and then use sendfile. The same use-case applies to generated PDF files.

As a workaround, I could write the file to disk and then redirect the browser, but that's tedious, requires another TCP round-trip, and makes automated downloads more difficult.

in reply to:  5 comment:6 by johann@…, 17 years ago

Here's a solution that doesn't require patching Django. It's not the real sendfile(), but it loads and sends the file in 8KB chunks, rather than loading the whole file into memory.

http://www.djangosnippets.org/snippets/365/

It uses the FileWrapper object from source:django/trunk/django/core/servers/basehttp.py.

comment:7 by TP, 16 years ago

Even using a file-wrapper iterator does not speed up things as much as directly using sendfile. Using mod_wsgi's exposed sendfile uses 30x less CPU vs Django-with-iterator on my linux box testing downloading hundreds of images concurrently.

comment:8 by Graham Dumpleton, 16 years ago

Trac also isn't meant to serve static media files either and preferred setup is to use Apache to do it directly. In Trac however you have the concept of attachments associated with tickets. For such a thing as attachments, where it is still stored in file system, but where Apache cant get to it, you have no choice but to serve it through the application. Any similar sort of scenario in Django is currently not very efficient in terms of performance, or in terms of memory usage if people don't use techniques to limit memory impact. Thus, Django should really provide a default way of handling this efficiently and saying 'wontfix' means that you haven't looked very widely at how Django can be used. :-)

comment:9 by Jeremy Dunck, 16 years ago

Cc: jdunck@… added

comment:10 by Malcolm Tredinnick, 16 years ago

milestone: post-1.0
Resolution: wontfix
Status: closedreopened
Triage Stage: UnreviewedDesign decision needed

Claiming we "haven't looked very widely" is ridiculous, given what Django can do. More possibly "some cases may have been overlooked" (there are alternatives to the Trac setup). So let's try to avoid hyperbole. :-)

Reopening for now and we can look at it after 1.0.

comment:11 by mrts, 16 years ago

+1 on including this. My use case: instrument Java classfiles, generate a JAR and return it to the user on the fly.

comment:12 by graham.carlyle@…, 16 years ago

Another ticket that covers this issue is #7984, which probably means that its a duplicate. Sorry I didn't spot this ticket when I filed it. The patch on #7984 is very similar to this one :) but has some extra logic related to content-length for mod_wsgi because of some amibguities in the WSGI spec. (see discussion on mod_wsgi group)

comment:13 by graham.carlyle@…, 16 years ago

gah i'm an idiot I meant #7894

comment:14 by Miz, 16 years ago

Cc: django-ticket-2131@… added

comment:15 by anonymous, 16 years ago

Cc: andreas@… added

comment:17 by ahlongxp, 16 years ago

Version: SVN

I'd really like to see this merged in django.

comment:18 by anonymous, 16 years ago

Cc: andrehcampos@… added

by mizatservercave, 16 years ago

updated for Django 1.0.2

comment:19 by mizatservercave, 16 years ago

Needs tests: set

I have added an updated patch that SEEMS to work on Django 1.0.2. I had an earlier revision working on Django 1.0.0, but have only updated my local copy to 1.0.2 tonight.

Extensive testing should be done. Right now I have only tested WSGI (via Django's manage.py runserver).

The exclusion/removal of the 'Content-Length' header has been added to force WSGI to recalculate the content length from the file length, where prior to that point in the code something previously tends to add 'Content-Length' = 0, which is probably not wise to mess with "at the source".

The additional content_type == None check was added for Windows compatibility -- Google Chrome would not download a file that ultimately resulted in "Content-Type: None", so I chose a generic 'reasonable' default. See also http://bugs.python.org/issue4969 -- but Python 2.5 without that fix is too commonly deployed, at least in my environment. It's easier to do a 2-line fix where it's needed, as-needed.

Other than that, the patch is essentially the same as the original poster's.

by mizatservercave, 16 years ago

Supersedes previous patches for Django 1.0.2 final.

comment:20 by mizatservercave, 16 years ago

New patch uploaded. Fixes mod_python support.

This patch is now used on a production website and shows phenomenal performance.

The production use-case for this patch is a "members-only" upload/download area which requires extensive permissions checking and custom auditing. This patch makes it possible to do all the logic needed while remaining completely within the Django framework, and without sacrificing performance while either uploading or downloading.

comment:21 by Jacob, 16 years ago

This looks like a nice patch to me, and hearing that it works in production is good to know. Still needs a few things, though:

  • The current setup requires cooperation between the HttpResponseSendFile object and the handler. This means it'll break if HttpResponseSendFile is used with a custom handler that doesn't know about it; that's a potential backwards-incompatibility, and we need to avoid it. If possible, HttpResponseSendFile needs to fake being a regular HttpResponse, probably by providing a content property which just reads the file.
  • Documentation.
  • Tests.

comment:22 by (none), 16 years ago

milestone: post-1.0

Milestone post-1.0 deleted

comment:23 by graham.carlyle@…, 16 years ago

Looked in to this a while back (I created the duplicate ticket #7894 with similar patch), but found that in production using the WSGI file_wrapper approach can be problematic. I found that serving PDFs to an adobe reader client failed for documents above around 25MB in size because reader required the web server to satisfy byte range requests which didn't work with WSGI. I ended up using the X-SendFile header and mod_xsendfile (on apache) which didn't have these problems. I'm not certain whether serving bytes range is inherently impossible with WSGI but requests to apache with mod_wsgi didn't satisfy them for the WSGI content.

Also you might like to make the same change that the patch in #7894 made to the 'django.views.static.serve' function so the django development server can use this speedup.

comment:24 by mizatservercave, 16 years ago

Implementing additional HTTP features (HTTP 206 PARTIAL CONTENT) directly in the core is something I suspect the core maintainers want to avoid, on grounds that Django is an application framework and *not* a webserver.

I think you bring up a good point, though. This patch serves a niche interest instead of a general purpose one. That is, instead of this patch being able to fully support all HTTP access methods (such as 206 PARTIAL CONTENT), it is useful in only very certain circumstances:

  1. When using a handler with an 'efficient mechanism' for 'sending files' AND which knows about HttpResponseSendFile as implemented by this patch, AND
  1. HttpResponseSendFile, as-implemented, is only suitable for sending whole files.

All the fallback methods revert to 'pure python iterators' in some manner or form. I spent some time bouncing between mod_python, the WSGI spec, and mod_python, trying to decide whether or not trying to wedge a generator into the works (for efficiency) was worthwhile overall. I don't think it is.

It appears that HttpResponseSendFile is only efficient when both the above preconditions are met. Anything which would cause any fallback method (iterators) to be used will DECREASE application performance, but probably not more than doing a raw "HttpResponse(file.read())". Any more 'advanced functionality' or efficiency requirements and you would be well advised to defer the functionality to the webserver, such as those servers which support X-Sendfile natively or by extension.

The goal of promoting this patch is to provide basic functionality just beyond the realm of python iterators for file-reading/file-sending (when preconditions for efficiency are met), while remaining entirely contained within Django -- without *additional* external dependencies, beyond normal deployment against mod_wsgi or mod_python. I do think the implementation useful, however. But it is certainly not "feature rich".

In regards to WSGI's support of all this, it appears that WSGI defers partial content delivery responsibility to the WSGI application [eg, Django or applications written with Django]. In other words, the WSGI spec does not appear support partial content directly. Reading a file and/or chopping it up to feed to the WSGI interface would require pure-python with (presumably) iterators/generators (or even tempfiles with the 'cut-out' content), which would entirely defeat the purpose of this patch.

Mod_python appears at first glance to expose sendfile 'directly' which would seem to support partial content easily, but there seems to be no straightforward way to be efficient across anything BUT Mod_python in this case, and is probably the subject of an additional patch to implement, say, HttpResponseResumableFile().

In short: This patch (unfortunately) serves only a small audience. The larger audience can probably be served with X-Sendfile.

[Kudo's for the reference. I didn't know X-Sendfile even existed 'til you posted, Graham. :)]

Advantages:

  • Provides amazingly good results, when used to send whole files across a mutually-supported interface

Disadvantages:

  • Introduces response-specific code into the handlers, which would ultimately require maintenance
  • All fallback methods are "not efficient" simply by virtue that "everything is done in pure python without a speedy extension, like mod_python or mod_wsgi provides". Large files consume a lot of time and a lot of RAM -- although attempting to serve those types of files through Django before this patch suffers from the same issues of slowness and lots of RAM consumed.
  • response.content is one of these inefficient methods

Additional Considerations:

  • As this is a 'send whole file' mechanism, should a 'Content-Disposition: attachment; filename=' header be added? Should the filename come from the HttpResponseSendFile constructor so the application can name it what it likes? [Personally, I think so, on both counts.]

by mizatservercave, 16 years ago

PARTIAL patch (FOR TESTING ONLY) to verify HttpResponseSendFile works without specific handler support

by mizatservercave, 16 years ago

Updated patch. Adds support for 'response.content'

comment:25 by mizatservercave, 16 years ago

Regarding the 'v1.2' version of the patch I just uploaded, note that for large files when accessed through response.content loads the whole thing up into the webserver's RAM before serving. A good test example is OpenOffice.org 3's 143MB installer. Bad for small (eg, 256MB RAM) webservers.

RAM also seems to 'hang around' after each request, so I suspect that .close() isn't being called properly on self._container, or _get_content needs to add a signal-hook to signals.request_finished to expire the content from RAM by deleting the variable or setting it to or somesuch.

That being said, accessing request.content is a compatibility property, and so shouldn't be used if at all possible.

The below test-case works for a raw, manual test, however, in addition to working with the "_test" patch above:

from django.http import HttpResponseSendFile
r = HttpResponseSendFile('./bigfile')
t = open('/tmp/test', 'wb')
t.write(r.content)
t.close()

# md5 and filesize of ./bigfile and /tmp/test match.

comment:26 by mizatservercave, 16 years ago

Reformatting the example since formatting got munged in the previous update.

from django.http import HttpResponseSendFile
r = HttpResponseSendFile('./bigfile')
t = open('/tmp/test', 'wb')
t.write(r.content)
t.close()

# md5 and filesize of ./bigfile and /tmp/test match.

comment:27 by Jacob, 16 years ago

Looks good. I'd like to see support for X-SendFile baked in. This should be too hard to do: instead of storing the file name in self.sendfile_filename, store it in an X-SendFile header. Then we just need a small piece of middleware -- or, perhaps, something added to the common middleware? -- that responds to a setting (SERVER_SUPPORTS_X_SENDFILE or somesuch) and returns the response with the header and an empty body.

comment:28 by Jacob, 16 years ago

milestone: 1.1 beta
Triage Stage: Design decision neededAccepted

comment:29 by Graham Dumpleton, 16 years ago

X-Sendfile, although understood by some servers/proxies isn't the only header used for this purpose. The perbal package has something called X-Reproxy-File which I believe acts the same way. The perbal package also has X-Reproxy-URL, which instead of serving up a file, reinjects the URL as a request into the request handling system. This means the request gets pushed back to a backend system to serve. Thus it might get bumped off to a static media server which then serves up the matching file. The request may also be handled by a dynamic web application. In nginx, there is a similar thing to X-Reproxy-URL called X-Accel-Redirect. Finally both these are in some respects similar to the CGI internal redirection using Location header and 200 response which Apache CGI supports and which other servers provding CGI should also support. This CGI mechanism of using Location header will also supported for mod_wsgi daemon mode in mod_wsgi 3.0.

Have been looking at X-Sendfile support for mod_wsgi daemon mode in mod_wsgi 3.0, but the problem with it is that it side steps standard Apache permissions checks on files. Yes, an application could just read the file itself and return it anyway, but the difference is that in daemon mode the user can be different to that Apache runs as, thus X-Sendfile would be handled as the Apache user and not the daemon process user. Thus access may not be available to the file, or access may be available to files that the daemon process shouldn't otherwise have access to. This is why mod_wsgi 3.0 at this point will only support internal URL redirection via Location header as per CGI specification. Namely, the internal redirection has all the Apache permission checks applied.

The only issue with things like X-Accel-Redirect and CGI Location redirection is that the URL has to be mapped to a file and the presence of a mapping would normally mean an outside party could access the resource directly. In nginx they get around this through flagging the URL as private and so only a candidate for use by internal redirects. In the case of Apache, you would need to use mod_rewrite to enforce a condition that URL is forbidden if not a sub request made within the server itself.

Anyway, it may be felt that this is all irrelevant, but want to at least point out that X-Sendfile isn't the only header name used and so you may want to make it configurable. The URL redirection mechanisms available might still also be employed if header can be overridden, but may be a bit tricky here if code can force the file to be loaded inside Django application before response gets sent, with it turning into a byte stream instead. The problem here is that for X-Reprox-URL, it may actually refer to something on a different host if they allow a 'http://host' component in the URL, thus can't sensibly open it as a file. For CGI Location header, no 'http://host', but it is still a logical URL and can't be opened as a file. But then, if X-Sendfile or X-Reproxy-File are used and the front end is on a different system, the Django application may not even be able to open the path anyway as file system view could be different.

So, in respect of the latter, with the X-Sendfile approach, is it always the case that it always makes it out as a X-Sendfile response, or can the Django application, because of some sort of output filter or similar, force the path to be resolved in the application with data read at that point.

Oh and finally, for X-Sendfile, does the response sent always have a 200 HTTP status code? I know that is what CGI Location wants, but not sure about all the others. If the response codes aren't consistent, then even if can override header name, may not work for all.

comment:30 by Graham Dumpleton, 16 years ago

BTW, mod_wsgi 2.0 through 2.3 has an issue that can occur under certain conditions, with a file returned using wsgi.file_wrapper being truncated when size is > 255 and < ~8000 bytes. This is due to an obscure Apache optimisation. The issue will be fixed in mod_wsgi 2.4. Details, with a description of fix, can be found at:

http://code.google.com/p/modwsgi/issues/detail?id=132

One also should not use mod_wsgi 2.0 through 2.1, which can also truncate a file returned using wsgi.file_wrapper if it contains an embedded null character.


comment:31 by gregoire, 16 years ago

Cc: gregoire@… added

The patch does not work with GZIP middleware.

Using mod_wsgi, I get this error: "IOError: client connection closed"

comment:32 by mrts, 16 years ago

Needs documentation: set
Patch needs improvement: set

Questions:

  1. Why to you first do an expensive call to getsize in self['Content-Length'] = os.path.getsize(path_to_file) and later discard it in if hasattr(response, 'sendfile_filename') and response.has_header('Content-Length'): del response['Content-Length'] in WSGI handler? I didn't find anything in PEP 333 that would suggest this is needed. Please correct me or remove that bit.
  2. Although this is mostly a conceptual/taste matter, wouldn't it be more clear to use isinstance(response, HttpResponseSendFile) instead of hasattr(response, 'sendfile_filename') to discern HttpResponseSendFile objects?

Nitpicking HttpResponseSendFile.__init__:

  1. Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators. -- PEP 8, thus content_type == None should be content_type is None instead.
  2. Opening the file needs to be consistent. You open it in the constructor and later reopen again in handlers, so the file is opened twice.
  3. The logic of parent constructor should be reused, I suggest the following instead:
    class HttpResponseSendFile(HttpResponse):
        def __init__(self, path_to_file, content_type=None, blocksize=8192):
            if not content_type: # or even if content_type is None
                from mimetypes import guess_type
                content_type = guess_type(path_to_file)[0]
                if content_type is None: # probably windows w/o proper mimetype lookup
                    content_type = "application/octet-stream"
            container = open(path_to_file) # be consistent, either open() here and don't reopen in handlers or don't open() here
            super(HttpResponseSendFile, self).__init__(content=container, content_type=content_type)
            self.sendfile_filename = path_to_file
            self.block_size = blocksize
            # compatibility; fake being a regular HttpResponse if needed
            self._headers['content-length'] = ('Content-Length', os.path.getsize(path_to_file))
    

Looks like more thought has to be given to the general context (basehttp.py etc). Docs and tests are missing as well.

Otherwise, nice work!

comment:33 by mrts, 16 years ago

As far as I can see, HttpResponseSendFile should not mimic HttpResponse in the sense that it should always be specially handled. Ordinary HttpResponse can take a file object in the content parameter already and serve the file from within Python, so that behaviour should be explicitly disallowed in HttpResponseSendFile.

Attaching a patch that does exactly this, i.e. doesn't open() the file and passes None as content to parent constructor to make it explicit that the content is neither a string nor can be handled with the default behaviour. I'm not sure about the Content-Length header though, I've currently added it to be compatible with ServerHandler.cleanup_headers(), but it may need more thought.

Btw, how mod_python handles req.sendfile can be seen here: https://svn.apache.org/repos/asf/quetzalcoatl/mod_python/trunk/src/requestobject.c .

by mrts, 16 years ago

Disallow default HttpResponse behaviour (content etc)

in reply to:  25 comment:34 by mrts, 16 years ago

RAM also seems to 'hang around' after each request, so I suspect that .close() isn't being called properly on self._container, or _get_content needs to add a signal-hook to signals.request_finished to expire the content from RAM by deleting the variable or setting it to or somesuch.

There's a slight possibility that the cause may be return iter(lambda: filelike.read(block_size), '') that was used in the previous patch if. There's no close() in the resulting iterator. I used the FileWrapper from basehttp.py instead, please try if this changes things.

comment:35 by mrts, 16 years ago

As for middleware, they can explicitly bypass HttpResponseSendFile responses as follows:

    def process_response(self, request, response):
        if isinstance(response, HttpResponseSendFile):
            return response

or the handlers can bypass HttpResponseSendFile implicitly in __call__:

            if not isinstance(response, HttpResponseSendFile):
                for middleware_method in self._response_middleware:
                    response = middleware_method(request, response)

by mrts, 16 years ago

Fix method naming typo

by mrts, 16 years ago

Doh, previous diff didn't contain the fix, second try.

by mrts, 16 years ago

Fix yet another copy-paste omission.

comment:37 by Graham Dumpleton, 16 years ago

Cc: Graham.Dumpleton@… added

Can someone indicate whether the intention is, or have these changes supported X-Sendfile as mentioned? I looked through some of the multitude of diffs, but couldn't see it mentioned.

comment:38 by Jacob, 16 years ago

Graham: Yes, I'd expect to see x-sendfile support before I check this in.

mrts: I explicitly asked asked that HttpResponseSendFile mimic standard HttpResponse semantics. This is a backwards-compatibility issue: writing custom handlers is an advanced but support feature.

comment:39 by Graham Dumpleton, 16 years ago

Jacob: Okay. Only asked as wanted to make sure that the name of header used was going to be somehow configurable to allow for those header names used by other things like Apache/nginx and perlbal and not be X-Sendfile specific. Thanks.

comment:40 by Jacob, 16 years ago

That should be pretty easy to pull off, I think.

in reply to:  38 comment:41 by anonymous, 16 years ago

Replying to jacob:

mrts: I explicitly asked asked that HttpResponseSendFile mimic standard HttpResponse semantics. This is a backwards-compatibility issue: writing custom handlers is an advanced but support feature.

OK, I'll try to look into it in the evening and provide something for X-Sendfile as well.

comment:42 by mrts, 16 years ago

(It was me posting the last comment)

Note that I don't like the HttpResponse compatible solution.

Concerns:

  • open() in __init__() will remain messy, if only because of mod_python that doesn't accept a file object in sendfile(); bluntly consuming two file descriptors for one response even though one isn't needed does not look sane to me,
  • we can open() lazily, but then we have to introduce state tracking (file_is_open), resulting in more complex and less robust code.

Is it wise to go to such lengths to cater for minor backwards-compatibility concerns? Jacob, which approach do you prefer?

comment:43 by Jacob, 16 years ago

Neither of those is necessary. You just need to make content into a property which lazily opens and reads the file.

in reply to:  43 comment:44 by mrts, 16 years ago

Replying to jacob:

Neither of those is necessary. You just need to make content into a property which lazily opens and reads the file.

That's what I meant by "lazily". However, state tracking is still required as the file has to be explicitly closed in HttpResponseSendFile.close(). This can not be done in content due to __iter__. And the file may not be open() at all if content is never touched. Thus, state tracking seems unavoidable.

comment:45 by Gábor Farkas, 16 years ago

Cc: gabor@… added

by mrts, 16 years ago

Added the requested header, as well as docs and tests.

comment:46 by mrts, 16 years ago

Added the configurable X-Sendfile header and preliminary docs and tests -- mainly to ease work for others who like to drive this further.

However, as of now the patch is still incomplete. I personally don't like providing any content for the reasons elaborated in http://groups.google.com/group/django-developers/browse_thread/thread/82367cf18eedfe89. I'm not the only one, see e.g. http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/.

Sadly, it looks like this will probably not make it into 1.1 as I likely will not have time for this in the weekend.

comment:47 by mrts, 16 years ago

Another issue: the disposition header should perhaps be configurable with a constructor argument (inline is needed to display files directly in browser).

comment:48 by Malcolm Tredinnick, 16 years ago

milestone: 1.1 beta1.2

Still too many edge-cases not being accounted for here (basically handling all the stuff Graham mentioned for a start). It's not ready for 1.1.

mrts: The final solution will have to fall-back gracefully if the user's webserver doesn't have any direct sendfile support, since otherwise you can't use this feature in a redistributable application (the application writer doesn't know how it will be installed). The client browser shouldn't ever receive an empty file. That's what Jacob meant by backwards-compatible, he just used an ambiguous phrase.

in reply to:  48 comment:49 by mrts, 16 years ago

Replying to mtredinnick:

mrts: The final solution will have to fall-back gracefully if the user's webserver doesn't have any direct sendfile support, since otherwise you can't use this feature in a redistributable application (the application writer doesn't know how it will be installed). The client browser shouldn't ever receive an empty file. That's what Jacob meant by backwards-compatible, he just used an ambiguous phrase.

The point is valid. However, looking at it from deployer's viewpoint I would always want to know if the efficient method is used or not (finding it out via the OS utilities is really cumbersome). As of now I'll add another default setting, SENDFILE_FALLBACK_ALLOWED = True and define content with:

def _get_content(self):
    if not settings.SENDFILE_FALLBACK_ALLOWED:
        raise RuntimeError("Fall-back to serving files via Python not allowed by configuration in HttpResponseSendFile.")
    else:
        # open the file and return its content

This will be just a proof of concept and I abhor adding yet another setting as much as you probably do, but I can't find a better way. Feel free to suggest one.

As this is not urgent, being targeted to 1.2, I can't yet tell when I update the patch.

comment:50 by Ville Säävuori, 16 years ago

Cc: ville@… added

comment:51 by hackeron, 16 years ago

Cc: hackeron@… added

comment:52 by ccahoon, 15 years ago

Cc: chris.cahoon@… added
Owner: changed from Adrian Holovaty to ccahoon

comment:53 by ccahoon, 15 years ago

(In [11131]) [soc2009/http-wsgi-improvements] Initial HttpResponseSendFile support, changes pulled from 03/21/09 patch on refs #2131.

This does not pass the included regression tests. However, since this feature
will be entirely based on these changes, which have already gone through a
great number of iterations, I thought it would be sensible to start here.

All of the work here is ymasuda, mizatservercave, and mrts (apologies if I
missed anyone). I hope to take their work
down the final stretch.

comment:54 by ccahoon, 15 years ago

I have made changes for HttpResponseSendFile support, in r11266 and earlier commits. I am still getting used to putting the refs in my commit messages, but when the changes get merged with trunk it will be there. If you are interested, let me know what you think.

in reply to:  54 comment:55 by ccahoon, 15 years ago

Replying to ccahoon:

I have made changes for HttpResponseSendFile support, in r11266 and earlier commits. I am still getting used to putting the refs in my commit messages, but when the changes get merged with trunk it will be there. If you are interested, let me know what you think.

Correction: r11268. Also, r11212 and r11210. Sorry about that.

comment:56 by ccahoon, 15 years ago

(In [11319]) [soc2009/http-wsgi-improvements] Changes for HttpResponseSendFile support in FastCGI. Refs #2131.

comment:57 by ccahoon, 15 years ago

(In [11320]) [soc2009/http-wsgi-improvements] Fix early settings use in HttpResponseSendFile. Refs #2131.

comment:58 by ccahoon, 15 years ago

(In [11325]) [soc2009/http-wsgi-improvements] Add a how-to for using HttpResponseSendFile with various server arrangements. Refs #2131.

comment:59 by ccahoon, 15 years ago

(In [11326]) [soc2009/http-wsgi-improvements] Remove setting the Content-Length header for HttpResponseSendFile from the handler, for compatibility, and add a content attribute. Refs #2131.

Also adds a _charset class attribute to HttpResponse so the children all have it.

comment:60 by ccahoon, 15 years ago

(In [11327]) [soc2009/http-wsgi-improvements] Change the setting name to SENDFILE_HEADER and update docs. Refs #2131.

comment:61 by ccahoon, 15 years ago

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedFixed on a branch

comment:62 by ccahoon, 15 years ago

Fixed on branches/soc2009/http-wsgi-improvements.

comment:63 by andreaskundig, 15 years ago

Summary: [patch] HttpResponseSendFile for serving static files handler-specific sendfile mechanismit breaks i18n on django 1.1

I'm not sure this is the right place to signal a bug, but HttpResponseSendFile breaks internationalization.

I have a page where the user can change his language as described in the documentation:
http://docs.djangoproject.com/en/dev/topics/i18n/#the-set-language-redirect-view

On my page there is a link to a view that sends a pdf with HttpResponseSendFile. If I click on this link I receive the pdf, but afterwards I no longer can change the language. If I was on an english page and then submit a language change to french, the /i18n/setlang/ view recognizes the correct language code 'fr' and puts it into the session, but once the request arrives at my view request.LANGUAGE_CODE is 'en'.

It works again after I restart Django.

(I am using Django 1.1 with python 2.5.4 on windows)

in reply to:  63 comment:64 by andreaskundig, 15 years ago

Bypassing the middleware in base.py that breaks i18n. When I comment that out it works again.

comment:65 by Karen Tracey, 15 years ago

Summary: it breaks i18n on django 1.1HttpResponseSendFile for serving static files handler-specific sendfile mechanism

Reverting confusing change to ticket summary.

comment:66 by ccahoon, 15 years ago

Which patch is it that breaks i18n? Are you using the one from the SOC branch?

in reply to:  66 comment:67 by andreaskundig, 15 years ago

Replying to ccahoon:

Which patch is it that breaks i18n? Are you using the one from the SOC branch?

I'm using the latest patch (added by mrts on 03/21/09):
http://code.djangoproject.com/attachment/ticket/2131/httpresponsesendfile-no-default-content_bypass-middleware_with-header_with-docs-and-tests.diff

I don't know what the SOC branch is.

comment:68 by AntonBessonov, 15 years ago

Cc: exelib@… added

comment:69 by redalastor, 15 years ago

Cc: redalastor@… added

comment:70 by springmeyer, 15 years ago

Cc: dane.springmeyer@… added

comment:71 by Jannis Leidel, 15 years ago

Cc: Jannis Leidel added

comment:72 by Dan Ros, 15 years ago

Cc: Dan Ros added

comment:73 by Dan Ros, 15 years ago

Cc: dan@… added; Dan Ros removed

comment:74 by danny_adair, 15 years ago

Cc: danny.adair@… added

comment:75 by anonymous, 15 years ago

Cc: Florian Apolloner added

comment:76 by Jari Pennanen, 15 years ago

Cc: Jari Pennanen added

comment:77 by mrts, 15 years ago

Note that you don't need this fix to use efficient file serving from files accessible from the filesystem.

Just use an empty response and set the X-Sendfile header manually if using Apache. If not, check your server documentation for the right header name.

E.g.:

...
response = HttpResponse()
response['X-Sendfile'] = '/full/path/to/file'
response['Content-Type'] = 'some/content_type'
response['Content-Length'] = os.stat('/full/path/to/file').st_size
response['Content-Disposition'] = 'attachment; filename="foo"'
return response

comment:78 by Thomas Güttler, 15 years ago

Cc: hv@… added

comment:79 by James Bennett, 15 years ago

milestone: 1.2

1.2 is feature-frozen; bumping this feature request off the milestone.

comment:80 by Torsten Bronger, 15 years ago

Cc: bronger@… added

comment:81 by David Larlet, 15 years ago

Cc: David Larlet added

comment:82 by Mitar, 14 years ago

Cc: mmitar@… added

comment:83 by JivanAmara, 14 years ago

Cc: jivan@… added

comment:84 by anonymous, 14 years ago

Cc: darkrho@… added

comment:85 by Chris Beaven, 14 years ago

Patch needs improvement: set
Triage Stage: Fixed on a branchAccepted

I'm going to put the stage back to Accepted, since the branch this is fixed on is rather stale. It'd be good to see a stand-alone patch here updated with any changes made in the branch if someone wants a task.

comment:86 by Ryan Witt, 14 years ago

Cc: onecreativenerd@… added

in reply to:  27 comment:87 by anonymous, 14 years ago

Replying to jacob:

Looks good. I'd like to see support for X-SendFile baked in. This should be too hard to do: instead of storing the file name in self.sendfile_filename, store it in an X-SendFile header. Then we just need a small piece of middleware -- or, perhaps, something added to the common middleware? -- that responds to a setting (SERVER_SUPPORTS_X_SENDFILE or somesuch) and returns the response with the header and an empty body.

Testing the current patch, the "baked in" x-sendfile support breaks the feature. I see both wsgi and x-sendfile messages in the log, for the same file. I suspect something like the following: due to the header, apache discards the returned content (the file wsgi is supposed to be serving), and tries to serve the file itself via mod_xsendfile. In my config the wsgi daemon has permission to read the file, but the "apache" user does not, so xsendfile fails. If your user "apache" has permission to read the file, you probably won't see the bug, but behind the scenes the file is being read twice (discarded the first time).

Disabling mod_xsendfile allows it to work. It's worth reiterating that x-sendfile does not work if the main apache user does not have permission to read the file. A solution is sorely needed for serving files in the context of wsgi.

comment:88 by Łukasz Rekucki, 14 years ago

Severity: normalNormal
Type: enhancementNew feature

comment:89 by Thomas Güttler, 13 years ago

Easy pickings: unset
UI/UX: unset

SendFile support in django would be nice. But I am happy this now: https://github.com/johnsensible/django-sendfile

This is a wrapper around web-server specific methods for sending files to web clients.
This is useful when Django needs to check permissions associated files, but does not want
to serve the actual bytes of the file itself. i.e. as serving large files is not what Django is made for.

comment:90 by Thomas Güttler, 13 years ago

Resolution: wontfix
Status: reopenedclosed

I asked on django-dev about this ticket, there was only one reply. But the core developers don't care.

http://groups.google.com/group/django-developers/browse_thread/thread/b02c19daf0af5602/1bea6744f9e2052a?pli=1

I close this ticket. You have got to use the external django-sendfile project.

comment:91 by Jannis Leidel, 13 years ago

Component: Core (Other)HTTP handling
Needs documentation: set
Resolution: wontfix
Status: closedreopened

A core developer clearly has marked this ticket as accepted above and non of the core devs have argued against it on the mailing list. So in other words, this just needs someone to spear head the porting of the changes made to the soc2009/http-wsgi-improvements branch.

by milosu, 13 years ago

Attachment: ticket2131.diff added

updated patch compatible with Django-1.4b1 including test case

comment:92 by milosu, 13 years ago

Cc: milos.urbanek@… added
Needs documentation: unset

comment:93 by André Cruz, 12 years ago

Cc: andre.cruz@… added

So, now that there is an updated patch and some core developers interested, can we expect this patch to be included in the next Django version?

comment:94 by Aymeric Augustin, 12 years ago

Owner: changed from ccahoon to Aymeric Augustin
Status: reopenednew

comment:95 by Aymeric Augustin, 12 years ago

I see people struggling with private static files all the time, and I've implemented several variants of this myself. This feature is needed.

However, I'm not sure that merging django-sendfile as a contrib app is the best option. There've been several discussions about endorsing third-party applications covering common needs. If we started doing that, we could just point to https://github.com/johnsensible/django-sendfile in the documentation.

EDIT: Alex noticed some dubious code. We can't endorse that.

NB: since #7581 was fixed, serving files through django.views.static.serve or with a StreamingHttpResponse is less inefficient (although still not recommended): the entire file content isn't loaded in memory.

NB: the latest patch won't work with nginx because X-Accel-Redirect requires a relative path. Also it still contains changes for mod_python which was removed from master.

Last edited 12 years ago by Aymeric Augustin (previous) (diff)

comment:96 by Aymeric Augustin, 12 years ago

Owner: changed from Aymeric Augustin to nobody

comment:97 by Benoît Bryon, 12 years ago

Just released http://pypi.python.org/pypi/django-downloadview, which is a third-party tool that provides class-based download views. Support of nginx's X-Accel is implemented with a middleware (global) or with decorators (per-view configuration).

I didn't know django-sendfile before I discovered this ticket... I'm to look at it ;)

comment:98 by Simon Law, 12 years ago

Cc: sfllaw@… added

comment:99 by chris.cahoon@…, 12 years ago

Cc: chris.cahoon@… removed

comment:100 by Benoît Bryon, 11 years ago

Just released an update of django-downloadview, and came back here to check the status of the ticket...

One year ago, aaugustin said:

I'm not sure that merging django-sendfile as a contrib app is the best option. There've been several discussions about endorsing third-party applications covering common needs. If we started doing that, we could just point to ​https://github.com/johnsensible/django-sendfile in the documentation.

But this ticket is still tagged as "accepted"... is it still valid? I mean, would such a feature be included in Django's core one day?

In the meantime, we keep on developing third-party projects (by the way, I discovered a new one called "django-transfer" recently).

If such a feature is to be added to Django's core, may you consider django-dowloadview's mechanisms?

  • Generic views cover commons patterns, depending on the file (FileField in Model, Storage, path, URL, text, StringIO, generator...)
  • Generic views and mixins are easy to override to implement specific use cases
  • Views return some "FileStreamingResponse" (StreamingHttpResponse subclass)
  • FileStreamingResponse carries a file wrapper
  • File wrappers describe files. FieldFile, File, ContentFile are valid file wrappers. Some other wrappers may be added to support storages, URLs or generators.
  • Response middlewares (or decorators) are given the opportunity to capture the FileStreamingResponse and convert them to some optimized response (Sendfile, XAccel, ...) before file content is read from disk/network or loaded into memory.
  • Test utilities help developers check responses or views.

See http://django-downloadview.readthedocs.org/en/1.3/overview.html for details.

comment:101 by Aymeric Augustin, 11 years ago

Resolution: wontfix
Status: newclosed

Benoît, thanks for this update. The scope covered by django-downloadview is far from trivial. Thinking about this again one year later, I believe it's better left to third party apps.

This ticket has grown long over the years and it's very hard to tell what it's proposing exactly after all these discussions. I'm going to close it.

If some simple changes to Django would make the implementation of django-downloadview easier, let us know, either on the django-developers mailing list or in a new ticket.

If someone wants to propose a simple and straightforward API to handle the common case in Django, please send a concrete proposal to django-developers. (This could stifle third-party apps.)

Note: See TracTickets for help on using tickets.
Back to Top