Code

Opened 8 years ago

Closed 5 months ago

#2131 closed New feature (wontfix)

HttpResponseSendFile for serving static files handler-specific sendfile mechanism

Reported by: ymasuda[at]ethercube.com Owned by: nobody
Component: HTTP handling Version: master
Severity: Normal Keywords:
Cc: Graham.Dumpleton@…, jdunck@…, django-ticket-2131@…, andreas@…, andrehcampos@…, gregoire@…, gabor@…, ville@…, hackeron@…, exelib@…, redalastor@…, dane.springmeyer@…, jezdez, dan@…, danny.adair@…, apollo13, Ciantic, hv@…, bronger@…, david, mmitar@…, jivan@…, darkrho@…, onecreativenerd@…, milos.urbanek@…, andre.cruz@…, sfllaw@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

This patch modifies modpython handler and WSGI (basehttp) handler to add HttpResponseSendFile.
The HttpResponseSendFile allows static files to be served using handler-specific sendfile mechanism.
It uses response.sendfile() on mod_python, FileWrapper on wsgi.

Attachments (13)

http-response-send-file.diff (3.0 KB) - added by ymasuda[at]ethercube.com 8 years ago.
diff implements HttpResponseSendFile
http-response-send-file.2.diff (2.9 KB) - added by ymasuda[at]ethercube.com 8 years ago.
fixed typos above, sorry
django-ticket-2131-http-response-send-file.2-django1.0.2final.diff (3.4 KB) - added by mizatservercave 5 years ago.
updated for Django 1.0.2
django-ticket-2131-http-response-send-file.2-django1.0.2final-v1.1.diff (3.4 KB) - added by mizatservercave 5 years ago.
Supersedes previous patches for Django 1.0.2 final.
django-ticket-2131-http-response-send-file.2-django1.0.2final-v1.2_test.diff (1.9 KB) - added by mizatservercave 5 years ago.
PARTIAL patch (FOR TESTING ONLY) to verify HttpResponseSendFile works without specific handler support
django-ticket-2131-http-response-send-file.2-django1.0.2final-v1.2.diff (3.6 KB) - added by mizatservercave 5 years ago.
Updated patch. Adds support for 'response.content'
httpresponsesendfile-no-default-content.diff (3.7 KB) - added by mrts 5 years ago.
Disallow default HttpResponse behaviour (content etc)
httpresponsesendfile-no-default-content_bypass-middleware.diff (7.0 KB) - added by mrts 5 years ago.
Bypass middleware
httpresponsesendfile-no-default-content_bypass-middleware.1.diff (7.0 KB) - added by mrts 5 years ago.
Fix method naming typo
httpresponsesendfile-no-default-content_bypass-middleware.2.diff (7.0 KB) - added by mrts 5 years ago.
Doh, previous diff didn't contain the fix, second try.
httpresponsesendfile-no-default-content_bypass-middleware.3.diff (7.0 KB) - added by mrts 5 years ago.
Fix yet another copy-paste omission.
httpresponsesendfile-no-default-content_bypass-middleware_with-header_with-docs-and-tests.diff (12.3 KB) - added by mrts 5 years ago.
Added the requested header, as well as docs and tests.
ticket2131.diff (13.6 KB) - added by milosu 2 years ago.
updated patch compatible with Django-1.4b1 including test case

Download all attachments as: .zip

Change History (113)

Changed 8 years ago by ymasuda[at]ethercube.com

diff implements HttpResponseSendFile

comment:1 Changed 8 years ago by anonymous

  • Summary changed from HttpResponseSendFile for serving static files handler-specific sendfile mechanism to [patch] HttpResponseSendFile for serving static files handler-specific sendfile mechanism

Changed 8 years ago by ymasuda[at]ethercube.com

fixed typos above, sorry

comment:2 follow-up: Changed 8 years ago by adrian

  • Resolution set to wontfix
  • Status changed from new to closed

Django isn't meant to serve static files, so I'm marking this as a wontfix.

comment:3 Changed 7 years ago by alex@…

This patch could be very useful in a situations with a password protected files to a specific users. Typically I am talking about e-commerce site which sells PDF books. I am currently returning data via standard HttpResponse. It works well for small files,
but it would probably be a problem for bigger files.

comment:4 Changed 7 years ago by Stefan (stelund on gmail)

I've implemented a similar workaround to restrict photo images accesses. I think its ridiculous not to implement this feature to aid performance for views.static.serve() as well. HttpResponse is implemented as an iterator so it fits perfectly to read the file chunk by chunk. The performance increase i've seen is great and i'm happy not to need another tcp-server running on my box.

comment:5 in reply to: ↑ 2 ; follow-up: Changed 7 years ago by johann@…

Replying to adrian:

Django isn't meant to serve static files, so I'm marking this as a wontfix.

But this feature would be useful to reduce memory usage with large dynamically generated content.

In my application, I generate large (30MB) ZIP files in Django and send them through response.write(). The ZIP files are just collections of PNG files, so it doesn't make sense to store them on disk permanently.

Unfortunately, using response.write() means that the whole ZIP file must be loaded into memory. It would be so cool if I could create the file on disk and then use sendfile. The same use-case applies to generated PDF files.

As a workaround, I could write the file to disk and then redirect the browser, but that's tedious, requires another TCP round-trip, and makes automated downloads more difficult.

comment:6 in reply to: ↑ 5 Changed 7 years ago by johann@…

Here's a solution that doesn't require patching Django. It's not the real sendfile(), but it loads and sends the file in 8KB chunks, rather than loading the whole file into memory.

http://www.djangosnippets.org/snippets/365/

It uses the FileWrapper object from source:django/trunk/django/core/servers/basehttp.py.

comment:7 Changed 6 years ago by TP

Even using a file-wrapper iterator does not speed up things as much as directly using sendfile. Using mod_wsgi's exposed sendfile uses 30x less CPU vs Django-with-iterator on my linux box testing downloading hundreds of images concurrently.

comment:8 Changed 6 years ago by grahamd

Trac also isn't meant to serve static media files either and preferred setup is to use Apache to do it directly. In Trac however you have the concept of attachments associated with tickets. For such a thing as attachments, where it is still stored in file system, but where Apache cant get to it, you have no choice but to serve it through the application. Any similar sort of scenario in Django is currently not very efficient in terms of performance, or in terms of memory usage if people don't use techniques to limit memory impact. Thus, Django should really provide a default way of handling this efficiently and saying 'wontfix' means that you haven't looked very widely at how Django can be used. :-)

comment:9 Changed 6 years ago by jdunck

  • Cc jdunck@… added

comment:10 Changed 6 years ago by mtredinnick

  • milestone set to post-1.0
  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Unreviewed to Design decision needed

Claiming we "haven't looked very widely" is ridiculous, given what Django can do. More possibly "some cases may have been overlooked" (there are alternatives to the Trac setup). So let's try to avoid hyperbole. :-)

Reopening for now and we can look at it after 1.0.

comment:11 Changed 6 years ago by mrts

+1 on including this. My use case: instrument Java classfiles, generate a JAR and return it to the user on the fly.

comment:12 Changed 6 years ago by graham.carlyle@…

Another ticket that covers this issue is #7984, which probably means that its a duplicate. Sorry I didn't spot this ticket when I filed it. The patch on #7984 is very similar to this one :) but has some extra logic related to content-length for mod_wsgi because of some amibguities in the WSGI spec. (see discussion on mod_wsgi group)

comment:13 Changed 6 years ago by graham.carlyle@…

gah i'm an idiot I meant #7894

comment:14 Changed 6 years ago by Miz

  • Cc django-ticket-2131@… added

comment:15 Changed 6 years ago by anonymous

  • Cc andreas@… added

comment:17 Changed 5 years ago by ahlongxp

  • Version set to SVN

I'd really like to see this merged in django.

comment:18 Changed 5 years ago by anonymous

  • Cc andrehcampos@… added

Changed 5 years ago by mizatservercave

updated for Django 1.0.2

comment:19 Changed 5 years ago by mizatservercave

  • Needs tests set

I have added an updated patch that SEEMS to work on Django 1.0.2. I had an earlier revision working on Django 1.0.0, but have only updated my local copy to 1.0.2 tonight.

Extensive testing should be done. Right now I have only tested WSGI (via Django's manage.py runserver).

The exclusion/removal of the 'Content-Length' header has been added to force WSGI to recalculate the content length from the file length, where prior to that point in the code something previously tends to add 'Content-Length' = 0, which is probably not wise to mess with "at the source".

The additional content_type == None check was added for Windows compatibility -- Google Chrome would not download a file that ultimately resulted in "Content-Type: None", so I chose a generic 'reasonable' default. See also http://bugs.python.org/issue4969 -- but Python 2.5 without that fix is too commonly deployed, at least in my environment. It's easier to do a 2-line fix where it's needed, as-needed.

Other than that, the patch is essentially the same as the original poster's.

Changed 5 years ago by mizatservercave

Supersedes previous patches for Django 1.0.2 final.

comment:20 Changed 5 years ago by mizatservercave

New patch uploaded. Fixes mod_python support.

This patch is now used on a production website and shows phenomenal performance.

The production use-case for this patch is a "members-only" upload/download area which requires extensive permissions checking and custom auditing. This patch makes it possible to do all the logic needed while remaining completely within the Django framework, and without sacrificing performance while either uploading or downloading.

comment:21 Changed 5 years ago by jacob

This looks like a nice patch to me, and hearing that it works in production is good to know. Still needs a few things, though:

  • The current setup requires cooperation between the HttpResponseSendFile object and the handler. This means it'll break if HttpResponseSendFile is used with a custom handler that doesn't know about it; that's a potential backwards-incompatibility, and we need to avoid it. If possible, HttpResponseSendFile needs to fake being a regular HttpResponse, probably by providing a content property which just reads the file.
  • Documentation.
  • Tests.

comment:22 Changed 5 years ago by anonymous

  • milestone post-1.0 deleted

Milestone post-1.0 deleted

comment:23 Changed 5 years ago by graham.carlyle@…

Looked in to this a while back (I created the duplicate ticket #7894 with similar patch), but found that in production using the WSGI file_wrapper approach can be problematic. I found that serving PDFs to an adobe reader client failed for documents above around 25MB in size because reader required the web server to satisfy byte range requests which didn't work with WSGI. I ended up using the X-SendFile header and mod_xsendfile (on apache) which didn't have these problems. I'm not certain whether serving bytes range is inherently impossible with WSGI but requests to apache with mod_wsgi didn't satisfy them for the WSGI content.

Also you might like to make the same change that the patch in #7894 made to the 'django.views.static.serve' function so the django development server can use this speedup.

comment:24 Changed 5 years ago by mizatservercave

Implementing additional HTTP features (HTTP 206 PARTIAL CONTENT) directly in the core is something I suspect the core maintainers want to avoid, on grounds that Django is an application framework and *not* a webserver.

I think you bring up a good point, though. This patch serves a niche interest instead of a general purpose one. That is, instead of this patch being able to fully support all HTTP access methods (such as 206 PARTIAL CONTENT), it is useful in only very certain circumstances:

  1. When using a handler with an 'efficient mechanism' for 'sending files' AND which knows about HttpResponseSendFile as implemented by this patch, AND
  1. HttpResponseSendFile, as-implemented, is only suitable for sending whole files.

All the fallback methods revert to 'pure python iterators' in some manner or form. I spent some time bouncing between mod_python, the WSGI spec, and mod_python, trying to decide whether or not trying to wedge a generator into the works (for efficiency) was worthwhile overall. I don't think it is.

It appears that HttpResponseSendFile is only efficient when both the above preconditions are met. Anything which would cause any fallback method (iterators) to be used will DECREASE application performance, but probably not more than doing a raw "HttpResponse(file.read())". Any more 'advanced functionality' or efficiency requirements and you would be well advised to defer the functionality to the webserver, such as those servers which support X-Sendfile natively or by extension.

The goal of promoting this patch is to provide basic functionality just beyond the realm of python iterators for file-reading/file-sending (when preconditions for efficiency are met), while remaining entirely contained within Django -- without *additional* external dependencies, beyond normal deployment against mod_wsgi or mod_python. I do think the implementation useful, however. But it is certainly not "feature rich".

In regards to WSGI's support of all this, it appears that WSGI defers partial content delivery responsibility to the WSGI application [eg, Django or applications written with Django]. In other words, the WSGI spec does not appear support partial content directly. Reading a file and/or chopping it up to feed to the WSGI interface would require pure-python with (presumably) iterators/generators (or even tempfiles with the 'cut-out' content), which would entirely defeat the purpose of this patch.

Mod_python appears at first glance to expose sendfile 'directly' which would seem to support partial content easily, but there seems to be no straightforward way to be efficient across anything BUT Mod_python in this case, and is probably the subject of an additional patch to implement, say, HttpResponseResumableFile().

In short: This patch (unfortunately) serves only a small audience. The larger audience can probably be served with X-Sendfile.

[Kudo's for the reference. I didn't know X-Sendfile even existed 'til you posted, Graham. :)]

Advantages:

  • Provides amazingly good results, when used to send whole files across a mutually-supported interface

Disadvantages:

  • Introduces response-specific code into the handlers, which would ultimately require maintenance
  • All fallback methods are "not efficient" simply by virtue that "everything is done in pure python without a speedy extension, like mod_python or mod_wsgi provides". Large files consume a lot of time and a lot of RAM -- although attempting to serve those types of files through Django before this patch suffers from the same issues of slowness and lots of RAM consumed.
  • response.content is one of these inefficient methods

Additional Considerations:

  • As this is a 'send whole file' mechanism, should a 'Content-Disposition: attachment; filename=' header be added? Should the filename come from the HttpResponseSendFile constructor so the application can name it what it likes? [Personally, I think so, on both counts.]

Changed 5 years ago by mizatservercave

PARTIAL patch (FOR TESTING ONLY) to verify HttpResponseSendFile works without specific handler support

Changed 5 years ago by mizatservercave

Updated patch. Adds support for 'response.content'

comment:25 follow-up: Changed 5 years ago by mizatservercave

Regarding the 'v1.2' version of the patch I just uploaded, note that for large files when accessed through response.content loads the whole thing up into the webserver's RAM before serving. A good test example is OpenOffice.org 3's 143MB installer. Bad for small (eg, 256MB RAM) webservers.

RAM also seems to 'hang around' after each request, so I suspect that .close() isn't being called properly on self._container, or _get_content needs to add a signal-hook to signals.request_finished to expire the content from RAM by deleting the variable or setting it to or somesuch.

That being said, accessing request.content is a compatibility property, and so shouldn't be used if at all possible.

The below test-case works for a raw, manual test, however, in addition to working with the "_test" patch above:

from django.http import HttpResponseSendFile
r = HttpResponseSendFile('./bigfile')
t = open('/tmp/test', 'wb')
t.write(r.content)
t.close()

# md5 and filesize of ./bigfile and /tmp/test match.

comment:26 Changed 5 years ago by mizatservercave

Reformatting the example since formatting got munged in the previous update.

from django.http import HttpResponseSendFile
r = HttpResponseSendFile('./bigfile')
t = open('/tmp/test', 'wb')
t.write(r.content)
t.close()

# md5 and filesize of ./bigfile and /tmp/test match.

comment:27 follow-up: Changed 5 years ago by jacob

Looks good. I'd like to see support for X-SendFile baked in. This should be too hard to do: instead of storing the file name in self.sendfile_filename, store it in an X-SendFile header. Then we just need a small piece of middleware -- or, perhaps, something added to the common middleware? -- that responds to a setting (SERVER_SUPPORTS_X_SENDFILE or somesuch) and returns the response with the header and an empty body.

comment:28 Changed 5 years ago by jacob

  • milestone set to 1.1 beta
  • Triage Stage changed from Design decision needed to Accepted

comment:29 Changed 5 years ago by grahamd

X-Sendfile, although understood by some servers/proxies isn't the only header used for this purpose. The perbal package has something called X-Reproxy-File which I believe acts the same way. The perbal package also has X-Reproxy-URL, which instead of serving up a file, reinjects the URL as a request into the request handling system. This means the request gets pushed back to a backend system to serve. Thus it might get bumped off to a static media server which then serves up the matching file. The request may also be handled by a dynamic web application. In nginx, there is a similar thing to X-Reproxy-URL called X-Accel-Redirect. Finally both these are in some respects similar to the CGI internal redirection using Location header and 200 response which Apache CGI supports and which other servers provding CGI should also support. This CGI mechanism of using Location header will also supported for mod_wsgi daemon mode in mod_wsgi 3.0.

Have been looking at X-Sendfile support for mod_wsgi daemon mode in mod_wsgi 3.0, but the problem with it is that it side steps standard Apache permissions checks on files. Yes, an application could just read the file itself and return it anyway, but the difference is that in daemon mode the user can be different to that Apache runs as, thus X-Sendfile would be handled as the Apache user and not the daemon process user. Thus access may not be available to the file, or access may be available to files that the daemon process shouldn't otherwise have access to. This is why mod_wsgi 3.0 at this point will only support internal URL redirection via Location header as per CGI specification. Namely, the internal redirection has all the Apache permission checks applied.

The only issue with things like X-Accel-Redirect and CGI Location redirection is that the URL has to be mapped to a file and the presence of a mapping would normally mean an outside party could access the resource directly. In nginx they get around this through flagging the URL as private and so only a candidate for use by internal redirects. In the case of Apache, you would need to use mod_rewrite to enforce a condition that URL is forbidden if not a sub request made within the server itself.

Anyway, it may be felt that this is all irrelevant, but want to at least point out that X-Sendfile isn't the only header name used and so you may want to make it configurable. The URL redirection mechanisms available might still also be employed if header can be overridden, but may be a bit tricky here if code can force the file to be loaded inside Django application before response gets sent, with it turning into a byte stream instead. The problem here is that for X-Reprox-URL, it may actually refer to something on a different host if they allow a 'http://host' component in the URL, thus can't sensibly open it as a file. For CGI Location header, no 'http://host', but it is still a logical URL and can't be opened as a file. But then, if X-Sendfile or X-Reproxy-File are used and the front end is on a different system, the Django application may not even be able to open the path anyway as file system view could be different.

So, in respect of the latter, with the X-Sendfile approach, is it always the case that it always makes it out as a X-Sendfile response, or can the Django application, because of some sort of output filter or similar, force the path to be resolved in the application with data read at that point.

Oh and finally, for X-Sendfile, does the response sent always have a 200 HTTP status code? I know that is what CGI Location wants, but not sure about all the others. If the response codes aren't consistent, then even if can override header name, may not work for all.

comment:30 Changed 5 years ago by grahamd

BTW, mod_wsgi 2.0 through 2.3 has an issue that can occur under certain conditions, with a file returned using wsgi.file_wrapper being truncated when size is > 255 and < ~8000 bytes. This is due to an obscure Apache optimisation. The issue will be fixed in mod_wsgi 2.4. Details, with a description of fix, can be found at:

http://code.google.com/p/modwsgi/issues/detail?id=132

One also should not use mod_wsgi 2.0 through 2.1, which can also truncate a file returned using wsgi.file_wrapper if it contains an embedded null character.


comment:31 Changed 5 years ago by gregoire

  • Cc gregoire@… added

The patch does not work with GZIP middleware.

Using mod_wsgi, I get this error: "IOError: client connection closed"

comment:32 Changed 5 years ago by mrts

  • Needs documentation set
  • Patch needs improvement set

Questions:

  1. Why to you first do an expensive call to getsize in self['Content-Length'] = os.path.getsize(path_to_file) and later discard it in if hasattr(response, 'sendfile_filename') and response.has_header('Content-Length'): del response['Content-Length'] in WSGI handler? I didn't find anything in PEP 333 that would suggest this is needed. Please correct me or remove that bit.
  2. Although this is mostly a conceptual/taste matter, wouldn't it be more clear to use isinstance(response, HttpResponseSendFile) instead of hasattr(response, 'sendfile_filename') to discern HttpResponseSendFile objects?

Nitpicking HttpResponseSendFile.__init__:

  1. Comparisons to singletons like None should always be done with 'is' or 'is not', never the equality operators. -- PEP 8, thus content_type == None should be content_type is None instead.
  2. Opening the file needs to be consistent. You open it in the constructor and later reopen again in handlers, so the file is opened twice.
  3. The logic of parent constructor should be reused, I suggest the following instead:
    class HttpResponseSendFile(HttpResponse):
        def __init__(self, path_to_file, content_type=None, blocksize=8192):
            if not content_type: # or even if content_type is None
                from mimetypes import guess_type
                content_type = guess_type(path_to_file)[0]
                if content_type is None: # probably windows w/o proper mimetype lookup
                    content_type = "application/octet-stream"
            container = open(path_to_file) # be consistent, either open() here and don't reopen in handlers or don't open() here
            super(HttpResponseSendFile, self).__init__(content=container, content_type=content_type)
            self.sendfile_filename = path_to_file
            self.block_size = blocksize
            # compatibility; fake being a regular HttpResponse if needed
            self._headers['content-length'] = ('Content-Length', os.path.getsize(path_to_file))
    

Looks like more thought has to be given to the general context (basehttp.py etc). Docs and tests are missing as well.

Otherwise, nice work!

comment:33 Changed 5 years ago by mrts

As far as I can see, HttpResponseSendFile should not mimic HttpResponse in the sense that it should always be specially handled. Ordinary HttpResponse can take a file object in the content parameter already and serve the file from within Python, so that behaviour should be explicitly disallowed in HttpResponseSendFile.

Attaching a patch that does exactly this, i.e. doesn't open() the file and passes None as content to parent constructor to make it explicit that the content is neither a string nor can be handled with the default behaviour. I'm not sure about the Content-Length header though, I've currently added it to be compatible with ServerHandler.cleanup_headers(), but it may need more thought.

Btw, how mod_python handles req.sendfile can be seen here: https://svn.apache.org/repos/asf/quetzalcoatl/mod_python/trunk/src/requestobject.c .

Changed 5 years ago by mrts

Disallow default HttpResponse behaviour (content etc)

comment:34 in reply to: ↑ 25 Changed 5 years ago by mrts

RAM also seems to 'hang around' after each request, so I suspect that .close() isn't being called properly on self._container, or _get_content needs to add a signal-hook to signals.request_finished to expire the content from RAM by deleting the variable or setting it to or somesuch.

There's a slight possibility that the cause may be return iter(lambda: filelike.read(block_size), '') that was used in the previous patch if. There's no close() in the resulting iterator. I used the FileWrapper from basehttp.py instead, please try if this changes things.

comment:35 Changed 5 years ago by mrts

As for middleware, they can explicitly bypass HttpResponseSendFile responses as follows:

    def process_response(self, request, response):
        if isinstance(response, HttpResponseSendFile):
            return response

or the handlers can bypass HttpResponseSendFile implicitly in __call__:

            if not isinstance(response, HttpResponseSendFile):
                for middleware_method in self._response_middleware:
                    response = middleware_method(request, response)

comment:36 Changed 5 years ago by mrts

Changed 5 years ago by mrts

Bypass middleware

Changed 5 years ago by mrts

Fix method naming typo

Changed 5 years ago by mrts

Doh, previous diff didn't contain the fix, second try.

Changed 5 years ago by mrts

Fix yet another copy-paste omission.

comment:37 Changed 5 years ago by grahamd

  • Cc Graham.Dumpleton@… added

Can someone indicate whether the intention is, or have these changes supported X-Sendfile as mentioned? I looked through some of the multitude of diffs, but couldn't see it mentioned.

comment:38 follow-up: Changed 5 years ago by jacob

Graham: Yes, I'd expect to see x-sendfile support before I check this in.

mrts: I explicitly asked asked that HttpResponseSendFile mimic standard HttpResponse semantics. This is a backwards-compatibility issue: writing custom handlers is an advanced but support feature.

comment:39 Changed 5 years ago by grahamd

Jacob: Okay. Only asked as wanted to make sure that the name of header used was going to be somehow configurable to allow for those header names used by other things like Apache/nginx and perlbal and not be X-Sendfile specific. Thanks.

comment:40 Changed 5 years ago by jacob

That should be pretty easy to pull off, I think.

comment:41 in reply to: ↑ 38 Changed 5 years ago by anonymous

Replying to jacob:

mrts: I explicitly asked asked that HttpResponseSendFile mimic standard HttpResponse semantics. This is a backwards-compatibility issue: writing custom handlers is an advanced but support feature.

OK, I'll try to look into it in the evening and provide something for X-Sendfile as well.

comment:42 Changed 5 years ago by mrts

(It was me posting the last comment)

Note that I don't like the HttpResponse compatible solution.

Concerns:

  • open() in __init__() will remain messy, if only because of mod_python that doesn't accept a file object in sendfile(); bluntly consuming two file descriptors for one response even though one isn't needed does not look sane to me,
  • we can open() lazily, but then we have to introduce state tracking (file_is_open), resulting in more complex and less robust code.

Is it wise to go to such lengths to cater for minor backwards-compatibility concerns? Jacob, which approach do you prefer?

comment:43 follow-up: Changed 5 years ago by jacob

Neither of those is necessary. You just need to make content into a property which lazily opens and reads the file.

comment:44 in reply to: ↑ 43 Changed 5 years ago by mrts

Replying to jacob:

Neither of those is necessary. You just need to make content into a property which lazily opens and reads the file.

That's what I meant by "lazily". However, state tracking is still required as the file has to be explicitly closed in HttpResponseSendFile.close(). This can not be done in content due to __iter__. And the file may not be open() at all if content is never touched. Thus, state tracking seems unavoidable.

comment:45 Changed 5 years ago by gabor

  • Cc gabor@… added

Changed 5 years ago by mrts

Added the requested header, as well as docs and tests.

comment:46 Changed 5 years ago by mrts

Added the configurable X-Sendfile header and preliminary docs and tests -- mainly to ease work for others who like to drive this further.

However, as of now the patch is still incomplete. I personally don't like providing any content for the reasons elaborated in http://groups.google.com/group/django-developers/browse_thread/thread/82367cf18eedfe89. I'm not the only one, see e.g. http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/.

Sadly, it looks like this will probably not make it into 1.1 as I likely will not have time for this in the weekend.

comment:47 Changed 5 years ago by mrts

Another issue: the disposition header should perhaps be configurable with a constructor argument (inline is needed to display files directly in browser).

comment:48 follow-up: Changed 5 years ago by mtredinnick

  • milestone changed from 1.1 beta to 1.2

Still too many edge-cases not being accounted for here (basically handling all the stuff Graham mentioned for a start). It's not ready for 1.1.

mrts: The final solution will have to fall-back gracefully if the user's webserver doesn't have any direct sendfile support, since otherwise you can't use this feature in a redistributable application (the application writer doesn't know how it will be installed). The client browser shouldn't ever receive an empty file. That's what Jacob meant by backwards-compatible, he just used an ambiguous phrase.

comment:49 in reply to: ↑ 48 Changed 5 years ago by mrts

Replying to mtredinnick:

mrts: The final solution will have to fall-back gracefully if the user's webserver doesn't have any direct sendfile support, since otherwise you can't use this feature in a redistributable application (the application writer doesn't know how it will be installed). The client browser shouldn't ever receive an empty file. That's what Jacob meant by backwards-compatible, he just used an ambiguous phrase.

The point is valid. However, looking at it from deployer's viewpoint I would always want to know if the efficient method is used or not (finding it out via the OS utilities is really cumbersome). As of now I'll add another default setting, SENDFILE_FALLBACK_ALLOWED = True and define content with:

def _get_content(self):
    if not settings.SENDFILE_FALLBACK_ALLOWED:
        raise RuntimeError("Fall-back to serving files via Python not allowed by configuration in HttpResponseSendFile.")
    else:
        # open the file and return its content

This will be just a proof of concept and I abhor adding yet another setting as much as you probably do, but I can't find a better way. Feel free to suggest one.

As this is not urgent, being targeted to 1.2, I can't yet tell when I update the patch.

comment:50 Changed 5 years ago by Uninen

  • Cc ville@… added

comment:51 Changed 5 years ago by hackeron

  • Cc hackeron@… added

comment:52 Changed 5 years ago by ccahoon

  • Cc chris.cahoon@… added
  • Owner changed from adrian to ccahoon

comment:53 Changed 5 years ago by ccahoon

(In [11131]) [soc2009/http-wsgi-improvements] Initial HttpResponseSendFile support, changes pulled from 03/21/09 patch on refs #2131.

This does not pass the included regression tests. However, since this feature
will be entirely based on these changes, which have already gone through a
great number of iterations, I thought it would be sensible to start here.

All of the work here is ymasuda, mizatservercave, and mrts (apologies if I
missed anyone). I hope to take their work
down the final stretch.

comment:54 follow-up: Changed 5 years ago by ccahoon

I have made changes for HttpResponseSendFile support, in r11266 and earlier commits. I am still getting used to putting the refs in my commit messages, but when the changes get merged with trunk it will be there. If you are interested, let me know what you think.

comment:55 in reply to: ↑ 54 Changed 5 years ago by ccahoon

Replying to ccahoon:

I have made changes for HttpResponseSendFile support, in r11266 and earlier commits. I am still getting used to putting the refs in my commit messages, but when the changes get merged with trunk it will be there. If you are interested, let me know what you think.

Correction: r11268. Also, r11212 and r11210. Sorry about that.

comment:56 Changed 5 years ago by ccahoon

(In [11319]) [soc2009/http-wsgi-improvements] Changes for HttpResponseSendFile support in FastCGI. Refs #2131.

comment:57 Changed 5 years ago by ccahoon

(In [11320]) [soc2009/http-wsgi-improvements] Fix early settings use in HttpResponseSendFile. Refs #2131.

comment:58 Changed 5 years ago by ccahoon

(In [11325]) [soc2009/http-wsgi-improvements] Add a how-to for using HttpResponseSendFile with various server arrangements. Refs #2131.

comment:59 Changed 5 years ago by ccahoon

(In [11326]) [soc2009/http-wsgi-improvements] Remove setting the Content-Length header for HttpResponseSendFile from the handler, for compatibility, and add a content attribute. Refs #2131.

Also adds a _charset class attribute to HttpResponse so the children all have it.

comment:60 Changed 5 years ago by ccahoon

(In [11327]) [soc2009/http-wsgi-improvements] Change the setting name to SENDFILE_HEADER and update docs. Refs #2131.

comment:61 Changed 5 years ago by ccahoon

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Fixed on a branch

comment:62 Changed 5 years ago by ccahoon

Fixed on branches/soc2009/http-wsgi-improvements.

comment:63 follow-up: Changed 5 years ago by andreaskundig

  • Summary changed from [patch] HttpResponseSendFile for serving static files handler-specific sendfile mechanism to it breaks i18n on django 1.1

I'm not sure this is the right place to signal a bug, but HttpResponseSendFile breaks internationalization.

I have a page where the user can change his language as described in the documentation:
http://docs.djangoproject.com/en/dev/topics/i18n/#the-set-language-redirect-view

On my page there is a link to a view that sends a pdf with HttpResponseSendFile. If I click on this link I receive the pdf, but afterwards I no longer can change the language. If I was on an english page and then submit a language change to french, the /i18n/setlang/ view recognizes the correct language code 'fr' and puts it into the session, but once the request arrives at my view request.LANGUAGE_CODE is 'en'.

It works again after I restart Django.

(I am using Django 1.1 with python 2.5.4 on windows)

comment:64 in reply to: ↑ 63 Changed 5 years ago by andreaskundig

Bypassing the middleware in base.py that breaks i18n. When I comment that out it works again.

comment:65 Changed 5 years ago by kmtracey

  • Summary changed from it breaks i18n on django 1.1 to HttpResponseSendFile for serving static files handler-specific sendfile mechanism

Reverting confusing change to ticket summary.

comment:66 follow-up: Changed 5 years ago by ccahoon

Which patch is it that breaks i18n? Are you using the one from the SOC branch?

comment:67 in reply to: ↑ 66 Changed 5 years ago by andreaskundig

Replying to ccahoon:

Which patch is it that breaks i18n? Are you using the one from the SOC branch?

I'm using the latest patch (added by mrts on 03/21/09):
http://code.djangoproject.com/attachment/ticket/2131/httpresponsesendfile-no-default-content_bypass-middleware_with-header_with-docs-and-tests.diff

I don't know what the SOC branch is.

comment:68 Changed 5 years ago by AntonBessonov

  • Cc exelib@… added

comment:69 Changed 4 years ago by redalastor

  • Cc redalastor@… added

comment:70 Changed 4 years ago by springmeyer

  • Cc dane.springmeyer@… added

comment:71 Changed 4 years ago by jezdez

  • Cc jezdez added

comment:72 Changed 4 years ago by danros

  • Cc danros added

comment:73 Changed 4 years ago by danros

  • Cc dan@… added; danros removed

comment:74 Changed 4 years ago by danny_adair

  • Cc danny.adair@… added

comment:75 Changed 4 years ago by anonymous

  • Cc apollo13 added

comment:76 Changed 4 years ago by Ciantic

  • Cc Ciantic added

comment:77 Changed 4 years ago by mrts

Note that you don't need this fix to use efficient file serving from files accessible from the filesystem.

Just use an empty response and set the X-Sendfile header manually if using Apache. If not, check your server documentation for the right header name.

E.g.:

...
response = HttpResponse()
response['X-Sendfile'] = '/full/path/to/file'
response['Content-Type'] = 'some/content_type'
response['Content-Length'] = os.stat('/full/path/to/file').st_size
response['Content-Disposition'] = 'attachment; filename="foo"'
return response

comment:78 Changed 4 years ago by guettli

  • Cc hv@… added

comment:79 Changed 4 years ago by ubernostrum

  • milestone 1.2 deleted

1.2 is feature-frozen; bumping this feature request off the milestone.

comment:80 Changed 4 years ago by bronger

  • Cc bronger@… added

comment:81 Changed 4 years ago by david

  • Cc david added

comment:82 Changed 4 years ago by mitar

  • Cc mmitar@… added

comment:83 Changed 4 years ago by JivanAmara

  • Cc jivan@… added

comment:84 Changed 4 years ago by anonymous

  • Cc darkrho@… added

comment:85 Changed 4 years ago by SmileyChris

  • Patch needs improvement set
  • Triage Stage changed from Fixed on a branch to Accepted

I'm going to put the stage back to Accepted, since the branch this is fixed on is rather stale. It'd be good to see a stand-alone patch here updated with any changes made in the branch if someone wants a task.

comment:86 Changed 3 years ago by onecreativenerd

  • Cc onecreativenerd@… added

comment:87 in reply to: ↑ 27 Changed 3 years ago by anonymous

Replying to jacob:

Looks good. I'd like to see support for X-SendFile baked in. This should be too hard to do: instead of storing the file name in self.sendfile_filename, store it in an X-SendFile header. Then we just need a small piece of middleware -- or, perhaps, something added to the common middleware? -- that responds to a setting (SERVER_SUPPORTS_X_SENDFILE or somesuch) and returns the response with the header and an empty body.

Testing the current patch, the "baked in" x-sendfile support breaks the feature. I see both wsgi and x-sendfile messages in the log, for the same file. I suspect something like the following: due to the header, apache discards the returned content (the file wsgi is supposed to be serving), and tries to serve the file itself via mod_xsendfile. In my config the wsgi daemon has permission to read the file, but the "apache" user does not, so xsendfile fails. If your user "apache" has permission to read the file, you probably won't see the bug, but behind the scenes the file is being read twice (discarded the first time).

Disabling mod_xsendfile allows it to work. It's worth reiterating that x-sendfile does not work if the main apache user does not have permission to read the file. A solution is sorely needed for serving files in the context of wsgi.

comment:88 Changed 3 years ago by lrekucki

  • Severity changed from normal to Normal
  • Type changed from enhancement to New feature

comment:89 Changed 3 years ago by guettli

  • Easy pickings unset
  • UI/UX unset

SendFile support in django would be nice. But I am happy this now: https://github.com/johnsensible/django-sendfile

This is a wrapper around web-server specific methods for sending files to web clients.
This is useful when Django needs to check permissions associated files, but does not want
to serve the actual bytes of the file itself. i.e. as serving large files is not what Django is made for.

comment:90 Changed 2 years ago by guettli

  • Resolution set to wontfix
  • Status changed from reopened to closed

I asked on django-dev about this ticket, there was only one reply. But the core developers don't care.

http://groups.google.com/group/django-developers/browse_thread/thread/b02c19daf0af5602/1bea6744f9e2052a?pli=1

I close this ticket. You have got to use the external django-sendfile project.

comment:91 Changed 2 years ago by jezdez

  • Component changed from Core (Other) to HTTP handling
  • Needs documentation set
  • Resolution wontfix deleted
  • Status changed from closed to reopened

A core developer clearly has marked this ticket as accepted above and non of the core devs have argued against it on the mailing list. So in other words, this just needs someone to spear head the porting of the changes made to the soc2009/http-wsgi-improvements branch.

Changed 2 years ago by milosu

updated patch compatible with Django-1.4b1 including test case

comment:92 Changed 2 years ago by milosu

  • Cc milos.urbanek@… added
  • Needs documentation unset

comment:93 Changed 23 months ago by edevil

  • Cc andre.cruz@… added

So, now that there is an updated patch and some core developers interested, can we expect this patch to be included in the next Django version?

comment:94 Changed 18 months ago by aaugustin

  • Owner changed from ccahoon to aaugustin
  • Status changed from reopened to new

comment:95 Changed 18 months ago by aaugustin

I see people struggling with private static files all the time, and I've implemented several variants of this myself. This feature is needed.

However, I'm not sure that merging django-sendfile as a contrib app is the best option. There've been several discussions about endorsing third-party applications covering common needs. If we started doing that, we could just point to https://github.com/johnsensible/django-sendfile in the documentation.

EDIT: Alex noticed some dubious code. We can't endorse that.

NB: since #7581 was fixed, serving files through django.views.static.serve or with a StreamingHttpResponse is less inefficient (although still not recommended): the entire file content isn't loaded in memory.

NB: the latest patch won't work with nginx because X-Accel-Redirect requires a relative path. Also it still contains changes for mod_python which was removed from master.

Last edited 18 months ago by aaugustin (previous) (diff)

comment:96 Changed 18 months ago by aaugustin

  • Owner changed from aaugustin to nobody

comment:97 Changed 17 months ago by benoitbryon

Just released http://pypi.python.org/pypi/django-downloadview, which is a third-party tool that provides class-based download views. Support of nginx's X-Accel is implemented with a middleware (global) or with decorators (per-view configuration).

I didn't know django-sendfile before I discovered this ticket... I'm to look at it ;)

comment:98 Changed 12 months ago by sfllaw

  • Cc sfllaw@… added

comment:99 Changed 12 months ago by chris.cahoon@…

  • Cc chris.cahoon@… removed

comment:100 Changed 5 months ago by benoitbryon

Just released an update of django-downloadview, and came back here to check the status of the ticket...

One year ago, aaugustin said:

I'm not sure that merging django-sendfile as a contrib app is the best option. There've been several discussions about endorsing third-party applications covering common needs. If we started doing that, we could just point to ​https://github.com/johnsensible/django-sendfile in the documentation.

But this ticket is still tagged as "accepted"... is it still valid? I mean, would such a feature be included in Django's core one day?

In the meantime, we keep on developing third-party projects (by the way, I discovered a new one called "django-transfer" recently).

If such a feature is to be added to Django's core, may you consider django-dowloadview's mechanisms?

  • Generic views cover commons patterns, depending on the file (FileField in Model, Storage, path, URL, text, StringIO, generator...)
  • Generic views and mixins are easy to override to implement specific use cases
  • Views return some "FileStreamingResponse" (StreamingHttpResponse subclass)
  • FileStreamingResponse carries a file wrapper
  • File wrappers describe files. FieldFile, File, ContentFile are valid file wrappers. Some other wrappers may be added to support storages, URLs or generators.
  • Response middlewares (or decorators) are given the opportunity to capture the FileStreamingResponse and convert them to some optimized response (Sendfile, XAccel, ...) before file content is read from disk/network or loaded into memory.
  • Test utilities help developers check responses or views.

See http://django-downloadview.readthedocs.org/en/1.3/overview.html for details.

comment:101 Changed 5 months ago by aaugustin

  • Resolution set to wontfix
  • Status changed from new to closed

Benoît, thanks for this update. The scope covered by django-downloadview is far from trivial. Thinking about this again one year later, I believe it's better left to third party apps.

This ticket has grown long over the years and it's very hard to tell what it's proposing exactly after all these discussions. I'm going to close it.

If some simple changes to Django would make the implementation of django-downloadview easier, let us know, either on the django-developers mailing list or in a new ticket.

If someone wants to propose a simple and straightforward API to handle the common case in Django, please send a concrete proposal to django-developers. (This could stifle third-party apps.)

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.