Opened 17 years ago
Closed 12 years ago
#6027 closed Bug (duplicate)
FileWrapper iterator drained by GzipMiddleware before content can be returned
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Forest Bond, Oliver Beattie, daemianmack@…, net147 | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Using FileWrapper to return content in an HttpResponse object works fine, unless you have GzipMiddleware active. This combination will try to iterate through the FileWrapper object several times, but the iterator will return nothing after the first pass, and the HTTP response received by the client will be empty.
The problem is fixed if you modify FileWrapper.__iter__()
to return a new FileWrapper, rewound to the beginning of the file. (At the moment, FileWrapper.__iter__()
returns self, so the current position in the file is not changed.) Unfortunately, this fix cannot be used in general, since not all file-like objects support seeking. Also, I don't know what the correct Python iterator protocol is when __iter__()
is called on an existing iterator object. Is it supposed to return a new iterator, reset to the start of the sequence, or an iterator at the current position? (I am assuming the latter...)
Attachments (1)
Change History (14)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 16 years ago
I use this snippet
response = HttpResponse() response['Content-Type'] = mimetype response['Content-Length'] = uf.file.size response['Content-Disposition'] = ('attachment; filename=\"' + uf.file.name + "\"") for chunk in uf.file.chunks(): response.write(chunk) return response
comment:4 by , 15 years ago
Cc: | added |
---|
comment:5 by , 15 years ago
Cc: | added |
---|
comment:6 by , 14 years ago
Cc: | added |
---|
comment:7 by , 14 years ago
Patch attached to #10627 can at least make the middleware behave better, in that the documentation says:
Will not compress content bodies less than 200 bytes long, ... or responses that have the Content-Encoding header already specified.
If the Content-Encoding header is tested before the length, at least then the gzip middleware will behave a little less spookily, and have a clear "way out" for developers.
Documentation could be as simple as "If you are using the gzip middleware, to avod it consuming your generator/iterator content set the Content-Encoding header."
comment:8 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:9 by , 13 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
UI/UX: | unset |
comment:10 by , 13 years ago
Here is a flow of the problem
gzip middleware calls response.content. this is a property that does a ''.join(self._container)
this causes the iterator to be called which drains the iterator.
the quick fix is to make the following class and use it. Hopefully the filelike object you are using has a seek method
class FixedFileWrapper(FileWrapper): def __iter__(self): self.filelike.seek(0) return self
a more proper fix would be to implement the following in HTTPResponse
def __len__(self): if 'Content-Length' in self._headers: return int(self._headers.get('Content-Length',len(self._container)))
This way the gzip middleware could just do a len(response) and optionally the FileWrapper class could also have a __len__ method
comment:11 by , 13 years ago
As well as FileWrapper issue where first use of request.content will consume it and not available on second use, you also have case where someone is defining a custom iterable with iter which does properly restart. In this case if the iteration is an expensive operation, ie., generating data on the fly from an operation triggered on each iter call to create actual iterable, then you end up doing the expensivee operation more than once. Arguable that response.content could cache the result within the HttpResponse object when generated. The issue there is obviously that people wanted to use an iterable to avoid having it all in memory.
comment:12 by , 13 years ago
Graham, you also made an interesting comment in ticket:13222#comment:14. Did you see any progress in the WSGI spec about how iterables and WSGI should interact? I noted several tickets about this issue in Django, and I'd like to see a global plan to solve it. Would applying patch in #13222 a first step in the right direction?
comment:13 by , 12 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
I think we can close this ticket for three reasons:
- fixed: The original problem is solved by the
StreamingHttpResponse
that I recently introduced as a fix for #7581. It guarantees that iterable content will be consumed only once. - invalid:
FileWrapper
isn't a documented API. It's a class provided by thewsgiref.util
module of the standard library. Django might have shipped a copy of this module in the past, but it no longer does. - duplicate: reading through the comments, I don't see anything that isn't already covered by #6527 or #13222.
On further inspection, I think the best fix for this is to not use django.utils.text.compress_string() in django.middleware.gzip.GZipMiddleware.process_response(). Instead, it would be best to write a new function that returns an iterator over compressed file chunks, analogous to FileWrapper. This also has the advantage of not requiring the gzip file all be in memory at one time, which is useful if you are using the GZipMiddleware in an application which sends large Postscript files.