Opened 17 years ago

Closed 12 years ago

#6027 closed Bug (duplicate)

FileWrapper iterator drained by GzipMiddleware before content can be returned

Reported by: volsung@… Owned by: nobody
Component: Core (Other) Version: dev
Severity: Normal Keywords:
Cc: Forest Bond, Oliver Beattie, daemianmack@…, net147 Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using FileWrapper to return content in an HttpResponse object works fine, unless you have GzipMiddleware active. This combination will try to iterate through the FileWrapper object several times, but the iterator will return nothing after the first pass, and the HTTP response received by the client will be empty.

The problem is fixed if you modify FileWrapper.__iter__() to return a new FileWrapper, rewound to the beginning of the file. (At the moment, FileWrapper.__iter__() returns self, so the current position in the file is not changed.) Unfortunately, this fix cannot be used in general, since not all file-like objects support seeking. Also, I don't know what the correct Python iterator protocol is when __iter__() is called on an existing iterator object. Is it supposed to return a new iterator, reset to the start of the sequence, or an iterator at the current position? (I am assuming the latter...)

Attachments (1)

file_wrapper_iter.diff (410 bytes ) - added by Haisheng HU 13 years ago.
modifying iter of FileWrapper

Download all attachments as: .zip

Change History (14)

comment:1 by volsung@…, 17 years ago

On further inspection, I think the best fix for this is to not use django.utils.text.compress_string() in django.middleware.gzip.GZipMiddleware.process_response(). Instead, it would be best to write a new function that returns an iterator over compressed file chunks, analogous to FileWrapper. This also has the advantage of not requiring the gzip file all be in memory at one time, which is useful if you are using the GZipMiddleware in an application which sends large Postscript files.

comment:2 by Jacob, 17 years ago

Triage Stage: UnreviewedAccepted

comment:3 by ahlongxp, 16 years ago

I use this snippet

    response = HttpResponse()
    response['Content-Type'] = mimetype
    response['Content-Length'] = uf.file.size
    response['Content-Disposition'] = ('attachment; filename=\"' +
                                    uf.file.name + "\"")
    for chunk in uf.file.chunks():
        response.write(chunk)
    return response

comment:4 by Forest Bond, 15 years ago

Cc: Forest Bond added

comment:5 by Oliver Beattie, 15 years ago

Cc: Oliver Beattie added

comment:6 by daemianmack, 15 years ago

Cc: daemianmack@… added

comment:7 by FunkyBob, 14 years ago

Patch attached to #10627 can at least make the middleware behave better, in that the documentation says:

Will not compress content bodies less than 200 bytes long, ... or responses that have the Content-Encoding header already specified.

If the Content-Encoding header is tested before the length, at least then the gzip middleware will behave a little less spookily, and have a clear "way out" for developers.

Documentation could be as simple as "If you are using the gzip middleware, to avod it consuming your generator/iterator content set the Content-Encoding header."

comment:8 by Gabriel Hurley, 14 years ago

Severity: Normal
Type: Bug

comment:9 by net147, 13 years ago

Cc: net147 added
Easy pickings: unset
UI/UX: unset

comment:10 by snyderra@…, 13 years ago

Here is a flow of the problem

gzip middleware calls response.content. this is a property that does a ''.join(self._container)
this causes the iterator to be called which drains the iterator.

the quick fix is to make the following class and use it. Hopefully the filelike object you are using has a seek method

class FixedFileWrapper(FileWrapper):
    def __iter__(self):        
        self.filelike.seek(0)
        return self

a more proper fix would be to implement the following in HTTPResponse

def __len__(self):
        if 'Content-Length' in self._headers:
            return int(self._headers.get('Content-Length',len(self._container)))

This way the gzip middleware could just do a len(response) and optionally the FileWrapper class could also have a __len__ method

by Haisheng HU, 13 years ago

Attachment: file_wrapper_iter.diff added

modifying iter of FileWrapper

comment:11 by Graham Dumpleton, 13 years ago

As well as FileWrapper issue where first use of request.content will consume it and not available on second use, you also have case where someone is defining a custom iterable with iter which does properly restart. In this case if the iteration is an expensive operation, ie., generating data on the fly from an operation triggered on each iter call to create actual iterable, then you end up doing the expensivee operation more than once. Arguable that response.content could cache the result within the HttpResponse object when generated. The issue there is obviously that people wanted to use an iterable to avoid having it all in memory.

comment:12 by Claude Paroz, 13 years ago

Graham, you also made an interesting comment in ticket:13222#comment:14. Did you see any progress in the WSGI spec about how iterables and WSGI should interact? I noted several tickets about this issue in Django, and I'd like to see a global plan to solve it. Would applying patch in #13222 a first step in the right direction?

comment:13 by Aymeric Augustin, 12 years ago

Resolution: duplicate
Status: newclosed

I think we can close this ticket for three reasons:

  • fixed: The original problem is solved by the StreamingHttpResponse that I recently introduced as a fix for #7581. It guarantees that iterable content will be consumed only once.
  • invalid: FileWrapper isn't a documented API. It's a class provided by the wsgiref.util module of the standard library. Django might have shipped a copy of this module in the past, but it no longer does.
  • duplicate: reading through the comments, I don't see anything that isn't already covered by #6527 or #13222.
Note: See TracTickets for help on using tickets.
Back to Top