Opened 7 years ago

Closed 2 years ago

#6027 closed Bug (duplicate)

FileWrapper iterator drained by GzipMiddleware before content can be returned

Reported by: volsung@… Owned by: nobody
Component: Core (Other) Version: master
Severity: Normal Keywords:
Cc: forest, obeattie, daemianmack@…, net147 Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using FileWrapper to return content in an HttpResponse object works fine, unless you have GzipMiddleware active. This combination will try to iterate through the FileWrapper object several times, but the iterator will return nothing after the first pass, and the HTTP response received by the client will be empty.

The problem is fixed if you modify FileWrapper.__iter__() to return a new FileWrapper, rewound to the beginning of the file. (At the moment, FileWrapper.__iter__() returns self, so the current position in the file is not changed.) Unfortunately, this fix cannot be used in general, since not all file-like objects support seeking. Also, I don't know what the correct Python iterator protocol is when __iter__() is called on an existing iterator object. Is it supposed to return a new iterator, reset to the start of the sequence, or an iterator at the current position? (I am assuming the latter...)

Attachments (1)

file_wrapper_iter.diff (410 bytes) - added by hanson2010 3 years ago.
modifying iter of FileWrapper

Download all attachments as: .zip

Change History (14)

comment:1 Changed 7 years ago by volsung@…

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

On further inspection, I think the best fix for this is to not use django.utils.text.compress_string() in django.middleware.gzip.GZipMiddleware.process_response(). Instead, it would be best to write a new function that returns an iterator over compressed file chunks, analogous to FileWrapper. This also has the advantage of not requiring the gzip file all be in memory at one time, which is useful if you are using the GZipMiddleware in an application which sends large Postscript files.

comment:2 Changed 7 years ago by jacob

  • Triage Stage changed from Unreviewed to Accepted

comment:3 Changed 6 years ago by ahlongxp

I use this snippet

    response = HttpResponse()
    response['Content-Type'] = mimetype
    response['Content-Length'] = uf.file.size
    response['Content-Disposition'] = ('attachment; filename=\"' +
                                    uf.file.name + "\"")
    for chunk in uf.file.chunks():
        response.write(chunk)
    return response

comment:4 Changed 5 years ago by forest

  • Cc forest added

comment:5 Changed 5 years ago by obeattie

  • Cc obeattie added

comment:6 Changed 5 years ago by daemianmack

  • Cc daemianmack@… added

comment:7 Changed 4 years ago by FunkyBob

Patch attached to #10627 can at least make the middleware behave better, in that the documentation says:

Will not compress content bodies less than 200 bytes long, ... or responses that have the Content-Encoding header already specified.

If the Content-Encoding header is tested before the length, at least then the gzip middleware will behave a little less spookily, and have a clear "way out" for developers.

Documentation could be as simple as "If you are using the gzip middleware, to avod it consuming your generator/iterator content set the Content-Encoding header."

comment:8 Changed 4 years ago by gabrielhurley

  • Severity set to Normal
  • Type set to Bug

comment:9 Changed 3 years ago by net147

  • Cc net147 added
  • Easy pickings unset
  • UI/UX unset

comment:10 Changed 3 years ago by snyderra@…

Here is a flow of the problem

gzip middleware calls response.content. this is a property that does a ''.join(self._container)
this causes the iterator to be called which drains the iterator.

the quick fix is to make the following class and use it. Hopefully the filelike object you are using has a seek method

class FixedFileWrapper(FileWrapper):
    def __iter__(self):        
        self.filelike.seek(0)
        return self

a more proper fix would be to implement the following in HTTPResponse

def __len__(self):
        if 'Content-Length' in self._headers:
            return int(self._headers.get('Content-Length',len(self._container)))

This way the gzip middleware could just do a len(response) and optionally the FileWrapper class could also have a __len__ method

Changed 3 years ago by hanson2010

modifying iter of FileWrapper

comment:11 Changed 3 years ago by grahamd

As well as FileWrapper issue where first use of request.content will consume it and not available on second use, you also have case where someone is defining a custom iterable with iter which does properly restart. In this case if the iteration is an expensive operation, ie., generating data on the fly from an operation triggered on each iter call to create actual iterable, then you end up doing the expensivee operation more than once. Arguable that response.content could cache the result within the HttpResponse object when generated. The issue there is obviously that people wanted to use an iterable to avoid having it all in memory.

comment:12 Changed 3 years ago by claudep

Graham, you also made an interesting comment in ticket:13222#comment:14. Did you see any progress in the WSGI spec about how iterables and WSGI should interact? I noted several tickets about this issue in Django, and I'd like to see a global plan to solve it. Would applying patch in #13222 a first step in the right direction?

comment:13 Changed 2 years ago by aaugustin

  • Resolution set to duplicate
  • Status changed from new to closed

I think we can close this ticket for three reasons:

  • fixed: The original problem is solved by the StreamingHttpResponse that I recently introduced as a fix for #7581. It guarantees that iterable content will be consumed only once.
  • invalid: FileWrapper isn't a documented API. It's a class provided by the wsgiref.util module of the standard library. Django might have shipped a copy of this module in the past, but it no longer does.
  • duplicate: reading through the comments, I don't see anything that isn't already covered by #6527 or #13222.
Note: See TracTickets for help on using tickets.
Back to Top