Opened 9 years ago

Closed 3 years ago

#2504 closed Bug (duplicate)

ConditionalGetMiddleware causes Content-Length to be 0 on HttpResponse with fileobject

Reported by: django@… Owned by: aaugustin
Component: Core (Other) Version: master
Severity: Normal Keywords:
Cc: hv@…, kmike84@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

middleware/http.py:

        if not response.has_header('Content-Length'):
            response['Content-Length'] = str(len(response.content))

If response is build with a file object, str(len(response.content)) is 0, and the browser will ignore the page, if the handler does not set content-length.

If response.content is not a string, do not add Content-Length.
Optional we could implement:
If response.content is a file like object, and content.name is a valid filename, we could get the size with os.path.getsize and those.

Attachments (4)

get_content_length.diff (3.2 KB) - added by frasern 8 years ago.
gzip.diff (660 bytes) - added by frasern 8 years ago.
get_content_length2.diff (4.6 KB) - added by frasern 8 years ago.
Supercedes get_content_length.diff
2504_http_response_and_conditional_middleware.patch (4.2 KB) - added by poelzi 8 years ago.
new patch, fixing the issue with pipes and files

Download all attachments as: .zip

Change History (26)

comment:1 Changed 9 years ago by mtredinnick

  • Owner changed from adrian to mtredinnick

comment:2 Changed 9 years ago by mtredinnick

Some notes about a solution to this problem, since I haven't started working on it and somebody else may wish to:

If we are using HTTP/1.1, we can send the Transfer-Encoding: chunked header and chunk the output into "reasonable size chunks", without needing to worry about a content length ahead of time.

With HTTP/1.0, however, we need a content-length header. For file objects, looking at the file size is not bad. For other iterators, we need to read it all into memory and compute the length.

comment:3 Changed 8 years ago by Simon G. <dev@…>

  • Triage Stage changed from Unreviewed to Accepted

comment:4 Changed 8 years ago by anonymous

  • Component changed from Core framework to Internationalization
  • Needs documentation set
  • Triage Stage changed from Accepted to Design decision needed

comment:5 Changed 8 years ago by Simon G. <dev@…>

  • Component changed from Internationalization to Core framework
  • Owner changed from mtredinnick to adrian
  • Triage Stage changed from Design decision needed to Accepted

Please don't do anonymous triage, or change things without a reason.

comment:6 Changed 8 years ago by frasern

  • Owner changed from nobody to frasern
  • Status changed from new to assigned

Changed 8 years ago by frasern

Changed 8 years ago by frasern

comment:7 Changed 8 years ago by frasern

  • Has patch set
  • Needs documentation unset

The Transfer-Encoding heading is a hop-by-hop header, which are not permitted to be set by PEP 333 (WSGI):

applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as "hop-by-hop" operations, these encodings are the province of the actual web server/gateway

There doesn't seem to be any way for a WSGI handler to require or even indicate that chunked encoding should be used: it appears that this remains the purview of the web server. I also checked and couldn't find a way in mod_python either.

The attached patch (including docs and tests) introduces a get_content_length() method to HttpResponse which efficiently determines the content length, if possible.

As Malcolm pointed out to me on IRC, it would be sensible to use response.get_content_length() instead of len(response.content throughout. I checked and the only instance I could see of this was in GZipMiddleware, so I've attached a patch to update that too.

comment:8 Changed 8 years ago by frasern

Better patch now attached which should handle all types of iterators passed in as content, not just files, lists and tuples.

Note the change in _get_content that stores the joined string. This was done because once an iterator has been iterated through, it is effectively spent and cannot be iterated through again. It may also be slightly optimal as it means it won't need to join as many values in subsequent calls.

Changed 8 years ago by frasern

Supercedes get_content_length.diff

comment:9 Changed 8 years ago by arne

I create tar files on the fly with os.popen2() and pass stdout, which is a fileobj, as response.content, doing a len(response.content) reads the full tar file into memory, which can be > 500MB in my case. Without checking for the content length everything works fine and I can do a "streaming" download to the browser ...

As a workaroung I have patched my django tree and added the following two lines at the top of process_response() in the ConditionalGetMiddleware:

if(response._is_string == False):
    return response


This may not be fully HTTP compatible, but I see no other way of sending large files without creating temp-files ...

comment:10 Changed 8 years ago by arne

as a side note and as reply @ 09/27/06 06:57:46 changed by mtredinnick:

Both, RFC 1945 HTTP/1.0 and RFC 2626 HTTP/1.1, state that an absent Content-Length Header is allowed. If the header is missing and no special Transfer-Encoding (e.g. chunked) is set, closing the connection on the server side determines the end of the response content.

In RFC 1945 this is stated in part 7.2.2 and 10.4,
In RFC 2616 this ist stated in part 4.4 method 5

comment:11 Changed 8 years ago by poelzi

  • Owner changed from frasern to poelzi
  • Patch needs improvement set
  • Status changed from assigned to new

Changed 8 years ago by poelzi

new patch, fixing the issue with pipes and files

comment:12 Changed 8 years ago by poelzi

  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

the new patch doesn't contain http/1.1 and chunked encoding stuff. this should be fixed in a seperate bug

comment:13 Changed 8 years ago by frasern

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

If poelzi's new patch is used, then my original gzip.diff patch will need to be updated as get_content_length now raises UnknownSize rather than returning None if the size cannot be determined.

comment:14 Changed 7 years ago by mrmachine

Also see #7581, related to streaming and the Content-Length header with WSGI.

comment:15 Changed 5 years ago by guettli

  • Cc hv@… added

comment:16 Changed 4 years ago by lrekucki

  • Type changed from defect to Bug

comment:17 Changed 4 years ago by lrekucki

  • Severity changed from major to Normal

comment:18 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:19 Changed 3 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:20 Changed 3 years ago by kmike

  • Cc kmike84@… added

comment:21 Changed 3 years ago by aaugustin

  • Owner changed from poelzi to aaugustin

comment:22 Changed 3 years ago by aaugustin

  • Resolution set to duplicate
  • Status changed from new to closed

With the current implementation of HttpResponse, when it's instantiated with an iterator, the first access to response.content will return the content, and the second one will return the empty string.

This is the root cause of this issue, and it's tracked in #6527.

Note: See TracTickets for help on using tickets.
Back to Top