Opened 10 years ago

Closed 4 years ago

#2504 closed Bug (duplicate)

ConditionalGetMiddleware causes Content-Length to be 0 on HttpResponse with fileobject

Reported by: django@… Owned by: Aymeric Augustin
Component: Core (Other) Version: master
Severity: Normal Keywords:
Cc: hv@…, kmike84@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

middleware/http.py:

        if not response.has_header('Content-Length'):
            response['Content-Length'] = str(len(response.content))

If response is build with a file object, str(len(response.content)) is 0, and the browser will ignore the page, if the handler does not set content-length.

If response.content is not a string, do not add Content-Length.
Optional we could implement:
If response.content is a file like object, and content.name is a valid filename, we could get the size with os.path.getsize and those.

Attachments (4)

get_content_length.diff (3.2 KB) - added by Fraser Nevett 9 years ago.
gzip.diff (660 bytes) - added by Fraser Nevett 9 years ago.
get_content_length2.diff (4.6 KB) - added by Fraser Nevett 9 years ago.
Supercedes get_content_length.diff
2504_http_response_and_conditional_middleware.patch (4.2 KB) - added by Daniel Poelzleithner 9 years ago.
new patch, fixing the issue with pipes and files

Download all attachments as: .zip

Change History (26)

comment:1 Changed 10 years ago by Malcolm Tredinnick

Owner: changed from Adrian Holovaty to Malcolm Tredinnick

comment:2 Changed 10 years ago by Malcolm Tredinnick

Some notes about a solution to this problem, since I haven't started working on it and somebody else may wish to:

If we are using HTTP/1.1, we can send the Transfer-Encoding: chunked header and chunk the output into "reasonable size chunks", without needing to worry about a content length ahead of time.

With HTTP/1.0, however, we need a content-length header. For file objects, looking at the file size is not bad. For other iterators, we need to read it all into memory and compute the length.

comment:3 Changed 10 years ago by Simon G. <dev@…>

Triage Stage: UnreviewedAccepted

comment:4 Changed 9 years ago by anonymous

Component: Core frameworkInternationalization
Needs documentation: set
Triage Stage: AcceptedDesign decision needed

comment:5 Changed 9 years ago by Simon G. <dev@…>

Component: InternationalizationCore framework
Owner: changed from Malcolm Tredinnick to Adrian Holovaty
Triage Stage: Design decision neededAccepted

Please don't do anonymous triage, or change things without a reason.

comment:6 Changed 9 years ago by Fraser Nevett

Owner: changed from nobody to Fraser Nevett
Status: newassigned

Changed 9 years ago by Fraser Nevett

Attachment: get_content_length.diff added

Changed 9 years ago by Fraser Nevett

Attachment: gzip.diff added

comment:7 Changed 9 years ago by Fraser Nevett

Has patch: set
Needs documentation: unset

The Transfer-Encoding heading is a hop-by-hop header, which are not permitted to be set by PEP 333 (WSGI):

applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as "hop-by-hop" operations, these encodings are the province of the actual web server/gateway

There doesn't seem to be any way for a WSGI handler to require or even indicate that chunked encoding should be used: it appears that this remains the purview of the web server. I also checked and couldn't find a way in mod_python either.

The attached patch (including docs and tests) introduces a get_content_length() method to HttpResponse which efficiently determines the content length, if possible.

As Malcolm pointed out to me on IRC, it would be sensible to use response.get_content_length() instead of len(response.content throughout. I checked and the only instance I could see of this was in GZipMiddleware, so I've attached a patch to update that too.

comment:8 Changed 9 years ago by Fraser Nevett

Better patch now attached which should handle all types of iterators passed in as content, not just files, lists and tuples.

Note the change in _get_content that stores the joined string. This was done because once an iterator has been iterated through, it is effectively spent and cannot be iterated through again. It may also be slightly optimal as it means it won't need to join as many values in subsequent calls.

Changed 9 years ago by Fraser Nevett

Attachment: get_content_length2.diff added

Supercedes get_content_length.diff

comment:9 Changed 9 years ago by arne

I create tar files on the fly with os.popen2() and pass stdout, which is a fileobj, as response.content, doing a len(response.content) reads the full tar file into memory, which can be > 500MB in my case. Without checking for the content length everything works fine and I can do a "streaming" download to the browser ...

As a workaroung I have patched my django tree and added the following two lines at the top of process_response() in the ConditionalGetMiddleware:

if(response._is_string == False):
    return response


This may not be fully HTTP compatible, but I see no other way of sending large files without creating temp-files ...

comment:10 Changed 9 years ago by arne

as a side note and as reply @ 09/27/06 06:57:46 changed by mtredinnick:

Both, RFC 1945 HTTP/1.0 and RFC 2626 HTTP/1.1, state that an absent Content-Length Header is allowed. If the header is missing and no special Transfer-Encoding (e.g. chunked) is set, closing the connection on the server side determines the end of the response content.

In RFC 1945 this is stated in part 7.2.2 and 10.4,
In RFC 2616 this ist stated in part 4.4 method 5

comment:11 Changed 9 years ago by Daniel Poelzleithner

Owner: changed from Fraser Nevett to Daniel Poelzleithner
Patch needs improvement: set
Status: assignednew

Changed 9 years ago by Daniel Poelzleithner

new patch, fixing the issue with pipes and files

comment:12 Changed 9 years ago by Daniel Poelzleithner

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

the new patch doesn't contain http/1.1 and chunked encoding stuff. this should be fixed in a seperate bug

comment:13 Changed 9 years ago by Fraser Nevett

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

If poelzi's new patch is used, then my original gzip.diff patch will need to be updated as get_content_length now raises UnknownSize rather than returning None if the size cannot be determined.

comment:14 Changed 8 years ago by Tai Lee

Also see #7581, related to streaming and the Content-Length header with WSGI.

comment:15 Changed 6 years ago by Thomas Güttler

Cc: hv@… added

comment:16 Changed 6 years ago by Łukasz Rekucki

Type: defectBug

comment:17 Changed 6 years ago by Łukasz Rekucki

Severity: majorNormal

comment:18 Changed 5 years ago by Aymeric Augustin

UI/UX: unset

Change UI/UX from NULL to False.

comment:19 Changed 5 years ago by Aymeric Augustin

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:20 Changed 5 years ago by Mikhail Korobov

Cc: kmike84@… added

comment:21 Changed 4 years ago by Aymeric Augustin

Owner: changed from Daniel Poelzleithner to Aymeric Augustin

comment:22 Changed 4 years ago by Aymeric Augustin

Resolution: duplicate
Status: newclosed

With the current implementation of HttpResponse, when it's instantiated with an iterator, the first access to response.content will return the content, and the second one will return the empty string.

This is the root cause of this issue, and it's tracked in #6527.

Note: See TracTickets for help on using tickets.
Back to Top