Opened 18 years ago

Closed 12 years ago

#2504 closed Bug (duplicate)

ConditionalGetMiddleware causes Content-Length to be 0 on HttpResponse with fileobject

Reported by: django@… Owned by: Aymeric Augustin
Component: Core (Other) Version: dev
Severity: Normal Keywords:
Cc: hv@…, kmike84@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

middleware/http.py:

        if not response.has_header('Content-Length'):
            response['Content-Length'] = str(len(response.content))

If response is build with a file object, str(len(response.content)) is 0, and the browser will ignore the page, if the handler does not set content-length.

If response.content is not a string, do not add Content-Length.
Optional we could implement:
If response.content is a file like object, and content.name is a valid filename, we could get the size with os.path.getsize and those.

Attachments (4)

get_content_length.diff (3.2 KB ) - added by Fraser Nevett 17 years ago.
gzip.diff (660 bytes ) - added by Fraser Nevett 17 years ago.
get_content_length2.diff (4.6 KB ) - added by Fraser Nevett 17 years ago.
Supercedes get_content_length.diff
2504_http_response_and_conditional_middleware.patch (4.2 KB ) - added by Daniel Poelzleithner 17 years ago.
new patch, fixing the issue with pipes and files

Download all attachments as: .zip

Change History (26)

comment:1 by Malcolm Tredinnick, 18 years ago

Owner: changed from Adrian Holovaty to Malcolm Tredinnick

comment:2 by Malcolm Tredinnick, 18 years ago

Some notes about a solution to this problem, since I haven't started working on it and somebody else may wish to:

If we are using HTTP/1.1, we can send the Transfer-Encoding: chunked header and chunk the output into "reasonable size chunks", without needing to worry about a content length ahead of time.

With HTTP/1.0, however, we need a content-length header. For file objects, looking at the file size is not bad. For other iterators, we need to read it all into memory and compute the length.

comment:3 by Simon G. <dev@…>, 18 years ago

Triage Stage: UnreviewedAccepted

comment:4 by anonymous, 17 years ago

Component: Core frameworkInternationalization
Needs documentation: set
Triage Stage: AcceptedDesign decision needed

comment:5 by Simon G. <dev@…>, 17 years ago

Component: InternationalizationCore framework
Owner: changed from Malcolm Tredinnick to Adrian Holovaty
Triage Stage: Design decision neededAccepted

Please don't do anonymous triage, or change things without a reason.

comment:6 by Fraser Nevett, 17 years ago

Owner: changed from nobody to Fraser Nevett
Status: newassigned

by Fraser Nevett, 17 years ago

Attachment: get_content_length.diff added

by Fraser Nevett, 17 years ago

Attachment: gzip.diff added

comment:7 by Fraser Nevett, 17 years ago

Has patch: set
Needs documentation: unset

The Transfer-Encoding heading is a hop-by-hop header, which are not permitted to be set by PEP 333 (WSGI):

applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as "hop-by-hop" operations, these encodings are the province of the actual web server/gateway

There doesn't seem to be any way for a WSGI handler to require or even indicate that chunked encoding should be used: it appears that this remains the purview of the web server. I also checked and couldn't find a way in mod_python either.

The attached patch (including docs and tests) introduces a get_content_length() method to HttpResponse which efficiently determines the content length, if possible.

As Malcolm pointed out to me on IRC, it would be sensible to use response.get_content_length() instead of len(response.content throughout. I checked and the only instance I could see of this was in GZipMiddleware, so I've attached a patch to update that too.

comment:8 by Fraser Nevett, 17 years ago

Better patch now attached which should handle all types of iterators passed in as content, not just files, lists and tuples.

Note the change in _get_content that stores the joined string. This was done because once an iterator has been iterated through, it is effectively spent and cannot be iterated through again. It may also be slightly optimal as it means it won't need to join as many values in subsequent calls.

by Fraser Nevett, 17 years ago

Attachment: get_content_length2.diff added

Supercedes get_content_length.diff

comment:9 by arne, 17 years ago

I create tar files on the fly with os.popen2() and pass stdout, which is a fileobj, as response.content, doing a len(response.content) reads the full tar file into memory, which can be > 500MB in my case. Without checking for the content length everything works fine and I can do a "streaming" download to the browser ...

As a workaroung I have patched my django tree and added the following two lines at the top of process_response() in the ConditionalGetMiddleware:

if(response._is_string == False):
    return response


This may not be fully HTTP compatible, but I see no other way of sending large files without creating temp-files ...

comment:10 by arne, 17 years ago

as a side note and as reply @ 09/27/06 06:57:46 changed by mtredinnick:

Both, RFC 1945 HTTP/1.0 and RFC 2626 HTTP/1.1, state that an absent Content-Length Header is allowed. If the header is missing and no special Transfer-Encoding (e.g. chunked) is set, closing the connection on the server side determines the end of the response content.

In RFC 1945 this is stated in part 7.2.2 and 10.4,
In RFC 2616 this ist stated in part 4.4 method 5

comment:11 by Daniel Poelzleithner, 17 years ago

Owner: changed from Fraser Nevett to Daniel Poelzleithner
Patch needs improvement: set
Status: assignednew

by Daniel Poelzleithner, 17 years ago

new patch, fixing the issue with pipes and files

comment:12 by Daniel Poelzleithner, 17 years ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

the new patch doesn't contain http/1.1 and chunked encoding stuff. this should be fixed in a seperate bug

comment:13 by Fraser Nevett, 17 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

If poelzi's new patch is used, then my original gzip.diff patch will need to be updated as get_content_length now raises UnknownSize rather than returning None if the size cannot be determined.

comment:14 by Tai Lee, 16 years ago

Also see #7581, related to streaming and the Content-Length header with WSGI.

comment:15 by Thomas Güttler, 14 years ago

Cc: hv@… added

comment:16 by Łukasz Rekucki, 14 years ago

Type: defectBug

comment:17 by Łukasz Rekucki, 14 years ago

Severity: majorNormal

comment:18 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:19 by Aymeric Augustin, 13 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:20 by Mikhail Korobov, 13 years ago

Cc: kmike84@… added

comment:21 by Aymeric Augustin, 12 years ago

Owner: changed from Daniel Poelzleithner to Aymeric Augustin

comment:22 by Aymeric Augustin, 12 years ago

Resolution: duplicate
Status: newclosed

With the current implementation of HttpResponse, when it's instantiated with an iterator, the first access to response.content will return the content, and the second one will return the empty string.

This is the root cause of this issue, and it's tracked in #6527.

Note: See TracTickets for help on using tickets.
Back to Top