#6527 closed Bug (fixed)
A bug in HttpResponse with iterators
Reported by: | Owned by: | Aymeric Augustin | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | http iterators |
Cc: | mikolo2@…, jim@…, robert.coup@…, real.human@…, Forest Bond, daemianmack@…, net147 | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
>>> from django.http import HttpResponse >>> response = HttpResponse(file('helloworld.txt','r')) >>> print response Content-Type: text/html; charset=utf-8 helloworld >>> print response Content-Type: text/html; charset=utf-8 >>>
Seems to me django.http.HttpResponse has to be modified so that when its init is given an iterator, it will go through the iterator only once.
I'd be happy to deliver a patch if this is something that should be fixed.
Attachments (4)
Change History (40)
comment:1 Changed 16 years ago by
Component: | Uncategorized → HTTP handling |
---|---|
Triage Stage: | Unreviewed → Design decision needed |
comment:2 Changed 16 years ago by
It came up because I downloaded django analytics from google code and added their middleware. What it does is to scan the reponse for the end body tag and insert the analytics tracker code there. I think it's very clean and it worked great.
Unrelated, I have a piece of code that acts as a proxy for the flickr service - I have to keep my key safe so my server has to act as a go between. My view ends with:
flickr_resp = urlopen(flickr_api_url, params) return HttpResponse(''.join(flickr_resp), mimetype='application/json')
and it worked well until I added the middleware.
It's one of the nasties bags I found - The first thing I did was to add "response = ..." and then print it so I can use the local server to debug. The response seemed fine, after a lot of searching and wandering about I decided to print it twice.
It's definitely not the result I expected. When you have a framework that acts like that, it loses my trust.
comment:3 follow-up: 31 Changed 16 years ago by
Triage Stage: | Design decision needed → Accepted |
---|
The HttpResponse class should turn the iterator into a list and store that result. Might as well do it in the __init__
method.
Response classes cannot work reliably with iterators that aren't read in their entirety at the moment because it's too easy (and common) for middleware to access response.content, which would chew up an iterator. You don't have to look back very far in the archives to see how bad things get when we tried to make things truly iterator-aware at that level. It just doesn't integrate well with middleware.
comment:4 Changed 16 years ago by
Owner: | changed from nobody to John DeRosa |
---|---|
Status: | new → assigned |
Taken during PyCon 2008 sprint.
comment:5 Changed 16 years ago by
Has patch: | set |
---|
Fixed in the attached patch file. I added regression tests to cover this case.
Note, svn thinks the entire AUTHORS file changed due to bug #6545, which refers to a line-ending problem with pool text files.
comment:6 Changed 16 years ago by
If you're going to read the contents from the iterator, there's no need to create another attribute on the model. Just put the contents into self._contents
. You don't need to store the iterator.
Changed 16 years ago by
Attachment: | 6527-2.diff added |
---|
Less code here, includes original regression tests
comment:7 Changed 16 years ago by
Roger. There is no self._contents
, so I'm guessing you meant to write self._container
.
The entire class got simpler, as a result. Excellent!
The patch is 6527 V2.patch
.
The guideline, "If an HttpResponse has been initialized with an iterator as its content, you can’t use the HttpResponse instance as a file-like object. Doing so will raise Exception," is no longer required, so its change is now in the patch. Svn thinks every line changed due to bug #6545.
comment:9 follow-up: 10 Changed 16 years ago by
nice job stugots! my patch 6527-2.diff is not as good. Thanks for contributing.
comment:10 Changed 16 years ago by
Replying to daonb <bennydaon@gmail.com>:
nice job stugots! my patch 6527-2.diff is not as good. Thanks for contributing.
Thank you for your kind words. I wish I could do more, but I'm just getting up to speed on the pool. It's thrilling to be able to contribute in a small way to Django's success!
comment:11 Changed 15 years ago by
Cc: | jim@… added |
---|
comment:12 Changed 15 years ago by
Cc: | robert.coup@… added |
---|
comment:13 Changed 15 years ago by
Cc: | real.human@… added |
---|
comment:14 Changed 14 years ago by
comment:15 Changed 14 years ago by
Owner: | changed from John DeRosa to ccahoon |
---|---|
Status: | assigned → new |
comment:16 Changed 14 years ago by
@stugots Hopefully we can get this patch into trunk at the end of SoC2009. I changed the ownership so I don't forget to address it when I am reviewing my changes later on.
comment:17 Changed 14 years ago by
Triage Stage: | Accepted → Fixed on a branch |
---|
Fixed on branches/soc2009/http-wsgi-improvements.
comment:18 Changed 14 years ago by
So if/when this hits trunk, we will lose the benefit of not having the entire iterator (potentially large and time consuming) saved in memory, which could also potentially cause browser timeouts? If HttpResponse will have no real support for iterators, why not simply make HttpResponse require a string argument? Users wanting to pass an iterator can simply consume it then.
comment:19 Changed 14 years ago by
Triage Stage: | Fixed on a branch → Design decision needed |
---|
The suggested fix, and the changes committed to soc2009/http-wsgi-improvements cannot possibly be committed to trunk. This severely breaks previously documented behaviour because, as mrmachine points out, you will no longer be able to iteratively generate large responses, and in particular, will not be able to iteratively deliver chunks of content to the client.
This will break any page that takes a long time to fully generate, or about 10% of one of my sites.
I think that any change will need to consider the different types of middleware that operate on a response's contents.
Some types of middleware will only examine the generated response, eg CacheMiddleware will not modify the response, only store a version in the cache when complete.
Others will want to replace the content and modify headers, eg GzipMiddleware.
The current 'fix' only satisfies the requirements of the modifying middleware.
I fixed the limitation of the cache middleware to cope with iterator based responses with code like so:
def buffer_and_cache_response(response, cache_key, timeout): from copy import copy def worker(rsp): from cStringIO import StringIO buf = StringIO() # we need to copy the response before we iterate through it # if we copy it after, then the response will have a generator object # on self._iterator that will not be picklable buffered_response = copy(rsp) for chunk in rsp: buf.write(chunk) yield chunk buffered_response.content = buf.getvalue() buf.close() rsp.close() cache.set(cache_key, buffered_response, timeout) new_resp = copy(response) new_resp._container = worker(response) new_resp._is_string = False return new_resp
Our cache middleware (virtually copy/paste of UpdateCacheMiddleware) determines if this is an iterator derived response by checking response._is_string (hacky, I know), and returns the rv of this function. This is just an example of a more intelligent, correct way to fix this issue.
comment:20 Changed 14 years ago by
Cc: | Forest Bond added |
---|
comment:21 Changed 13 years ago by
Cc: | daemianmack@… added |
---|
comment:22 Changed 13 years ago by
I ended up using the exact same "response._is_string" check in my patch on #10627.
If the documentation for "Passing Iterators" also mentioned "generators", it would cover one side of the behavior issue.
If the documentation on writing middleware warned of the potential of responses being iterators/generators, and thus to no indirectly consume them, along with mentioning the _is_string [or equivalent 'correct'] test, it would at least cover that base.
comment:23 Changed 13 years ago by
I worked on this three years ago. And it's still not resolved. Amazing.
comment:24 Changed 13 years ago by
Type: | → Bug |
---|
comment:25 Changed 13 years ago by
Severity: | → Normal |
---|
comment:26 Changed 12 years ago by
Cc: | net147 added |
---|---|
Easy pickings: | unset |
UI/UX: | unset |
comment:27 Changed 12 years ago by
Alex and I looked at this at the sprint and thought there had been extensive discussion of iterable responses and middleware during the TemplateResponse discussions, but weren't sure what the conclusions of that discussion were. It doesn't seem that the relevant HttpResponse code has changed.
comment:28 Changed 12 years ago by
Triage Stage: | Design decision needed → Accepted |
---|
This shouldn't be DDN -- it's still a bug -- but it's far from clear how to solve this problem. We've got to figure out a consistant way of handling lazy vs. non-lazy responses, and we haven't yet. That's a bigger discussion and probably needs a "master ticket" or an extended discussion on django-dev.
comment:29 Changed 12 years ago by
Patch needs improvement: | set |
---|
comment:30 Changed 11 years ago by
Owner: | changed from ccahoon to Aymeric Augustin |
---|
comment:31 Changed 11 years ago by
Replying to mtredinnick:
The HttpResponse class should turn the iterator into a list and store that result.
Yes, it should.
Might as well do it in the
__init__
method.
Today, it seems that the consensus it to keep the streaming behavior in the (rare) cases where it works. This means turning the iterator into a list only when it's going to be iterated and the content loaded in memory anyway — typically when middleware reads response.content
.
Response classes cannot work reliably with iterators that aren't read in their entirety at the moment because it's too easy (and common) for middleware to access response.content, which would chew up an iterator. You don't have to look back very far in the archives to see how bad things get when we tried to make things truly iterator-aware at that level. It just doesn't integrate well with middleware.
A separate StreamingHttpResponse
was just introduced for this purpose in #7581. In fact, the last pull request also included the changes required to fix this ticket. I didn't commit them because they weren't strictly necessary for #7581. I've extracted them into a separate patch, which I'll be attaching shortly.
This is a logical follow-up for #7581 and I'll commit it quickly if no one objects.
Changed 11 years ago by
Attachment: | 6527-v3.patch added |
---|
comment:32 follow-up: 33 Changed 11 years ago by
I think that the desire to keep the partial streaming behaviour as-is for HttpResponse
was primarily to minimise any disruption or edge-case backwards incompatibility when introducing explicit support for streaming responses. I think that eventually, the goal here should be a more explicit and consistent HttpResponse
class that makes user intent clear. Did they pass an iterator as content because they want to stream the response, or as a convenience?
I would like to keep the existing behaviour for now and raise a PendingDeprecationWarning
when an iterator is consumed by accessing .content
, that directs people to use the new streaming response class. When the deprecation cycle is finished, HttpResponse
should consume iterators on assignment. This should give people enough time to switch over to the new streaming response class in those cases when they actually want a streaming response.
An HttpResponse
class that always stores its content in a list internally can be pickled (e.g. by cache middleware and possibly other 3rd party code) without first having to access the content, and will remove any ambiguity in situations where a response currently might or might not stream.
comment:33 Changed 11 years ago by
Replying to mrmachine:
I think that the desire to keep the partial streaming behaviour as-is for
HttpResponse
was primarily to minimise any disruption or edge-case backwards incompatibility when introducing explicit support for streaming responses. I think that eventually, the goal here should be a more explicit and consistentHttpResponse
class that makes user intent clear. Did they pass an iterator as content because they want to stream the response, or as a convenience?
Yes, I understand that.
I would like to keep the existing behaviour for now and raise a
PendingDeprecationWarning
when an iterator is consumed by accessing.content
, that directs people to use the new streaming response class. When the deprecation cycle is finished,HttpResponse
should consume iterators on assignment. This should give people enough time to switch over to the new streaming response class in those cases when they actually want a streaming response.
Actually we would have to raise the warning when the iterator is *not* consumed by accessing .content
, but rather by the WSGI server iterating the response.
An
HttpResponse
class that always stores its content in a list internally can be pickled (e.g. by cache middleware and possibly other 3rd party code) without first having to access the content, and will remove any ambiguity in situations where a response currently might or might not stream.
Did you mean "stores its content in a string"?
comment:34 Changed 11 years ago by
No. I meant that when ._container
is a list the response can be pickled. When it is an iterator/generator, it cannot. You are right about raising the warning when an HttpResponse
is implicitly streamed, rather than when a user consumes the iterator content (accidentally or not). That makes more sense.
comment:35 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
I'm not sure that this is an actual bug. One of the benefits in passing an iterator is so that you don't get the whole response saved in memory, isn't it?