#6527 closed Bug (fixed)
A bug in HttpResponse with iterators
| Reported by: | Owned by: | Aymeric Augustin | |
|---|---|---|---|
| Component: | HTTP handling | Version: | dev |
| Severity: | Normal | Keywords: | http iterators |
| Cc: | mikolo2@…, jim@…, robert.coup@…, real.human@…, Forest Bond, daemianmack@…, net147 | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
>>> from django.http import HttpResponse
>>> response = HttpResponse(file('helloworld.txt','r'))
>>> print response
Content-Type: text/html; charset=utf-8
helloworld
>>> print response
Content-Type: text/html; charset=utf-8
>>>
Seems to me django.http.HttpResponse has to be modified so that when its init is given an iterator, it will go through the iterator only once.
I'd be happy to deliver a patch if this is something that should be fixed.
Attachments (4)
Change History (40)
comment:1 by , 18 years ago
| Component: | Uncategorized → HTTP handling |
|---|---|
| Triage Stage: | Unreviewed → Design decision needed |
comment:2 by , 18 years ago
It came up because I downloaded django analytics from google code and added their middleware. What it does is to scan the reponse for the end body tag and insert the analytics tracker code there. I think it's very clean and it worked great.
Unrelated, I have a piece of code that acts as a proxy for the flickr service - I have to keep my key safe so my server has to act as a go between. My view ends with:
flickr_resp = urlopen(flickr_api_url, params)
return HttpResponse(''.join(flickr_resp), mimetype='application/json')
and it worked well until I added the middleware.
It's one of the nasties bags I found - The first thing I did was to add "response = ..." and then print it so I can use the local server to debug. The response seemed fine, after a lot of searching and wandering about I decided to print it twice.
It's definitely not the result I expected. When you have a framework that acts like that, it loses my trust.
follow-up: 31 comment:3 by , 18 years ago
| Triage Stage: | Design decision needed → Accepted |
|---|
The HttpResponse class should turn the iterator into a list and store that result. Might as well do it in the __init__ method.
Response classes cannot work reliably with iterators that aren't read in their entirety at the moment because it's too easy (and common) for middleware to access response.content, which would chew up an iterator. You don't have to look back very far in the archives to see how bad things get when we tried to make things truly iterator-aware at that level. It just doesn't integrate well with middleware.
comment:4 by , 18 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
Taken during PyCon 2008 sprint.
comment:5 by , 18 years ago
| Has patch: | set |
|---|
Fixed in the attached patch file. I added regression tests to cover this case.
Note, svn thinks the entire AUTHORS file changed due to bug #6545, which refers to a line-ending problem with pool text files.
comment:6 by , 18 years ago
If you're going to read the contents from the iterator, there's no need to create another attribute on the model. Just put the contents into self._contents. You don't need to store the iterator.
comment:7 by , 18 years ago
Roger. There is no self._contents, so I'm guessing you meant to write self._container.
The entire class got simpler, as a result. Excellent!
The patch is 6527 V2.patch.
The guideline, "If an HttpResponse has been initialized with an iterator as its content, you can’t use the HttpResponse instance as a file-like object. Doing so will raise Exception," is no longer required, so its change is now in the patch. Svn thinks every line changed due to bug #6545.
follow-up: 10 comment:9 by , 18 years ago
nice job stugots! my patch 6527-2.diff is not as good. Thanks for contributing.
comment:10 by , 18 years ago
Replying to daonb <bennydaon@gmail.com>:
nice job stugots! my patch 6527-2.diff is not as good. Thanks for contributing.
Thank you for your kind words. I wish I could do more, but I'm just getting up to speed on the pool. It's thrilling to be able to contribute in a small way to Django's success!
comment:11 by , 17 years ago
| Cc: | added |
|---|
comment:12 by , 17 years ago
| Cc: | added |
|---|
comment:13 by , 17 years ago
| Cc: | added |
|---|
comment:14 by , 16 years ago
comment:15 by , 16 years ago
| Owner: | changed from to |
|---|---|
| Status: | assigned → new |
comment:16 by , 16 years ago
@stugots Hopefully we can get this patch into trunk at the end of SoC2009. I changed the ownership so I don't forget to address it when I am reviewing my changes later on.
comment:17 by , 16 years ago
| Triage Stage: | Accepted → Fixed on a branch |
|---|
Fixed on branches/soc2009/http-wsgi-improvements.
comment:18 by , 16 years ago
So if/when this hits trunk, we will lose the benefit of not having the entire iterator (potentially large and time consuming) saved in memory, which could also potentially cause browser timeouts? If HttpResponse will have no real support for iterators, why not simply make HttpResponse require a string argument? Users wanting to pass an iterator can simply consume it then.
comment:19 by , 16 years ago
| Triage Stage: | Fixed on a branch → Design decision needed |
|---|
The suggested fix, and the changes committed to soc2009/http-wsgi-improvements cannot possibly be committed to trunk. This severely breaks previously documented behaviour because, as mrmachine points out, you will no longer be able to iteratively generate large responses, and in particular, will not be able to iteratively deliver chunks of content to the client.
This will break any page that takes a long time to fully generate, or about 10% of one of my sites.
I think that any change will need to consider the different types of middleware that operate on a response's contents.
Some types of middleware will only examine the generated response, eg CacheMiddleware will not modify the response, only store a version in the cache when complete.
Others will want to replace the content and modify headers, eg GzipMiddleware.
The current 'fix' only satisfies the requirements of the modifying middleware.
I fixed the limitation of the cache middleware to cope with iterator based responses with code like so:
def buffer_and_cache_response(response, cache_key, timeout):
from copy import copy
def worker(rsp):
from cStringIO import StringIO
buf = StringIO()
# we need to copy the response before we iterate through it
# if we copy it after, then the response will have a generator object
# on self._iterator that will not be picklable
buffered_response = copy(rsp)
for chunk in rsp:
buf.write(chunk)
yield chunk
buffered_response.content = buf.getvalue()
buf.close()
rsp.close()
cache.set(cache_key, buffered_response, timeout)
new_resp = copy(response)
new_resp._container = worker(response)
new_resp._is_string = False
return new_resp
Our cache middleware (virtually copy/paste of UpdateCacheMiddleware) determines if this is an iterator derived response by checking response._is_string (hacky, I know), and returns the rv of this function. This is just an example of a more intelligent, correct way to fix this issue.
comment:20 by , 16 years ago
| Cc: | added |
|---|
comment:21 by , 15 years ago
| Cc: | added |
|---|
comment:22 by , 15 years ago
I ended up using the exact same "response._is_string" check in my patch on #10627.
If the documentation for "Passing Iterators" also mentioned "generators", it would cover one side of the behavior issue.
If the documentation on writing middleware warned of the potential of responses being iterators/generators, and thus to no indirectly consume them, along with mentioning the _is_string [or equivalent 'correct'] test, it would at least cover that base.
comment:23 by , 15 years ago
I worked on this three years ago. And it's still not resolved. Amazing.
comment:24 by , 15 years ago
| Type: | → Bug |
|---|
comment:25 by , 14 years ago
| Severity: | → Normal |
|---|
comment:26 by , 14 years ago
| Cc: | added |
|---|---|
| Easy pickings: | unset |
| UI/UX: | unset |
comment:27 by , 14 years ago
Alex and I looked at this at the sprint and thought there had been extensive discussion of iterable responses and middleware during the TemplateResponse discussions, but weren't sure what the conclusions of that discussion were. It doesn't seem that the relevant HttpResponse code has changed.
comment:28 by , 14 years ago
| Triage Stage: | Design decision needed → Accepted |
|---|
This shouldn't be DDN -- it's still a bug -- but it's far from clear how to solve this problem. We've got to figure out a consistant way of handling lazy vs. non-lazy responses, and we haven't yet. That's a bigger discussion and probably needs a "master ticket" or an extended discussion on django-dev.
comment:29 by , 14 years ago
| Patch needs improvement: | set |
|---|
comment:30 by , 13 years ago
| Owner: | changed from to |
|---|
comment:31 by , 13 years ago
Replying to mtredinnick:
The HttpResponse class should turn the iterator into a list and store that result.
Yes, it should.
Might as well do it in the
__init__method.
Today, it seems that the consensus it to keep the streaming behavior in the (rare) cases where it works. This means turning the iterator into a list only when it's going to be iterated and the content loaded in memory anyway — typically when middleware reads response.content.
Response classes cannot work reliably with iterators that aren't read in their entirety at the moment because it's too easy (and common) for middleware to access response.content, which would chew up an iterator. You don't have to look back very far in the archives to see how bad things get when we tried to make things truly iterator-aware at that level. It just doesn't integrate well with middleware.
A separate StreamingHttpResponse was just introduced for this purpose in #7581. In fact, the last pull request also included the changes required to fix this ticket. I didn't commit them because they weren't strictly necessary for #7581. I've extracted them into a separate patch, which I'll be attaching shortly.
This is a logical follow-up for #7581 and I'll commit it quickly if no one objects.
by , 13 years ago
| Attachment: | 6527-v3.patch added |
|---|
follow-up: 33 comment:32 by , 13 years ago
I think that the desire to keep the partial streaming behaviour as-is for HttpResponse was primarily to minimise any disruption or edge-case backwards incompatibility when introducing explicit support for streaming responses. I think that eventually, the goal here should be a more explicit and consistent HttpResponse class that makes user intent clear. Did they pass an iterator as content because they want to stream the response, or as a convenience?
I would like to keep the existing behaviour for now and raise a PendingDeprecationWarning when an iterator is consumed by accessing .content, that directs people to use the new streaming response class. When the deprecation cycle is finished, HttpResponse should consume iterators on assignment. This should give people enough time to switch over to the new streaming response class in those cases when they actually want a streaming response.
An HttpResponse class that always stores its content in a list internally can be pickled (e.g. by cache middleware and possibly other 3rd party code) without first having to access the content, and will remove any ambiguity in situations where a response currently might or might not stream.
comment:33 by , 13 years ago
Replying to mrmachine:
I think that the desire to keep the partial streaming behaviour as-is for
HttpResponsewas primarily to minimise any disruption or edge-case backwards incompatibility when introducing explicit support for streaming responses. I think that eventually, the goal here should be a more explicit and consistentHttpResponseclass that makes user intent clear. Did they pass an iterator as content because they want to stream the response, or as a convenience?
Yes, I understand that.
I would like to keep the existing behaviour for now and raise a
PendingDeprecationWarningwhen an iterator is consumed by accessing.content, that directs people to use the new streaming response class. When the deprecation cycle is finished,HttpResponseshould consume iterators on assignment. This should give people enough time to switch over to the new streaming response class in those cases when they actually want a streaming response.
Actually we would have to raise the warning when the iterator is *not* consumed by accessing .content, but rather by the WSGI server iterating the response.
An
HttpResponseclass that always stores its content in a list internally can be pickled (e.g. by cache middleware and possibly other 3rd party code) without first having to access the content, and will remove any ambiguity in situations where a response currently might or might not stream.
Did you mean "stores its content in a string"?
comment:34 by , 13 years ago
No. I meant that when ._container is a list the response can be pickled. When it is an iterator/generator, it cannot. You are right about raising the warning when an HttpResponse is implicitly streamed, rather than when a user consumes the iterator content (accidentally or not). That makes more sense.
comment:35 by , 13 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
I'm not sure that this is an actual bug. One of the benefits in passing an iterator is so that you don't get the whole response saved in memory, isn't it?