Opened 15 years ago

Closed 12 years ago

#13222 closed Bug (fixed)

unexpected HttpResponse iteration behavior

Reported by: teepark Owned by: Aymeric Augustin
Component: HTTP handling Version: 1.3-rc
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

For all the builtin iterables, and most from libraries, the __iter__ method returns something representing one and only one iteration. django.http.HttpResponse.__iter__ sets an iterator attribute on itself, then returns self (it also defines a next method so it is its own iterator). This is bad behavior for an iterable and causes problems when trying to do anything but basic iteration.

A use case (and where I encountered this issue) is in a WSGI server that pulls off the first item in the iterable app return value, prefixes anything that was sent via write calls, and then rebuilds the iterable with itertools.chain:

if prefix:
    body_iterator = iter(body)
    try:
        first_chunk = body_iterator.next()
    except StopIteration:
        first_chunk = ''
    body = itertools.chain((prefix + first_chunk,), body_iterator)

Using django.core.handlers.wsgi.WSGIHandler (which just returns the HttpResponse, since it's iterable) with this server, the first chunk of the HttpResponse gets doubled because itertools.chain calls iter on its arguments, and that resets the HttpResponse's iterator.

To compare with some builtins and other libraries:

>>> it = iter(["foobar"])
>>> list(it), list(it)
(['foobar'], [])
>>> it = iter("foobar" for i in [1])
>>> list(it), list(it)
(['foobar'], [])
>>> import werkzeug
>>> it = iter(werkzeug.Response("foobar").response)
>>> list(it), list(it)
(['foobar'], [])
>>> import webob
>>> it = iter(webob.Response("foobar").app_iter)
>>> list(it), list(it)
(['foobar'], [])
>>> import django.http
>>> it = iter(django.http.HttpResponse("foobar"))
>>> list(it), list(it)
(['foobar'], ['foobar'])

Attachments (3)

httpresponse.diff (1.3 KB ) - added by teepark 15 years ago.
just makes HttpResponse.__iter__ a generator, and adds a regression test
httpresponse2.diff (1.7 KB ) - added by teepark 14 years ago.
same patch but tests moved from doctest to unittest
13222.patch (1.5 KB ) - added by Aymeric Augustin 12 years ago.

Download all attachments as: .zip

Change History (33)

comment:1 by teepark, 15 years ago

Has patch: set

comment:2 by teepark, 15 years ago

Version: 1.1

comment:3 by teepark, 15 years ago

Cc: travis.parker@… added

comment:4 by teepark, 15 years ago

Cc: travis.parker@… removed

comment:5 by James Bennett, 15 years ago

Resolution: duplicate
Status: newclosed

Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.

In the future please search existing tickets before opening new ones.

in reply to:  5 comment:6 by teepark, 15 years ago

Resolution: duplicate
Status: closedreopened

Replying to ubernostrum:

Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.

In the future please search existing tickets before opening new ones.

I did search through and I saw that ticket. #7581 deals with HttpResponse.content, which accesses _container directly, so is orthogonal to iteration behavior. My patch won't fix #7581, and fixing #7581 won't help this issue.

by teepark, 15 years ago

Attachment: httpresponse.diff added

just makes HttpResponse.__iter__ a generator, and adds a regression test

comment:7 by Russell Keith-Magee, 15 years ago

Triage Stage: UnreviewedAccepted

comment:8 by Luke Plant, 14 years ago

Type: Bug

comment:9 by Luke Plant, 14 years ago

Severity: Normal

comment:10 by Julien Phalip, 14 years ago

Patch needs improvement: set

Could you rewrite the tests using unittests since this is now Django's preferred way?

by teepark, 14 years ago

Attachment: httpresponse2.diff added

same patch but tests moved from doctest to unittest

comment:11 by teepark, 14 years ago

Patch needs improvement: unset

comment:12 by Alex Gaynor, 14 years ago

Triage Stage: AcceptedDesign decision needed
Version: 1.3-rc1

I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.

in reply to:  12 comment:13 by teepark, 14 years ago

Replying to Alex:

I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.

I'd be happy if HttpResponse emulated file objects' behavior, but that's not quite the case either (the code snippet from the OP works fine with a file).

File objects have a single cursor location that all iterators wind up sharing, but they don't set it back to the beginning of the file every time you call iter() on it, which in effect is what HttpResponse does currently.

IMO the design decision should be whether we fix it to be like list (the current patch) or we fix it to be like files.

comment:14 by Graham Dumpleton, 14 years ago

In some respect this is an issue which could perhaps be brought up on Python WEB-SIG because what you are talking about is how the iterable returned by a WSGI application should behave. I know for anything I have seen and implemented, use of an iterator causes data to be consumed and it wouldn't be rewound to the original beginning of the data if a new iterator were created as Django seems to be doing by what is described here. So, perhaps another odd corner case for WSGI specification which should be clarified.

comment:15 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:16 by Aymeric Augustin, 13 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:17 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin
Status: reopenednew
Triage Stage: Design decision neededAccepted

comment:18 by Aymeric Augustin, 12 years ago

I'm going to follow Graham's advice: remove the rewinding behavior, ie. behave like files and generators. WSGI doesn't expect to iterate the response more than once.

Another advantage is that this behavior can be implemented consistently in HttpResponse and StreamingHttpResponse

by Aymeric Augustin, 12 years ago

Attachment: 13222.patch added

comment:19 by Aymeric Augustin, 12 years ago

I think the attached patch is the simplest way to fix the problem.

comment:20 by Aymeric Augustin, 12 years ago

Triage Stage: AcceptedReady for checkin

comment:21 by teepark, 12 years ago

LGTM, with the assumption that this works for every python version we care about:

>>> l = [1,2,3]
>>> i = iter(l)
>>> l.append(4)
>>> list(i)
[1, 2, 3, 4]

comment:22 by Aymeric Augustin, 12 years ago

It does in Python 2.6, 2.7, 3.2 and 3.3.

comment:23 by teepark, 12 years ago

@content.setter actually supports any iterable type, and dictionaries and set iterators (and perhaps others) will raise if their original object is modified since the iterator's creation.

>>> s = set([1,2,3])
>>> i = iter(s)
>>> s.add(4)
>>> list(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Set changed size during iteration

There's not really a good reason to use either of those types as HttpResponse.content, but _content becomes the provided object so long as it hasattr("__iter__"), so this implementation does also create new limitations on what is an acceptable content.

It's my general feeling that separate iterators for separate __iter__ invocations is a more robust approach because of this kind of issue, but this patch is far and away better than the current behavior.

comment:24 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

Resolution: fixed
Status: newclosed

In 6a64822bf4632707212314a25a843c862bdb3874:

Fixed #13222 -- Repeated iteration of HttpResponse

Thanks teepark for the report and grahamd for his insights.

comment:25 by Aymeric Augustin, 12 years ago

I committed before seeing your last comment.

It doesn't make much sense to instantiate a response with a dictionary or a set: these aren't ordered types.

comment:26 by teepark, 12 years ago

>>> import collections
>>> d = collections.deque([1,2,3])
>>> i = iter(d)
>>> d.append(4)
>>> list(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: deque mutated during iteration

it's not about ordered vs unordered, it's about whether pre-existing iterators tolerate mutation.

comment:27 by Aymeric Augustin, 12 years ago

Well, if there's a valid use case for that, let's open a new ticket.

Usually one builds the entire content first, then iterates over it to render the response.

comment:28 by Aymeric Augustin, 12 years ago

Resolution: fixed
Status: closedreopened

This commit broke pickling of HttpResponse objects :( Possible solution:

--- a/django/http/response.py
+++ b/django/http/response.py
@@ -260,9 +260,10 @@ class HttpResponse(HttpResponseBase):
         else:
             self._container = [value]
             self._base_content_is_iter = False
-        self._iterator = iter(self._container)
 
     def __iter__(self):
+        if not hasattr(self, '_iterator'):
+            self._iterator = iter(self._container)
         return self
 
     def __next__(self):
Last edited 12 years ago by Aymeric Augustin (previous) (diff)

comment:29 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

In ea57112d531dfc6d2be50f47ee89f3a06f8ff1a8:

Reverted 6a64822bf4632707212314a25a843c862bdb3874.

This commit caused every test that does two or more assertContains to
fail, because of #6527. It also made HttpResponse non-pickleable.

Refs #13222.

comment:30 by Aymeric Augustin <aymeric.augustin@…>, 12 years ago

Resolution: fixed
Status: reopenedclosed

In 82b3e6ffcb9d810cc0e3ec27d25f89ce1fd525e0:

Fixed #13222 -- Made HttpResponse iterable once

response.content can be accessed many times as desired, and always
returns the same result.

iter(response) works only once and consumes the iterator.

Note: See TracTickets for help on using tickets.
Back to Top