Opened 7 years ago

Closed 4 years ago

#13222 closed Bug (fixed)

unexpected HttpResponse iteration behavior

Reported by: teepark Owned by: Aymeric Augustin
Component: HTTP handling Version: 1.3-rc
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

For all the builtin iterables, and most from libraries, the __iter__ method returns something representing one and only one iteration. django.http.HttpResponse.__iter__ sets an iterator attribute on itself, then returns self (it also defines a next method so it is its own iterator). This is bad behavior for an iterable and causes problems when trying to do anything but basic iteration.

A use case (and where I encountered this issue) is in a WSGI server that pulls off the first item in the iterable app return value, prefixes anything that was sent via write calls, and then rebuilds the iterable with itertools.chain:

if prefix:
    body_iterator = iter(body)
    try:
        first_chunk = body_iterator.next()
    except StopIteration:
        first_chunk = ''
    body = itertools.chain((prefix + first_chunk,), body_iterator)

Using django.core.handlers.wsgi.WSGIHandler (which just returns the HttpResponse, since it's iterable) with this server, the first chunk of the HttpResponse gets doubled because itertools.chain calls iter on its arguments, and that resets the HttpResponse's iterator.

To compare with some builtins and other libraries:

>>> it = iter(["foobar"])
>>> list(it), list(it)
(['foobar'], [])
>>> it = iter("foobar" for i in [1])
>>> list(it), list(it)
(['foobar'], [])
>>> import werkzeug
>>> it = iter(werkzeug.Response("foobar").response)
>>> list(it), list(it)
(['foobar'], [])
>>> import webob
>>> it = iter(webob.Response("foobar").app_iter)
>>> list(it), list(it)
(['foobar'], [])
>>> import django.http
>>> it = iter(django.http.HttpResponse("foobar"))
>>> list(it), list(it)
(['foobar'], ['foobar'])

Attachments (3)

httpresponse.diff (1.3 KB) - added by teepark 7 years ago.
just makes HttpResponse.__iter__ a generator, and adds a regression test
httpresponse2.diff (1.7 KB) - added by teepark 6 years ago.
same patch but tests moved from doctest to unittest
13222.patch (1.5 KB) - added by Aymeric Augustin 4 years ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 7 years ago by teepark

Has patch: set

comment:2 Changed 7 years ago by teepark

Version: 1.1

comment:3 Changed 7 years ago by teepark

Cc: travis.parker@… added

comment:4 Changed 7 years ago by teepark

Cc: travis.parker@… removed

comment:5 Changed 7 years ago by James Bennett

Resolution: duplicate
Status: newclosed

Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.

In the future please search existing tickets before opening new ones.

comment:6 in reply to:  5 Changed 7 years ago by teepark

Resolution: duplicate
Status: closedreopened

Replying to ubernostrum:

Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.

In the future please search existing tickets before opening new ones.

I did search through and I saw that ticket. #7581 deals with HttpResponse.content, which accesses _container directly, so is orthogonal to iteration behavior. My patch won't fix #7581, and fixing #7581 won't help this issue.

Changed 7 years ago by teepark

Attachment: httpresponse.diff added

just makes HttpResponse.__iter__ a generator, and adds a regression test

comment:7 Changed 7 years ago by Russell Keith-Magee

Triage Stage: UnreviewedAccepted

comment:8 Changed 6 years ago by Luke Plant

Type: Bug

comment:9 Changed 6 years ago by Luke Plant

Severity: Normal

comment:10 Changed 6 years ago by Julien Phalip

Patch needs improvement: set

Could you rewrite the tests using unittests since this is now Django's preferred way?

Changed 6 years ago by teepark

Attachment: httpresponse2.diff added

same patch but tests moved from doctest to unittest

comment:11 Changed 6 years ago by teepark

Patch needs improvement: unset

comment:12 Changed 6 years ago by Alex Gaynor

Triage Stage: AcceptedDesign decision needed
Version: 1.3-rc1

I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.

comment:13 in reply to:  12 Changed 6 years ago by teepark

Replying to Alex:

I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.

I'd be happy if HttpResponse emulated file objects' behavior, but that's not quite the case either (the code snippet from the OP works fine with a file).

File objects have a single cursor location that all iterators wind up sharing, but they don't set it back to the beginning of the file every time you call iter() on it, which in effect is what HttpResponse does currently.

IMO the design decision should be whether we fix it to be like list (the current patch) or we fix it to be like files.

comment:14 Changed 6 years ago by grahamd

In some respect this is an issue which could perhaps be brought up on Python WEB-SIG because what you are talking about is how the iterable returned by a WSGI application should behave. I know for anything I have seen and implemented, use of an iterator causes data to be consumed and it wouldn't be rewound to the original beginning of the data if a new iterator were created as Django seems to be doing by what is described here. So, perhaps another odd corner case for WSGI specification which should be clarified.

comment:15 Changed 5 years ago by Aymeric Augustin

UI/UX: unset

Change UI/UX from NULL to False.

comment:16 Changed 5 years ago by Aymeric Augustin

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:17 Changed 4 years ago by Aymeric Augustin

Owner: changed from nobody to Aymeric Augustin
Status: reopenednew
Triage Stage: Design decision neededAccepted

comment:18 Changed 4 years ago by Aymeric Augustin

I'm going to follow Graham's advice: remove the rewinding behavior, ie. behave like files and generators. WSGI doesn't expect to iterate the response more than once.

Another advantage is that this behavior can be implemented consistently in HttpResponse and StreamingHttpResponse

Changed 4 years ago by Aymeric Augustin

Attachment: 13222.patch added

comment:19 Changed 4 years ago by Aymeric Augustin

I think the attached patch is the simplest way to fix the problem.

comment:20 Changed 4 years ago by Aymeric Augustin

Triage Stage: AcceptedReady for checkin

comment:21 Changed 4 years ago by teepark

LGTM, with the assumption that this works for every python version we care about:

>>> l = [1,2,3]
>>> i = iter(l)
>>> l.append(4)
>>> list(i)
[1, 2, 3, 4]

comment:22 Changed 4 years ago by Aymeric Augustin

It does in Python 2.6, 2.7, 3.2 and 3.3.

comment:23 Changed 4 years ago by teepark

@content.setter actually supports any iterable type, and dictionaries and set iterators (and perhaps others) will raise if their original object is modified since the iterator's creation.

>>> s = set([1,2,3])
>>> i = iter(s)
>>> s.add(4)
>>> list(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Set changed size during iteration

There's not really a good reason to use either of those types as HttpResponse.content, but _content becomes the provided object so long as it hasattr("__iter__"), so this implementation does also create new limitations on what is an acceptable content.

It's my general feeling that separate iterators for separate __iter__ invocations is a more robust approach because of this kind of issue, but this patch is far and away better than the current behavior.

comment:24 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

Resolution: fixed
Status: newclosed

In 6a64822bf4632707212314a25a843c862bdb3874:

Fixed #13222 -- Repeated iteration of HttpResponse

Thanks teepark for the report and grahamd for his insights.

comment:25 Changed 4 years ago by Aymeric Augustin

I committed before seeing your last comment.

It doesn't make much sense to instantiate a response with a dictionary or a set: these aren't ordered types.

comment:26 Changed 4 years ago by teepark

>>> import collections
>>> d = collections.deque([1,2,3])
>>> i = iter(d)
>>> d.append(4)
>>> list(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: deque mutated during iteration

it's not about ordered vs unordered, it's about whether pre-existing iterators tolerate mutation.

comment:27 Changed 4 years ago by Aymeric Augustin

Well, if there's a valid use case for that, let's open a new ticket.

Usually one builds the entire content first, then iterates over it to render the response.

comment:28 Changed 4 years ago by Aymeric Augustin

Resolution: fixed
Status: closedreopened

This commit broke pickling of HttpResponse objects :( Possible solution:

--- a/django/http/response.py
+++ b/django/http/response.py
@@ -260,9 +260,10 @@ class HttpResponse(HttpResponseBase):
         else:
             self._container = [value]
             self._base_content_is_iter = False
-        self._iterator = iter(self._container)
 
     def __iter__(self):
+        if not hasattr(self, '_iterator'):
+            self._iterator = iter(self._container)
         return self
 
     def __next__(self):
Last edited 4 years ago by Aymeric Augustin (previous) (diff)

comment:29 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

In ea57112d531dfc6d2be50f47ee89f3a06f8ff1a8:

Reverted 6a64822bf4632707212314a25a843c862bdb3874.

This commit caused every test that does two or more assertContains to
fail, because of #6527. It also made HttpResponse non-pickleable.

Refs #13222.

comment:30 Changed 4 years ago by Aymeric Augustin <aymeric.augustin@…>

Resolution: fixed
Status: reopenedclosed

In 82b3e6ffcb9d810cc0e3ec27d25f89ce1fd525e0:

Fixed #13222 -- Made HttpResponse iterable once

response.content can be accessed many times as desired, and always
returns the same result.

iter(response) works only once and consumes the iterator.

Note: See TracTickets for help on using tickets.
Back to Top