Opened 5 years ago

Closed 3 years ago

#13222 closed Bug (fixed)

unexpected HttpResponse iteration behavior

Reported by: teepark Owned by: aaugustin
Component: HTTP handling Version: 1.3-rc
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

For all the builtin iterables, and most from libraries, the __iter__ method returns something representing one and only one iteration. django.http.HttpResponse.__iter__ sets an iterator attribute on itself, then returns self (it also defines a next method so it is its own iterator). This is bad behavior for an iterable and causes problems when trying to do anything but basic iteration.

A use case (and where I encountered this issue) is in a WSGI server that pulls off the first item in the iterable app return value, prefixes anything that was sent via write calls, and then rebuilds the iterable with itertools.chain:

if prefix:
    body_iterator = iter(body)
    try:
        first_chunk = body_iterator.next()
    except StopIteration:
        first_chunk = ''
    body = itertools.chain((prefix + first_chunk,), body_iterator)

Using django.core.handlers.wsgi.WSGIHandler (which just returns the HttpResponse, since it's iterable) with this server, the first chunk of the HttpResponse gets doubled because itertools.chain calls iter on its arguments, and that resets the HttpResponse's iterator.

To compare with some builtins and other libraries:

>>> it = iter(["foobar"])
>>> list(it), list(it)
(['foobar'], [])
>>> it = iter("foobar" for i in [1])
>>> list(it), list(it)
(['foobar'], [])
>>> import werkzeug
>>> it = iter(werkzeug.Response("foobar").response)
>>> list(it), list(it)
(['foobar'], [])
>>> import webob
>>> it = iter(webob.Response("foobar").app_iter)
>>> list(it), list(it)
(['foobar'], [])
>>> import django.http
>>> it = iter(django.http.HttpResponse("foobar"))
>>> list(it), list(it)
(['foobar'], ['foobar'])

Attachments (3)

httpresponse.diff (1.3 KB) - added by teepark 5 years ago.
just makes HttpResponse.__iter__ a generator, and adds a regression test
httpresponse2.diff (1.7 KB) - added by teepark 4 years ago.
same patch but tests moved from doctest to unittest
13222.patch (1.5 KB) - added by aaugustin 3 years ago.

Download all attachments as: .zip

Change History (33)

comment:1 Changed 5 years ago by teepark

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by teepark

  • Version 1.1 deleted

comment:3 Changed 5 years ago by teepark

  • Cc travis.parker@… added

comment:4 Changed 5 years ago by teepark

  • Cc travis.parker@… removed

comment:5 follow-up: Changed 5 years ago by ubernostrum

  • Resolution set to duplicate
  • Status changed from new to closed

Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.

In the future please search existing tickets before opening new ones.

comment:6 in reply to: ↑ 5 Changed 5 years ago by teepark

  • Resolution duplicate deleted
  • Status changed from closed to reopened

Replying to ubernostrum:

Duplicate of #7581, which is already open for dealing with issues related to iterators/generators in response content.

In the future please search existing tickets before opening new ones.

I did search through and I saw that ticket. #7581 deals with HttpResponse.content, which accesses _container directly, so is orthogonal to iteration behavior. My patch won't fix #7581, and fixing #7581 won't help this issue.

Changed 5 years ago by teepark

just makes HttpResponse.__iter__ a generator, and adds a regression test

comment:7 Changed 5 years ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

comment:8 Changed 4 years ago by lukeplant

  • Type set to Bug

comment:9 Changed 4 years ago by lukeplant

  • Severity set to Normal

comment:10 Changed 4 years ago by julien

  • Patch needs improvement set

Could you rewrite the tests using unittests since this is now Django's preferred way?

Changed 4 years ago by teepark

same patch but tests moved from doctest to unittest

comment:11 Changed 4 years ago by teepark

  • Patch needs improvement unset

comment:12 follow-up: Changed 4 years ago by Alex

  • Triage Stage changed from Accepted to Design decision needed
  • Version set to 1.3-rc1

I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.

comment:13 in reply to: ↑ 12 Changed 4 years ago by teepark

Replying to Alex:

I'm not sure we *want* to fix this. Almost all file-like objects have this behavior. Needs more thought IMO.

I'd be happy if HttpResponse emulated file objects' behavior, but that's not quite the case either (the code snippet from the OP works fine with a file).

File objects have a single cursor location that all iterators wind up sharing, but they don't set it back to the beginning of the file every time you call iter() on it, which in effect is what HttpResponse does currently.

IMO the design decision should be whether we fix it to be like list (the current patch) or we fix it to be like files.

comment:14 Changed 4 years ago by grahamd

In some respect this is an issue which could perhaps be brought up on Python WEB-SIG because what you are talking about is how the iterable returned by a WSGI application should behave. I know for anything I have seen and implemented, use of an iterator causes data to be consumed and it wouldn't be rewound to the original beginning of the data if a new iterator were created as Django seems to be doing by what is described here. So, perhaps another odd corner case for WSGI specification which should be clarified.

comment:15 Changed 4 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:16 Changed 4 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:17 Changed 3 years ago by aaugustin

  • Owner changed from nobody to aaugustin
  • Status changed from reopened to new
  • Triage Stage changed from Design decision needed to Accepted

comment:18 Changed 3 years ago by aaugustin

I'm going to follow Graham's advice: remove the rewinding behavior, ie. behave like files and generators. WSGI doesn't expect to iterate the response more than once.

Another advantage is that this behavior can be implemented consistently in HttpResponse and StreamingHttpResponse

Changed 3 years ago by aaugustin

comment:19 Changed 3 years ago by aaugustin

I think the attached patch is the simplest way to fix the problem.

comment:20 Changed 3 years ago by aaugustin

  • Triage Stage changed from Accepted to Ready for checkin

comment:21 Changed 3 years ago by teepark

LGTM, with the assumption that this works for every python version we care about:

>>> l = [1,2,3]
>>> i = iter(l)
>>> l.append(4)
>>> list(i)
[1, 2, 3, 4]

comment:22 Changed 3 years ago by aaugustin

It does in Python 2.6, 2.7, 3.2 and 3.3.

comment:23 Changed 3 years ago by teepark

@content.setter actually supports any iterable type, and dictionaries and set iterators (and perhaps others) will raise if their original object is modified since the iterator's creation.

>>> s = set([1,2,3])
>>> i = iter(s)
>>> s.add(4)
>>> list(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Set changed size during iteration

There's not really a good reason to use either of those types as HttpResponse.content, but _content becomes the provided object so long as it hasattr("__iter__"), so this implementation does also create new limitations on what is an acceptable content.

It's my general feeling that separate iterators for separate __iter__ invocations is a more robust approach because of this kind of issue, but this patch is far and away better than the current behavior.

comment:24 Changed 3 years ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 6a64822bf4632707212314a25a843c862bdb3874:

Fixed #13222 -- Repeated iteration of HttpResponse

Thanks teepark for the report and grahamd for his insights.

comment:25 Changed 3 years ago by aaugustin

I committed before seeing your last comment.

It doesn't make much sense to instantiate a response with a dictionary or a set: these aren't ordered types.

comment:26 Changed 3 years ago by teepark

>>> import collections
>>> d = collections.deque([1,2,3])
>>> i = iter(d)
>>> d.append(4)
>>> list(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: deque mutated during iteration

it's not about ordered vs unordered, it's about whether pre-existing iterators tolerate mutation.

comment:27 Changed 3 years ago by aaugustin

Well, if there's a valid use case for that, let's open a new ticket.

Usually one builds the entire content first, then iterates over it to render the response.

comment:28 Changed 3 years ago by aaugustin

  • Resolution fixed deleted
  • Status changed from closed to reopened

This commit broke pickling of HttpResponse objects :( Possible solution:

--- a/django/http/response.py
+++ b/django/http/response.py
@@ -260,9 +260,10 @@ class HttpResponse(HttpResponseBase):
         else:
             self._container = [value]
             self._base_content_is_iter = False
-        self._iterator = iter(self._container)
 
     def __iter__(self):
+        if not hasattr(self, '_iterator'):
+            self._iterator = iter(self._container)
         return self
 
     def __next__(self):
Last edited 3 years ago by aaugustin (previous) (diff)

comment:29 Changed 3 years ago by Aymeric Augustin <aymeric.augustin@…>

In ea57112d531dfc6d2be50f47ee89f3a06f8ff1a8:

Reverted 6a64822bf4632707212314a25a843c862bdb3874.

This commit caused every test that does two or more assertContains to
fail, because of #6527. It also made HttpResponse non-pickleable.

Refs #13222.

comment:30 Changed 3 years ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from reopened to closed

In 82b3e6ffcb9d810cc0e3ec27d25f89ce1fd525e0:

Fixed #13222 -- Made HttpResponse iterable once

response.content can be accessed many times as desired, and always
returns the same result.

iter(response) works only once and consumes the iterator.

Note: See TracTickets for help on using tickets.
Back to Top