Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#16494 closed Bug (fixed)

HttpResponse should raise an error if given a non-string object without `__iter__`

Reported by: matt@… Owned by: PaulM
Component: HTTP handling Version: 1.3
Severity: Normal Keywords:
Cc: anssi.kaariainen@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

where response =

<class 'django.http.HttpResponse'>

and response._container =

[<Response><Speak loop="1" voice="slt">Hello Friend</Speak></Response>]

Content type = text/xml

matt@dragoon:~/Projects/my_project$ python manage.py test my_app
Creating test database for alias 'default'...
======================================================================
ERROR: test_hello (my_app.tests.RestTestCase)
----------------------------------------------------------------------
print response
Traceback (most recent call last):
  File "/Users/matt/Projects/my_project/my_app/tests.py", line 22, in test_hello
    print result.content
  File "/Library/Python/2.6/site-packages/django/http/__init__.py", line 596, in _get_content
    return smart_str(''.join(self._container), self._charset)
TypeError: sequence item 0: expected string, Response found
----------------------------------------------------------------------
Ran 1 test in 3.177s

FAILED (errors=1)
Destroying test database for alias 'default'...

The following patch corrected the issue for me

django/http/__init__.py
593,595c593
<     def _get_content(self):        if self.has_header('Content-Encoding'):
<             return ''.join(self._container)
<         return smart_str(''.join(self._container), self._charset)
---
>     def _get_content(self):

Attachments (1)

httpresponse_content.diff (6.2 KB) - added by PaulM 3 years ago.
Converts non-string inputs to strings just before output. Adds tests.

Download all attachments as: .zip

Change History (15)

comment:1 Changed 3 years ago by anonymous

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Type changed from Uncategorized to Bug

comment:2 Changed 3 years ago by PaulM

  • Resolution set to needsinfo
  • Status changed from new to closed

There's not enough information here to reproduce your error. What does your testcase look like? What does the rest of the project look like?

If you can provide enough information for someone else to reproduce your issue, please feel free to re-open the ticket.

comment:3 Changed 3 years ago by matt@…

  • Resolution needsinfo deleted
  • Status changed from closed to reopened

I printed out the type of each object in the response's and realized I am getting the error because I was returning an object in my view whose unicode method returns a string containing XML instead of returning an HttpResponse. I think one of two resolutions could help people making this mistake here:

  • Raise a TypeError if a view returns an object that does not behave like or does not subclass an HttpResponse
  • Use the patch I included in the original description to attempt to convert each object in the _container list to a string before joining them

comment:4 Changed 3 years ago by matt@…

Pardon me. I mean it is currently returning an HttpResponse with a non-string as the first argument, e.g.

class Response(object):
    def __unicode__(self):
        return '<Response><Speak loop="1" voice="slt">Hello Friend</Speak></Response>'

return HttpResponse(Response())

comment:5 Changed 3 years ago by PaulM

  • Easy pickings unset
  • Has patch unset
  • Summary changed from django.http.HttpResponse __unicode__ raises TypeError during test to HttpResponse should raise an error if given a non-string object without `__iter__`
  • Triage Stage changed from Unreviewed to Accepted

The relevant code is here: https://code.djangoproject.com/browser/django/trunk/django/http/__init__.py#L546

Django currently assumes that everything that is an instance of the base string or has no __iter__ property is a string.

We should either be converting to a string on line 550, or raising an error if that object is not a string. I'm in favor of doing the string conversion.

This looks like a legitimate bug to me, so I'm marking it as such and changing the title to reflect the real issue.

comment:6 Changed 3 years ago by anonymous

I just ran into this today. I was returning the pk of an object in the HttpResponse object like so:

return HttpResponse(obj.pk)

This caused the same error. One fix that I found floating around the internet was changing this:

def _get_content(self):
        if self.has_header('Content-Encoding'):
            return ''.join(self._container)
        return smart_str(''.join(self._container), self._charset)

to this:

def _get_content(self):
        if self.has_header('Content-Encoding'):
            return ''.join(self._container)
        return smart_str(''.join(map(str, self._container)), self._charset)

This would fix it as far as just forcing everything to a string. It worked for me when I tried it. However, in the interest of not modifying Django's libraries unnecessarily, I'll just be going through my code and fixing it there. It would be nice, though, if this were to make it into Django at some point. It seems unnecessary to have to call

HttpResponse(str(obj.pk))

Lastly, it seems likely that this appeared in Django 1.3. I upgraded not too long ago, and had no problems up until that point. This is the first time noticing this problem.

comment:7 Changed 3 years ago by PaulM

  • Owner changed from nobody to PaulM
  • Status changed from reopened to new

@anonymous I looked through the changelog and didn't see any recent changes in django/http/__init__.py that would have affected this. There may have been some elsewhere.

Your post points out that we have this bug in the _container property as well. I'll work on a patch here - I don't see any good reason not to convert to a string when setting HttpResponse._container, since we go on to use it as a string.

comment:8 Changed 3 years ago by PaulM

I put together an initial draft of a patch, which adds tests and raises an exception on non-string, non-iterator input:
https://github.com/PaulMcMillan/django/commit/75052eb95d4648e63def203eb1e2a42e1b261a85

After further consideration, I noticed that when the HttpResponse.content is accessed via iterator, it DOES convert to a string:
https://code.djangoproject.com/browser/django/trunk/django/http/__init__.py#L664

Given this, the most consistent fix (in line with existing behavior) is to convert to string on output. I'm working on a comprehensive patch to both clean up the code and do this in a way that is backwards compatible.

Changed 3 years ago by PaulM

Converts non-string inputs to strings just before output. Adds tests.

comment:9 Changed 3 years ago by PaulM

  • Has patch set

I've added a patch that converts everything to strings just before output. It cleans up the logic and fixes the original edge-case logic error. It also adds tests since this component was relatively under-tested.

comment:10 Changed 3 years ago by gabrielhurley

I'm not overwhelmingly familiar with this chunk of the codebase, but from a cursory glance it appears that the provided patch offers a more consistent approach to handling the content of HttpResponse. The tests and docs look reasonable, but I'd like another set of eyes on the code itself prior to marking it RFC.

comment:11 Changed 3 years ago by akaariai

  • Patch needs improvement set

I haven't actually tested the patch, but according to my reading of it, I think there is one big problem and one smaller problem. Both are present already in current code.

  • The big one: If the given content is iter(['a', 'b']), isn't it so that on first call of reponse.content you will get 'ab' and on second call . That is, the iter is consumed on first call and on the second one it is already consumed.
  • The smaller one: If the content is already a string, doesn't the .join(self._container) create a new copy of the string? That seems to be non-necessary. The content can be large and there is no point of copying it.

Try the following (should "fail" on both old and new version of get_content).

from django.core.management import setup_environ
import settings
setup_environ(settings)
from django.http import HttpResponse
h = HttpResponse(iter(['a', 'b']))
print h.content
print h.content

I think the best approach is to turn the given iterable to a string in set_content, store it in self._raw_content. Turn it into correctly encoded string when content is asked. Test and make sure there will be as little copying of the content as possible, including the conversion to the correct encoding. UTF-8 should be the expected result, so maybe ._raw_content could be in UTF-8?

comment:12 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added
  • Patch needs improvement unset

Forget about the string copying. It just doesn't matter performance wise, the _get_content isn't called when generating pages.

If I understand this correctly, which I hope is now the case, the normal (as in when running under Apache) path just iterates the response object, so that is the case which should be optimized, not the access to .content.

The multiple consuming of the iterator would be nice to fix, though. But even that should not be a blocker for this ticket. The problem exists already, and can be fixed separately. So, I removed the patch needs improvement flag.

comment:13 Changed 3 years ago by PaulM

As you say, the normal case is to iterate the HttpResponse object. This is why it is inappropriate to turn it into a string in _set_content(). Accessing content directly is only the correct thing to do if you're consuming the content once and for all, or the original content is not an iterator. #7581 is the issue you describe as multiple consuming the iterator and has been outstanding for a long time. This patch fixes a different issue (though I'm sure the patch for that ticket will need to be updated once we commit this).

comment:14 Changed 3 years ago by PaulM

  • Resolution set to fixed
  • Status changed from new to closed

In [16829]:

Fixed #16494 by normalizing HttpResponse behavior with non-string input. HttpResponse now always converts content to string on output, regardless of input type.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.