Opened 15 years ago
Closed 14 years ago
#13391 closed (duplicate)
Detect charset from Content-type header in the HttpResponse
Reported by: | lucky | Owned by: | nobody |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Keywords: | HttpResponse | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I was surprised today to realize that Django must be configured before a django.http.HttpResponse object can be instantiated.
    Python 2.6.4 (r264:75706, Dec 7 2009, 18:43:55)
    [GCC 4.4.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from django.http import HttpResponse
    >>> response = HttpResponse("Hello, World!", content_type="text/plain; charset=iso-8859-1")
    Traceback (most recent call last):
      ...
    ImportError: Settings cannot be imported, because environment variable DJANGO_SETTINGS_MODULE is undefined.
This happens because HttpResponse.__init__ reads settings.DEFAULT_CHARSET to fill the self._charset property, which is assumed to be the charset of the response content. The value is always taken from the settings, even if content_type was passed as a parameter and contains the charset.
In addition, this behavior breaks django.test.TestCase.assertContains() when the encoding of the response content differs from settings.DEFAULT_CHARSET.
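The mismatch can be illustrated without Django. The sketch below is a simplified model, not Django's actual test code: it assumes the check encodes the search text with an assumed charset and counts occurrences in the raw response bytes. The names DEFAULT_CHARSET, RESPONSE_CHARSET and count_occurrences are hypothetical stand-ins.

```python
# Hypothetical stand-ins: the project-wide default (settings.DEFAULT_CHARSET
# in Django) and the charset actually declared in the Content-type header.
DEFAULT_CHARSET = "utf-8"
RESPONSE_CHARSET = "iso-8859-1"

text = u"caf\xe9"
body = text.encode(RESPONSE_CHARSET)  # raw bytes as sent in the response


def count_occurrences(body, needle, charset):
    """Count how often ``needle``, encoded with ``charset``, occurs in ``body``."""
    return body.count(needle.encode(charset))


# Encoding the needle with the wrong charset yields different bytes,
# so the text is not found even though the response contains it.
found_with_default = count_occurrences(body, text, DEFAULT_CHARSET)
found_with_declared = count_occurrences(body, text, RESPONSE_CHARSET)
```

With the default charset the search text is not found; with the charset the response declares, it is.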
This patch makes the HttpResponse object fill its self._charset property with the value from the content_type argument, if given.
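The idea can be sketched independently of the attached diff. The following is a minimal illustration, not the patch itself: it uses the standard library's email.message.Message to extract the charset parameter from a Content-type value and falls back to a default otherwise. The helper name charset_from_content_type and the DEFAULT_CHARSET constant are assumptions made for this example.

```python
from email.message import Message

# Hypothetical stand-in for settings.DEFAULT_CHARSET.
DEFAULT_CHARSET = "utf-8"


def charset_from_content_type(content_type, default=DEFAULT_CHARSET):
    """Return the charset named in a Content-type value, or ``default``."""
    if not content_type:
        return default
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter,
    # or ``failobj`` when the header carries no charset.
    return msg.get_content_charset(failobj=default)
```

For example, charset_from_content_type("text/plain; charset=iso-8859-1") gives "iso-8859-1", while a bare "text/plain" falls back to the default.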
Attachments (1)
Change History (5)
by , 15 years ago
Attachment: | 13007-http-response-charset.diff added |
---|
comment:1 by , 15 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 15 years ago
Note that properly setting the charset in the response is also the subject of #10190. It looks like there were a number of commits for that ticket to the http-wsgi-improvements branch.
comment:3 by , 15 years ago
I tried to make a bug fix patch that does not impose new responsibilities on the HttpResponse object.
It just makes it possible to remove Django's uncertainty regarding _charset in the most natural way for the application: by providing the charset in the content_type argument or in the Content-type header of the HTTP response. I believe it does not violate the current behavior or the spirit of the current implementation of HttpResponse.
In an ideal world
I personally believe that HttpResponse should work at the level of the 'headers' and the 'content' only. I was surprised that the object has its own ._charset property at all. Charset is a property of the content (and only in the special case of "text content"). Therefore, the functions for working with the content's charset in HttpResponse are redundant. So I do not agree with the current direction of the HttpResponse implementation, where content-specific operations are implemented in it, nor with #10190.
Guessing the charset of the text content in the response, whether by analyzing information from the HttpRequest, by reading the project's settings (settings.DEFAULT_CHARSET), or by ..., is a magic task for a higher level, that is, some sort of middleware, not for HttpResponse itself.
In the case of a missing charset, it is better to move the charset guessing logic (and other content-specific operations) out of HttpResponse, to the application/client level:
    def guess_text_content_from_response(response, default_charset=settings.DEFAULT_CHARSET):
        """
        :rtype: string representation of the response content with valid encoding.
        """
        if response.has_header('Content-type'):
            ...
        if getattr(response.content, 'encoding', None):  # is content a file?
            ...
        # any other magic ...
        return smart_str(response.content, guessed_charset)

    def convert_response_to_be_accepted_for_request(response, request):
        """Converts response content with respect to ACCEPT_CHARSET header of the
        ``request`` and adjust Content-type header.
        """
        if request.META.has_key("ACCEPT_CHARSET"):
            ...
Properties of the content are described in the Content-type header. The Content-Encoding header is about "transfer" encoding (it is not about the content charset). They are independent.
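To make the independence concrete, here is a small sketch using only the standard library. It assumes gzip as the coding named in Content-Encoding and iso-8859-1 as the charset named in Content-type; the headers dictionary is purely illustrative and not tied to any Django object.

```python
import gzip

text = u"na\xefve caf\xe9"
charset = "iso-8859-1"

# Step 1: character encoding, described by the charset in Content-type.
encoded = text.encode(charset)

# Step 2: compression, described by Content-Encoding. It operates on bytes
# and knows nothing about the charset chosen in step 1.
compressed = gzip.compress(encoded)

headers = {
    "Content-Type": "text/plain; charset=%s" % charset,
    "Content-Encoding": "gzip",
}

# The receiver undoes the two steps in reverse order, each guided by
# its own header.
decoded = gzip.decompress(compressed).decode(charset)
```

Changing the charset does not require touching the compression step, and vice versa.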
comment:4 by , 14 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
I'm going to close this as a duplicate of #10190 and add a link to it there.
I think there might be another edge case here. Consider the case where:
In this case, Django will encode content using the DEFAULT_CHARSET, but won't provide any hints to the receiver on the charset they are receiving.