
Opened 4 years ago

Closed 3 years ago

#13391 closed (duplicate)

Detect charset from Content-type header in the HttpResponse

Reported by: lucky Owned by: nobody
Component: HTTP handling Version: master
Severity: Keywords: HttpResponse
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

I was surprised today to realize that Django settings must be configured before a django.http.HttpResponse object can even be instantiated.

Python 2.6.4 (r264:75706, Dec  7 2009, 18:43:55)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from django.http import HttpResponse
>>> response = HttpResponse("Hello, World!", content_type="text/plain; charset=iso-8859-1")
Traceback (most recent call last):
   ...
ImportError: Settings cannot be imported, because environment variable DJANGO_SETTINGS_MODULE is undefined.

This happens because HttpResponse.__init__ reads settings.DEFAULT_CHARSET to fill the self._charset property, which is the assumed charset of the response content. The value is always taken from the settings, even when content_type is passed as a parameter and already contains a charset.

In addition, this behavior breaks django.test.TestCase.assertContains() when the encoding of the response content differs from settings.DEFAULT_CHARSET.
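To see why a charset mismatch breaks a byte-level check like assertContains(), here is a minimal Python 3 sketch that needs no Django. It assumes the checker encodes the expected text with the default charset (UTF-8 here) while the body was actually produced in ISO-8859-1; body_contains is a hypothetical helper, not Django's implementation.

```python
def body_contains(body: bytes, text: str, charset: str) -> bool:
    """Return True if `text`, encoded with `charset`, occurs in the raw `body`.

    Hypothetical helper mimicking a byte-level containment check: the needle
    is encoded with whatever charset the checker *believes* the response uses.
    """
    return text.encode(charset) in body


# The response body was actually produced in ISO-8859-1 ...
body = "Bonjour, café!".encode("iso-8859-1")

# ... but a checker assuming UTF-8 encodes "é" as two bytes, so no match.
print(body_contains(body, "café", "utf-8"))       # False
# Using the charset declared in the Content-Type header, the match succeeds.
print(body_contains(body, "café", "iso-8859-1"))  # True
```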

This patch makes the HttpResponse object fill its self._charset property with the charset from the content_type argument, if one is given.

Attachments (1)

13007-http-response-charset.diff (6.1 KB) - added by lucky 4 years ago.


Change History (5)

Changed 4 years ago by lucky

comment:1 Changed 4 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

I think there might be another edge case here. Consider the case where:

  • The DEFAULT_CHARSET isn't UTF-8 (I'm thinking something exotic like KOI8-r)
  • A content_type is manually specified, but *doesn't* contain a charset identifier
  • The Content-Encoding header hasn't been manually set.

In this case, Django will encode content using the DEFAULT_CHARSET, but won't provide any hints to the receiver on the charset they are receiving.
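One possible mitigation for this edge case is to make sure the charset Django actually encodes with is always stated in the Content-Type header. A minimal Python 3 sketch; ensure_charset is a hypothetical helper, not part of Django.

```python
import re

# Detects an existing "charset=" parameter, case-insensitively.
_HAS_CHARSET_RE = re.compile(r";\s*charset=", re.IGNORECASE)


def ensure_charset(content_type: str, charset: str) -> str:
    """Append '; charset=<charset>' to content_type unless it already has one.

    Hypothetical helper: it exposes the charset actually used for encoding
    (e.g. a non-UTF-8 DEFAULT_CHARSET such as KOI8-R) to the receiver.
    """
    if _HAS_CHARSET_RE.search(content_type):
        return content_type
    return "%s; charset=%s" % (content_type, charset)


print(ensure_charset("text/html", "koi8-r"))                 # text/html; charset=koi8-r
print(ensure_charset("text/html; charset=utf-8", "koi8-r"))  # text/html; charset=utf-8
```

A real fix would likely restrict this to textual media types, since a charset parameter is meaningless for something like image/png.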

comment:2 Changed 4 years ago by kmtracey

Note that properly setting the charset in the response is also the subject of #10190. It looks like there were a number of commits for that ticket to the http-wsgi-improvements branch.

comment:3 Changed 4 years ago by lucky

I tried to make the bug-fix patch not impose any new responsibilities on the HttpResponse object.
It just makes it possible to remove Django's uncertainty about _charset in the way that is most natural for an application: by providing the charset in the content_type argument or in the Content-type header of the HTTP response. I believe it does not violate the current behavior or the spirit of the current HttpResponse implementation.

In an ideal world

I personally believe that HttpResponse should work at the level of 'headers' and 'content' only. I was surprised that the object has its own ._charset property at all. A charset is a property of the content (and only of one special case of it, "text content"). Therefore, functions for working with the content's charset are redundant in HttpResponse. So I do not agree with the current direction of the HttpResponse implementation, where content-specific operations are implemented in it, and that includes #10190.

Guessing the charset of the text content in the response, whether by analyzing information from the HttpRequest, from the project's settings (settings.DEFAULT_CHARSET), or by ..., is a magic task for a higher level, such as some sort of middleware, not for HttpResponse itself.

In the case of a missing charset, it is better to move the charset-guessing logic (and other content-specific operations) out of HttpResponse, to the application/client level:

from django.conf import settings
from django.utils.encoding import smart_str

def guess_text_content_from_response(response, default_charset=None):
    """
    :rtype: string representation of the response content with valid encoding.
    """
    # Read the setting lazily, not at import time.
    guessed_charset = default_charset or settings.DEFAULT_CHARSET
    if response.has_header('Content-Type'):
        pass  # ... take the charset from the Content-Type header
    if getattr(response.content, 'encoding', None):
        pass  # ... is content a file?
    # ... any other magic ...
    return smart_str(response.content, guessed_charset)

def convert_response_to_be_accepted_for_request(response, request):
    """Converts the response content with respect to the Accept-Charset
    header of the ``request`` and adjusts the Content-Type header."""
    if 'HTTP_ACCEPT_CHARSET' in request.META:
        pass  # ...
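The Accept-Charset branch in convert_response_to_be_accepted_for_request is left open. As one illustration of what it might start with, here is a simplified Python 3 sketch that picks a charset from the header value (Django exposes it as request.META['HTTP_ACCEPT_CHARSET']). choose_charset and its tie-breaking rule are hypothetical, and the parsing ignores finer points of the HTTP specification.

```python
from typing import List, Optional


def choose_charset(accept_charset: str, available: List[str]) -> Optional[str]:
    """Return the charset from `available` with the highest client q-value.

    Hypothetical, simplified parser for values such as
    "iso-8859-5, utf-8;q=0.8". "*" matches any charset; earlier entries of
    `available` win ties; None means nothing is acceptable (q must be > 0).
    """
    prefs = {}
    for part in accept_charset.split(","):
        fields = part.strip().split(";")
        name = fields[0].strip().lower()
        if not name:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # malformed q-value: treat as not acceptable
        prefs[name] = q

    best, best_q = None, 0.0
    for charset in available:
        q = prefs.get(charset.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best


print(choose_charset("iso-8859-5, utf-8;q=0.8", ["utf-8", "iso-8859-5"]))  # iso-8859-5
```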

The properties of the content are described in the Content-type header. The Content-Encoding header names a content coding such as gzip (it is not about the content charset). They are independent.
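Python's standard library models these headers the same way. A small illustration with email.message.Message, which parses the same MIME-style Content-Type syntax that HTTP uses; the header values are made up for the example.

```python
from email.message import Message

# Build a header set the way an HTTP response might carry it.
headers = Message()
headers["Content-Type"] = "text/html; charset=koi8-r"
headers["Content-Encoding"] = "gzip"

# The charset is a parameter of Content-Type ...
print(headers.get_content_charset())  # koi8-r
print(headers.get_content_type())     # text/html
# ... while Content-Encoding names a content coding and carries no charset.
print(headers["Content-Encoding"])    # gzip
```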

comment:4 Changed 3 years ago by ramiro

  • Resolution set to duplicate
  • Status changed from new to closed

I'm going to close this as a duplicate of #10190 and add a link to it there.
