Opened 16 years ago
Closed 10 years ago
#10190 closed New feature (fixed)
Charset should be customizable with HttpResponse
Reported by: | Wonlay | Owned by: | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | HttpResponse, charset, runtime |
Cc: | unai@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When our web site is talking to other application or client, the charset of web response may vary from each other.
Currently only a settings.DEFAULT_CHARSET
can be set with a unique value, we cannot change the HttpResponse charset at runtime properly.
I think we should add a charset parameter to HttpResponse.init.
Attachments (2)
Change History (34)
by , 16 years ago
Attachment: | http_response_charset.diff added |
---|
follow-up: 4 comment:1 by , 16 years ago
Patch needs improvement: | set |
---|
comment:2 by , 16 years ago
milestone: | → 1.1 |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 16 years ago
milestone: | 1.1 → 1.2 |
---|
comment:4 by , 16 years ago
Owner: | changed from | to
---|
- Find out whether Python codec names are all valid as HTTP charset names. If so, parse the
content_type
to see if a special charset encoding is needed (no need for an extra__init__
parameter).
This is not the case. http://docs.python.org/library/codecs.html#standard-encodings ("A number of codecs are specific to Python, so their codec names have no meaning outside Python.") and http://www.cs.tut.fi/~jkorpela/chars/sorted.html which lacks a bunch of codecs listed in the first link.
- If the previous item is not applicable, don't use
_charset
in thecontent_type
setting, since it could lead to invalid value (and we'll need a documentation update to clarify the requirements there).
Such as a list of valid codecs?
by , 16 years ago
Attachment: | charset_without_mimetype.diff added |
---|
with changes suggested by mtreddinick (first draft)
comment:5 by , 16 years ago
http://www.iana.org/assignments/character-sets appears to be the canonical list of valid character sets.
comment:6 by , 16 years ago
Support for Accepted-Charset in an HttpRequest, charset=.. in the content_type, and manually setting the encoding all have implementations on http://code.djangoproject.com/browser/django/branches/soc2009/http-wsgi-improvements
As far as I understand, this code will not appear on the trunk until the end of Google Summer of Code, to be merged with the rest of the work on that branch. However, if someone wants a patch for trunk, ask, and I will cook one up.
comment:7 by , 15 years ago
(In [11199]) [soc2009/http-wsgi-improvements] Change HttpResponse.status_code to a property, additional test coverage. Refs #10190.
Improve charset coverage. Change HttpResponse.status_code to property, which checks for a 406 situation. This also required changes to all HttpResponse subclasses, so that their default status_code is set by _status_code, now.
Passes regression test suite, except ones related to sendfile.
comment:8 by , 15 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Fixed on a branch |
comment:10 by , 15 years ago
comment:11 by , 15 years ago
comment:12 by , 15 years ago
I may be way off base, but could this be the cause of a problem I'm having uploading an HTML block to an S3 backend using django-storages? The file 'somehow' gets encoded as UTF-32LE which seems to confuse Safari4. People will note that Safari renders this URL as text and Firefox understands it's HTML:
Here is the code from my views.py file:
file_data = {u'content':SimpleUploadedFile(filename + '.html', content_file)} form = PageDisplayForm(post,file_data)
(I've tried doing SimpleUploadedFile(filename,content_file,'text/html') and that didn't help
Where content_file is a block of HTML. Here's the relevant chunk from settings.py:
from S3 import CallingFormat DEFAULT_FILE_STORAGE = 'backends.s3.S3Storage' AWS_CALLING_FORMAT = CallingFormat.PATH
comment:13 by , 15 years ago
comment:14 by , 15 years ago
milestone: | 1.2 |
---|
1.2 is feature-frozen, moving this feature request off the milestone.
comment:15 by , 14 years ago
#13391 was closed as duplicate of this and included a patch with an implementation for this issue.
comment:16 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → New feature |
comment:19 by , 12 years ago
Owner: | changed from | to
---|
comment:20 by , 11 years ago
Status: | new → assigned |
---|
comment:21 by , 11 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:22 by , 11 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:23 by , 11 years ago
Triage Stage: | Fixed on a branch → Accepted |
---|
PR sent: https://github.com/django/django/pull/1928
HttpResponse
s charset is now depending on its Content-Type
header key.
If no charset is found there, it falls back to the DEFAULT_CHARSET
setting.
comment:24 by , 11 years ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
This latest patch doesn't address the fact that Python charsets do not match IANA-registered character sets. What's your take on this issue?
comment:25 by , 11 years ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
For the sake of future readers, this is what we have spoken on IRC (#django-dev
):
There are two possible workarounds to this:
- Creating a dictionary between IANA registered charsets and Python charsets. This would be great but it means having to regularly update and maintain it.
- Adding a
charset
initialization parameter and property toHttpResponse
. If you have issues with the charset specification, just pass on the charset argument or set the charset property. If not set, it'll fall back to the extraction from theContent-Type
response header and if that isn't possible, to theDEFAULT_CHARSET
setting.
Finally, we decided for the second solution.
We also spoke about parsing the Accept-Charset
request header so that the chain would be:
[charset parameter/property] ---> [content-type extraction] ---> [accept-charset extraction] ---> [default setting]
The problem with it was that the HttpResponse
object could not possibly know what the request headers were. It's sad to have this limitation but there isn't any easy workaround.
So, apart of making the changes suggested in the review, I only added the charset parameter and property.
comment:26 by , 11 years ago
Owner: | changed from | to
---|
Tentatively assigning the ticket to myself to review the latest iteration of the patch.
comment:28 by , 11 years ago
Cc: | added |
---|
comment:29 by , 11 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:30 by , 10 years ago
Patch needs improvement: | set |
---|
I left comments for improvement on the PR. Please uncheck "Patch needs improvement" when you update it, thanks.
comment:32 by , 10 years ago
Owner: | set to |
---|---|
Resolution: | → fixed |
Status: | new → closed |
This patch should probably be just a one-liner: just setting
self._charset
. Don't mess around with themimetype
stuff, since that's not related. However, there's something more that should be worked out here to avoid all problems.If the value of
self._charset
is always going to be valid in the HTTP content-type header, we should just extract it from thecontent_type
parameter. That would only be valid if all Python codec values are also valid HTTP header values.If that isn't possible, then
self._charset
shouldn't be used to construct the content-type header, as it will use an invalid value. The current "default" is only for really old code (and/or people using defaults likes utf-8). For anything non-standard,content_type
should be being used with a proper character set definition.So, improvements needed here are:
mimetype
setting. Make the smallest change possible and that isn't needed.content_type
to see if a special charset encoding is needed (no need for an extra__init__
parameter)._charset
in thecontent_type
setting, since it could lead to invalid value (and we'll need a documentation update to clarify the requirements there).