Opened 17 years ago
Closed 11 years ago
#10190 closed New feature (fixed)
Charset should be customizable with HttpResponse
| Reported by: | Wonlay | Owned by: | |
|---|---|---|---|
| Component: | HTTP handling | Version: | dev |
| Severity: | Normal | Keywords: | HttpResponse, charset, runtime |
| Cc: | unai@… | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
When our web site is talking to other application or client, the charset of web response may vary from each other.
Currently only a settings.DEFAULT_CHARSET can be set with a unique value, we cannot change the HttpResponse charset at runtime properly.
I think we should add a charset parameter to HttpResponse.init.
Attachments (2)
Change History (34)
by , 17 years ago
| Attachment: | http_response_charset.diff added |
|---|
follow-up: 4 comment:1 by , 17 years ago
| Patch needs improvement: | set |
|---|
comment:2 by , 17 years ago
| milestone: | → 1.1 |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
comment:3 by , 17 years ago
| milestone: | 1.1 → 1.2 |
|---|
comment:4 by , 16 years ago
| Owner: | changed from to |
|---|
- Find out whether Python codec names are all valid as HTTP charset names. If so, parse the
content_typeto see if a special charset encoding is needed (no need for an extra__init__parameter).
This is not the case. http://docs.python.org/library/codecs.html#standard-encodings ("A number of codecs are specific to Python, so their codec names have no meaning outside Python.") and http://www.cs.tut.fi/~jkorpela/chars/sorted.html which lacks a bunch of codecs listed in the first link.
- If the previous item is not applicable, don't use
_charsetin thecontent_typesetting, since it could lead to invalid value (and we'll need a documentation update to clarify the requirements there).
Such as a list of valid codecs?
by , 16 years ago
| Attachment: | charset_without_mimetype.diff added |
|---|
with changes suggested by mtreddinick (first draft)
comment:5 by , 16 years ago
http://www.iana.org/assignments/character-sets appears to be the canonical list of valid character sets.
comment:6 by , 16 years ago
Support for Accepted-Charset in an HttpRequest, charset=.. in the content_type, and manually setting the encoding all have implementations on http://code.djangoproject.com/browser/django/branches/soc2009/http-wsgi-improvements
As far as I understand, this code will not appear on the trunk until the end of Google Summer of Code, to be merged with the rest of the work on that branch. However, if someone wants a patch for trunk, ask, and I will cook one up.
comment:7 by , 16 years ago
(In [11199]) [soc2009/http-wsgi-improvements] Change HttpResponse.status_code to a property, additional test coverage. Refs #10190.
Improve charset coverage. Change HttpResponse.status_code to property, which checks for a 406 situation. This also required changes to all HttpResponse subclasses, so that their default status_code is set by _status_code, now.
Passes regression test suite, except ones related to sendfile.
comment:8 by , 16 years ago
| Patch needs improvement: | unset |
|---|---|
| Triage Stage: | Accepted → Fixed on a branch |
comment:10 by , 16 years ago
comment:11 by , 16 years ago
comment:12 by , 16 years ago
I may be way off base, but could this be the cause of a problem I'm having uploading an HTML block to an S3 backend using django-storages? The file 'somehow' gets encoded as UTF-32LE which seems to confuse Safari4. People will note that Safari renders this URL as text and Firefox understands it's HTML:
Here is the code from my views.py file:
file_data = {u'content':SimpleUploadedFile(filename + '.html', content_file)}
form = PageDisplayForm(post,file_data)
(I've tried doing SimpleUploadedFile(filename,content_file,'text/html') and that didn't help
Where content_file is a block of HTML. Here's the relevant chunk from settings.py:
from S3 import CallingFormat
DEFAULT_FILE_STORAGE = 'backends.s3.S3Storage'
AWS_CALLING_FORMAT = CallingFormat.PATH
comment:13 by , 16 years ago
comment:14 by , 16 years ago
| milestone: | 1.2 |
|---|
1.2 is feature-frozen, moving this feature request off the milestone.
comment:15 by , 15 years ago
#13391 was closed as duplicate of this and included a patch with an implementation for this issue.
comment:16 by , 15 years ago
| Severity: | → Normal |
|---|---|
| Type: | → New feature |
comment:19 by , 13 years ago
| Owner: | changed from to |
|---|
comment:20 by , 12 years ago
| Status: | new → assigned |
|---|
comment:21 by , 12 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:22 by , 12 years ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:23 by , 12 years ago
| Triage Stage: | Fixed on a branch → Accepted |
|---|
PR sent: https://github.com/django/django/pull/1928
HttpResponses charset is now depending on its Content-Type header key.
If no charset is found there, it falls back to the DEFAULT_CHARSET setting.
comment:24 by , 12 years ago
| Needs documentation: | set |
|---|---|
| Patch needs improvement: | set |
This latest patch doesn't address the fact that Python charsets do not match IANA-registered character sets. What's your take on this issue?
comment:25 by , 12 years ago
| Needs documentation: | unset |
|---|---|
| Patch needs improvement: | unset |
For the sake of future readers, this is what we have spoken on IRC (#django-dev):
There are two possible workarounds to this:
- Creating a dictionary between IANA registered charsets and Python charsets. This would be great but it means having to regularly update and maintain it.
- Adding a
charsetinitialization parameter and property toHttpResponse. If you have issues with the charset specification, just pass on the charset argument or set the charset property. If not set, it'll fall back to the extraction from theContent-Typeresponse header and if that isn't possible, to theDEFAULT_CHARSETsetting.
Finally, we decided for the second solution.
We also spoke about parsing the Accept-Charset request header so that the chain would be:
[charset parameter/property] ---> [content-type extraction] ---> [accept-charset extraction] ---> [default setting]
The problem with it was that the HttpResponse object could not possibly know what the request headers were. It's sad to have this limitation but there isn't any easy workaround.
So, apart of making the changes suggested in the review, I only added the charset parameter and property.
comment:26 by , 12 years ago
| Owner: | changed from to |
|---|
Tentatively assigning the ticket to myself to review the latest iteration of the patch.
comment:28 by , 12 years ago
| Cc: | added |
|---|
comment:29 by , 11 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:30 by , 11 years ago
| Patch needs improvement: | set |
|---|
I left comments for improvement on the PR. Please uncheck "Patch needs improvement" when you update it, thanks.
comment:32 by , 11 years ago
| Owner: | set to |
|---|---|
| Resolution: | → fixed |
| Status: | new → closed |
This patch should probably be just a one-liner: just setting
self._charset. Don't mess around with themimetypestuff, since that's not related. However, there's something more that should be worked out here to avoid all problems.If the value of
self._charsetis always going to be valid in the HTTP content-type header, we should just extract it from thecontent_typeparameter. That would only be valid if all Python codec values are also valid HTTP header values.If that isn't possible, then
self._charsetshouldn't be used to construct the content-type header, as it will use an invalid value. The current "default" is only for really old code (and/or people using defaults likes utf-8). For anything non-standard,content_typeshould be being used with a proper character set definition.So, improvements needed here are:
mimetypesetting. Make the smallest change possible and that isn't needed.content_typeto see if a special charset encoding is needed (no need for an extra__init__parameter)._charsetin thecontent_typesetting, since it could lead to invalid value (and we'll need a documentation update to clarify the requirements there).