Opened 12 years ago
Closed 12 years ago
#18916 closed Bug (fixed)
Django incorrectly restricts HTTP header values to ASCII
Reported by: | Aymeric Augustin | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | chris@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Whenever an HTTP header is set on an HttpResponse, Django raises an exception if its key or value contains non-ASCII characters.
However, RFC 2616 defines message headers in section 4.2 as:
    message-header = field-name ":" [ field-value ]
    field-name     = token
    field-value    = *( field-content | LWS )
    field-content  = <the OCTETs making up the field-value
                     and consisting of either *TEXT or combinations
                     of token, separators, and quoted-string>
where
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
    TEXT = <any OCTET except CTLs, but including LWS>
This indicates that an arbitrary bytestring is acceptable as a value, as long as it contains no control characters other than linear whitespace.
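To make this reading of the grammar concrete, here is a minimal sketch of a check for the *TEXT rule. The function name is hypothetical (it is not a Django API), and it assumes folded multi-line values are rejected outright, which is stricter than the grammar's LWS rule. Note that every octet from 0x80 to 0xFF passes.

```python
def is_rfc2616_text(value: bytes) -> bool:
    """Check a header field value against RFC 2616's *TEXT rule.

    TEXT = <any OCTET except CTLs, but including LWS>, where CTL means
    octets 0-31 plus 127, and LWS = [CRLF] 1*( SP | HT ).
    Simplifying assumption: CRLF line folding is not accepted, so the
    only control octet allowed through is HT (horizontal tab).
    """
    for octet in value:  # iterating over bytes yields ints
        if octet == 9:  # HT is part of LWS, hence allowed
            continue
        if octet < 32 or octet == 127:  # every other CTL is forbidden
            return False
    return True


if __name__ == '__main__':
    # A UTF-8 encoded, non-ASCII file name is valid TEXT...
    print(is_rfc2616_text('/srv/files/café.txt'.encode('utf-8')))
    # ...but an embedded CRLF (header injection) is not.
    print(is_rfc2616_text(b'evil\r\nSet-Cookie: x=1'))
```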
I hit this issue while setting an X-SendFile header pointing to a non-ASCII file name. It seems to me that Django should:
- at least accept any bytes content (since any bytestring can be interpreted as latin-1) and attempt to convert text content to latin-1, raising an error if that isn't possible
- even better, use MIME encoding for text values that don't fit in the latin-1 charset.
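The two options above could be combined roughly as follows. This is a minimal sketch, not Django's actual fix: `encode_header_value` is a hypothetical name, and the RFC 2047 encoded-word is built by hand with base64 ("B" encoding). It assumes values are short, since it does not split long values even though RFC 2047 limits a single encoded-word to 75 characters; the standard library's `email.header.Header` handles that splitting.

```python
import base64


def encode_header_value(value):
    """Hypothetical helper returning a header value that fits in latin-1.

    * bytes are accepted as-is by reading them as latin-1;
    * text is kept if it is representable in latin-1;
    * otherwise text is wrapped in an RFC 2047 "B" encoded-word (UTF-8).
    Assumption: long values are not split into several encoded-words.
    """
    if isinstance(value, bytes):
        # Every byte 0x00-0xFF maps to exactly one latin-1 code point.
        return value.decode('latin-1')
    try:
        value.encode('latin-1')
        return value
    except UnicodeEncodeError:
        payload = base64.b64encode(value.encode('utf-8')).decode('ascii')
        return '=?utf-8?b?' + payload + '?='


if __name__ == '__main__':
    print(encode_header_value('café.txt'))  # café.txt (fits in latin-1)
    print(encode_header_value('日本.txt'))  # =?utf-8?b?5pel5pysLnR4dA==?=
```

Passing bytes keeps the original octets intact, which is what an X-SendFile path usually needs; MIME encoding only makes sense for headers whose consumer actually decodes RFC 2047.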
The header keys must stay restricted to ASCII: RFC 2616 says they're of the token type, defined by:
token = 1*<any CHAR except CTLs or separators>
with
CHAR = <any US-ASCII character (octets 0 - 127)>
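A key check following the token rule might look like the sketch below. The function name is hypothetical; the separator set is the one RFC 2616 lists in section 2.2, and CTLs are octets 0 to 31 plus 127.

```python
# RFC 2616, section 2.2: characters that may not appear in a token
# ( ) < > @ , ; : \ " / [ ] ? = { } SP HT
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_rfc2616_token(name: str) -> bool:
    """Return True if name matches token = 1*<any CHAR except CTLs or separators>."""
    if not name:
        return False  # "1*" requires at least one character
    for ch in name:
        code = ord(ch)
        if code > 127:  # not a US-ASCII CHAR
            return False
        if code < 32 or code == 127:  # CTL
            return False
        if ch in SEPARATORS:
            return False
    return True


if __name__ == '__main__':
    print(is_rfc2616_token('X-SendFile'))     # True
    print(is_rfc2616_token('X-Fichier-Été'))  # False: non-ASCII
```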
Finally, PEP 3333 says:
Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.
On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all "strings" referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive).
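Under these PEP 3333 rules, one way to pass a non-ASCII X-SendFile path through a WSGI server is to carry its raw bytes as latin-1 text. The sketch below is a plain WSGI application, not Django code. It assumes, hypothetically, that the front-end server expects the path as UTF-8 bytes, and the path and the `collect_headers` helper are illustrative only.

```python
def app(environ, start_response):
    """Minimal WSGI app (sketch) that emits an X-SendFile header.

    Assumption: the front-end server expects the path as raw UTF-8 bytes.
    PEP 3333 only allows header strings with code points U+0000-U+00FF,
    so the UTF-8 bytes are carried as latin-1 text; a server that encodes
    header strings as ISO-8859-1 turns each code point back into one byte.
    """
    path = '/srv/files/café.txt'  # hypothetical non-ASCII file name
    header_value = path.encode('utf-8').decode('latin-1')
    start_response('200 OK', [
        ('Content-Type', 'text/plain; charset=utf-8'),
        ('X-SendFile', header_value),
    ])
    return [b'']


def collect_headers(wsgi_app):
    """Hypothetical test helper: run wsgi_app and return its headers as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)
        return lambda data: None  # unused write() callable

    b''.join(wsgi_app({}, start_response))
    return captured


if __name__ == '__main__':
    value = collect_headers(app)['X-SendFile']
    # Re-encoding as latin-1 recovers the original UTF-8 bytes.
    print(value.encode('latin-1').decode('utf-8'))  # /srv/files/café.txt
```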
PS: RFC 2616 points to RFC 822, where section 3.1.2. restricts headers to ASCII. This may explain why Django has this restriction.
Attachments (1)
Change History (8)
comment:1 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 12 years ago
Cc: | added |
by , 12 years ago
Attachment: | 18916.diff added |
follow-up: 6 comment:3 by , 12 years ago
Has patch: | set |
Patch needs improvement: | set |
The attached patch would work if it weren't for this line in core/mail/message.py:
    Charset.add_charset('utf-8', Charset.SHORTEST, None, 'utf-8')
comment:4 by , 12 years ago
Owner: | changed from | to
comment:5 by , 12 years ago
Patch needs improvement: | unset |
Updated patch, pull request here: https://github.com/django/django/pull/339
comment:6 by , 12 years ago
comment:7 by , 12 years ago
Resolution: | → fixed |
Status: | new → closed |
Your analysis seems correct. +1 from me.