Django incorrectly restricts HTTP header values to ASCII
Reported by: aaugustin | Owned by: aaugustin
Has patch: yes | Needs documentation: no
Needs tests: no | Patch needs improvement: no
Whenever an HTTP header is set on an HttpResponse, Django raises an exception if its key or value contains non-ASCII characters.
However, RFC 2616 defines message headers in section 4.2 as:
message-header = field-name ":" [ field-value ]
field-name     = token
field-value    = *( field-content | LWS )
field-content  = <the OCTETs making up the field-value and consisting of either *TEXT or combinations of token, separators, and quoted-string>
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 only when encoded according to the rules of RFC 2047.
TEXT = <any OCTET except CTLs, but including LWS>
This indicates that an arbitrary bytestring is acceptable as a value.
I hit this issue while setting an X-SendFile header pointing to a non-ASCII file name. It seems to me that Django should:
- at least accept any bytes content (since any bytestring can be interpreted as latin-1) and attempt converting text content to latin-1, raising an error if that isn't possible
- even better, use MIME encoding for text values that don't fit in the latin-1 charset.
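The two options above can be sketched roughly as follows. This is only an illustration of the proposed behaviour, not Django's actual code: `encode_header_value` is a hypothetical helper name, and the choice of UTF-8 as the RFC 2047 charset is an assumption.

```python
import sys
from email.header import Header


def encode_header_value(value):
    """Hypothetical helper: return the bytes to send for a header value.

    - bytes are passed through unchanged (any bytestring is valid latin-1);
    - text is encoded as latin-1 when possible;
    - otherwise the text is MIME-encoded per RFC 2047 (UTF-8 is an assumed
      choice of charset), which yields a pure-ASCII encoded word.
    """
    if isinstance(value, bytes):
        return value
    try:
        return value.encode("latin-1")
    except UnicodeEncodeError:
        # maxlinelen=sys.maxsize prevents the Header class from folding
        # long values over several lines.
        mime = Header(value, "utf-8", maxlinelen=sys.maxsize).encode()
        return mime.encode("ascii")
```

With this sketch, a file name such as "café.pdf" would go out as latin-1 bytes, while a value containing "€" would fall back to an RFC 2047 encoded word.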
The header keys must stay restricted to ASCII: RFC 2616 says they're of the token type, defined by:
token = 1*<any CHAR except CTLs or separators>
CHAR = <any US-ASCII character (octets 0 - 127)>
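As a rough illustration, the token rule could be checked like this. `is_token` is a hypothetical name; the separator set is taken from RFC 2616 section 2.2, and CTLs are octets 0 to 31 plus DEL (127).

```python
# Separators from RFC 2616 section 2.2, including SP and HT.
SEPARATORS = frozenset('()<>@,;:\\"/[]?={} \t')


def is_token(name):
    """Hypothetical check: is `name` a valid RFC 2616 token?"""
    if not name:
        return False  # 1*<...> requires at least one character
    for ch in name:
        code = ord(ch)
        if code > 127:
            return False  # not a US-ASCII CHAR
        if code < 32 or code == 127:
            return False  # control character
        if ch in SEPARATORS:
            return False
    return True
```

Under this check "X-SendFile" is accepted, while a name containing a space, a colon, or any non-ASCII character is rejected.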
Finally, PEP 3333 says:
Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.
On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all "strings" referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive).
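A small sketch of what this PEP 3333 constraint means on Python 3, where str is Unicode-based. Both function names are hypothetical and only serve the illustration.

```python
def is_wsgi_str(value):
    """Hypothetical check: does `value` satisfy the PEP 3333 constraint,
    i.e. is it a str containing only code points U+0000 through U+00FF?"""
    return isinstance(value, str) and all(ord(ch) <= 0xFF for ch in value)


def to_wsgi_str(raw):
    """Hypothetical helper: map arbitrary header bytes to a WSGI string.

    latin-1 maps each byte 0-255 to the code point with the same number,
    so decoding never fails and the result always passes is_wsgi_str().
    """
    return raw.decode("latin-1")
```

For instance, the UTF-8 bytes of "é" become the two-character string "Ã©" under `to_wsgi_str`; encoding that string back to latin-1 yields the original bytes.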
PS: RFC 2616 points to RFC 822, where section 3.1.2. restricts headers to ASCII. This may explain why Django has this restriction.
Change History (8)
Changed 18 months ago by aaugustin
Changed 18 months ago by aaugustin
- Has patch set
- Patch needs improvement set