Opened 16 years ago

Closed 16 years ago

#5956 closed (fixed)

unicode character in response headers

Reported by: Can Burak Cilingir <canburak@…> Owned by: Jeroen Vloothuis
Component: HTTP handling Version: dev
Severity: Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

when you put a non-7-bit character to a header, django dies in a bad way.

when you have DEBUG=True, you even don't see a django exception.

Traceback (most recent call last):

  File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)

  File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 620, in __call__
    return self.application(environ, start_response)

  File "/usr/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 218, in __call__
    response_headers = [(str(k), str(v)) for k, v in response.items()]

UnicodeEncodeError: 'ascii' codec can't encode character u'\u011f' in position 9: ordinal not in range(128)

we need to display a better error desription or just ignore unicode objects for the headers

Attachments (3)

unicode_http_headers.diff (2.0 KB ) - added by Remco Wendt 16 years ago.
Patch for unicode support for response headers
unicode_http_headers.2.diff (1.8 KB ) - added by Jeroen Vloothuis 16 years ago.
better_header_encoding_error.diff (2.3 KB ) - added by Jeroen Vloothuis 16 years ago.

Download all attachments as: .zip

Change History (14)

comment:1 by Can Burak Cilingir <canburak@…>, 16 years ago

Component: UncategorizedHTTP handling

comment:2 by Remco Wendt, 16 years ago

Owner: changed from nobody to Remco Wendt
Status: newassigned
Triage Stage: UnreviewedAccepted

by Remco Wendt, 16 years ago

Attachment: unicode_http_headers.diff added

Patch for unicode support for response headers

comment:3 by Remco Wendt, 16 years ago

Has patch: set
Triage Stage: AcceptedReady for checkin

Fixed the problem by converting both keys and values to str objects based on the charset of the HTTPResponse. The patch to this issue includes both a test and the patches to HTTPResponse to make this work.

comment:4 by Malcolm Tredinnick, 16 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Probably better (for consistency) to call smart_str(value, self._encoding) instead of writing a new function here. That's what smart_str() is for.

by Jeroen Vloothuis, 16 years ago

Attachment: unicode_http_headers.2.diff added

comment:5 by Jeroen Vloothuis, 16 years ago

Owner: changed from Remco Wendt to Jeroen Vloothuis
Status: assignednew
Summary: unicode character in response headarsunicode character in response headers
Triage Stage: AcceptedReady for checkin

Using smart_str is better indeed. A response object does not have the _encoding. That is why the _charset is used (which boils down to the same thing).

comment:6 by Malcolm Tredinnick, 16 years ago

Triage Stage: Ready for checkinAccepted

This ticket was originally asking for better error handling, but the attached patches actually try to encode the headers. I'm more in favour of better error handling, but if somebody is trying to encode things, the current code doesn't work. We MUST comply with sections 4.2 and 2.2 of RFC 2616 (the HTTP/1.1 spec) with respect to encoding. You can't just arbitrarily encode things using the character set given in the HttpResponse, since that's an encoding for the content, whereas headers need to be understood and manipulated by encoding-unaware machinery. So somebody needs to read the appropriate parts of RFC 2616 very carefully (and possibly portions of RFC 2047 as well) and implement the handling required by the specs here if you want to try and handle all data (without a huge performance impact on the common "things that are strings" case).

Or just raise better errors and ask the user to encode the headers appropriately and insert a string type, which is quite reasonable, since non-ASCII headers are very rare (cookies aside).

comment:7 by Jeroen Vloothuis, 16 years ago

After reading the RFC's I agree with you in that my solution is the wrong one. From what I have read all headers should be printable ASCII (including cookies). I will create a new patch which will at least verify that all headers are ASCII (like you suggested).

by Jeroen Vloothuis, 16 years ago

comment:8 by Jeroen Vloothuis, 16 years ago

Triage Stage: AcceptedReady for checkin

The new patch (better_header_encoding_error.diff) tells the developer about the reasons for the unicode error while still keeping the traceback.

comment:9 by Malcolm Tredinnick, 16 years ago

I suspect we're still not implementing RFC 2616 correctly, but we can fix that in another ticket. For the record, though: section 4.2 says header values are text, tokens or quoted strings. Section 2.2 says text (and hence, quoted-strings) can contain octets, with some low-value exclusions, which means 8-bit values outside the ASCII range are permitted. Not sure of any case where this is used in practice, but it appears to be permissible.

I'm going to apply this patch and close this ticket, though, since it only reinforces our current constraints, so if we'll be no more wrong after this patch is applied, but the errors will be clearer.

comment:10 by Malcolm Tredinnick, 16 years ago

I've opened #6219 for the problem in the last comment.

comment:11 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: newclosed

(In [6927]) Fixed #5956 -- Added a better error description for non-ASCII HTTP headers. Patch from jvloothuis.

Note: See TracTickets for help on using tickets.
Back to Top