Ticket #5956 (closed: fixed)

Opened 6 months ago

Last modified 5 months ago

unicode character in response headers

Reported by: Can Burak Cilingir <canburak@cs.bilgi.edu.tr>
Assigned to: jvloothuis
Component: HTTP handling
Version: SVN
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: 1
Needs documentation: 0
Needs tests: 0
Patch needs improvement: 1

Description

When you put a non-7-bit character in a header, Django dies in a bad way.

Even with DEBUG=True, you don't see a Django exception page.

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 620, in __call__
    return self.application(environ, start_response)
  File "/usr/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 218, in __call__
    response_headers = [(str(k), str(v)) for k, v in response.items()]
UnicodeEncodeError: 'ascii' codec can't encode character u'\u011f' in position 9: ordinal not in range(128)

We need to display a better error description, or just ignore unicode objects in the headers.
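A minimal sketch of the failure outside Django, assuming Python 3. On Python 2, str() applied to a unicode object implicitly encodes it as ASCII; the sketch uses an explicit .encode("ascii") to trigger the same error. The function name, header name and header value are hypothetical; the value is chosen only so that u'\u011f' lands at position 9, as in the traceback.

```python
def to_ascii_header(name, value):
    """Encode one header pair as ASCII bytes (hypothetical helper)."""
    return name.encode("ascii"), value.encode("ascii")


error = None
try:
    # Hypothetical header; "filename=" is 9 characters long, so the
    # non-ASCII character sits at index 9.
    to_ascii_header("Content-Disposition", "filename=\u011f.txt")
except UnicodeEncodeError as exc:
    error = exc

print(error)
```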

Attachments

unicode_http_headers.diff (2.0 kB) - added by shanx on 12/01/07 11:39:15.
Patch for unicode support for response headers
unicode_http_headers.2.diff (1.8 kB) - added by jvloothuis on 12/01/07 15:20:27.
better_header_encoding_error.diff (2.3 kB) - added by jvloothuis on 12/15/07 06:23:19.

Change History

11/15/07 15:10:01 changed by Can Burak Cilingir <canburak@cs.bilgi.edu.tr>

  • needs_better_patch changed.
  • component changed from Uncategorized to HTTP handling.
  • needs_tests changed.
  • needs_docs changed.

12/01/07 10:58:33 changed by shanx

  • owner changed from nobody to shanx.
  • status changed from new to assigned.
  • stage changed from Unreviewed to Accepted.

12/01/07 11:39:15 changed by shanx

  • attachment unicode_http_headers.diff added.

Patch for unicode support for response headers

12/01/07 11:42:01 changed by shanx

  • has_patch set to 1.
  • stage changed from Accepted to Ready for checkin.

Fixed the problem by converting both keys and values to str objects based on the charset of the HttpResponse. The patch includes both a test and the changes to HttpResponse needed to make this work.

12/01/07 14:10:13 changed by mtredinnick

  • needs_better_patch set to 1.
  • stage changed from Ready for checkin to Accepted.

Probably better (for consistency) to call smart_str(value, self._encoding) instead of writing a new function here. That's what smart_str() is for.

12/01/07 15:20:27 changed by jvloothuis

  • attachment unicode_http_headers.2.diff added.

12/01/07 15:22:02 changed by jvloothuis

  • owner changed from shanx to jvloothuis.
  • status changed from assigned to new.
  • summary changed from unicode character in response headars to unicode character in response headers.
  • stage changed from Accepted to Ready for checkin.

Using smart_str is indeed better. A response object has no _encoding attribute, so the patch uses _charset instead (which boils down to the same thing).
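For illustration, a rough sketch of what encoding a header value with the response charset produces, assuming Python 3. encode_header is a hypothetical stand-in written for this example, not Django's smart_str, and the header value is made up.

```python
def encode_header(value, charset):
    """Hypothetical stand-in: bytes pass through, anything else is
    converted to text and encoded with the given charset."""
    if isinstance(value, bytes):
        return value
    return str(value).encode(charset)


encoded = encode_header("filename=\u011f.txt", "utf-8")

# The result is no longer plain ASCII: it contains octets above 127.
has_high_octets = any(octet > 127 for octet in encoded)
print(encoded, has_high_octets)
```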

12/10/07 19:49:27 changed by mtredinnick

  • stage changed from Ready for checkin to Accepted.

This ticket was originally asking for better error handling, but the attached patches actually try to encode the headers. I'm more in favour of better error handling, but if somebody is trying to encode things, the current code doesn't work. We MUST comply with sections 4.2 and 2.2 of RFC 2616 (the HTTP/1.1 spec) with respect to encoding. You can't just arbitrarily encode things using the character set given in the HttpResponse, since that's an encoding for the content, whereas headers need to be understood and manipulated by encoding-unaware machinery. So somebody needs to read the appropriate parts of RFC 2616 very carefully (and possibly portions of RFC 2047 as well) and implement the handling required by the specs here if you want to try and handle all data (without a huge performance impact on the common "things that are strings" case).

Or just raise better errors and ask the user to encode the headers appropriately and insert a string type, which is quite reasonable, since non-ASCII headers are very rare (cookies aside).
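One way a user could encode a value before inserting it is sketched below, assuming Python 3 and the standard library's email.header module, which produces RFC 2047 encoded-words. Whether an encoded-word is appropriate for a particular HTTP header has to be checked against the RFCs; encode_header_text is a hypothetical helper name.

```python
from email.header import Header, decode_header


def encode_header_text(text, charset="utf-8"):
    """Return text unchanged if it is ASCII, otherwise an ASCII-only
    RFC 2047 encoded-word (hypothetical helper)."""
    try:
        text.encode("ascii")
        return text
    except UnicodeEncodeError:
        return Header(text, charset).encode()


encoded = encode_header_text("\u011f")

# Decoding the encoded-word recovers the original bytes and charset.
raw, used_charset = decode_header(encoded)[0]
print(encoded, raw.decode(used_charset))
```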

12/11/07 06:11:27 changed by jvloothuis

After reading the RFCs, I agree with you that my solution is the wrong one. From what I have read, all headers should be printable ASCII (including cookies). I will create a new patch which will at least verify that all headers are ASCII (as you suggested).

12/15/07 06:23:19 changed by jvloothuis

  • attachment better_header_encoding_error.diff added.

12/15/07 06:25:31 changed by jvloothuis

  • stage changed from Accepted to Ready for checkin.

The new patch (better_header_encoding_error.diff) tells the developer about the reasons for the unicode error while still keeping the traceback.
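The general idea can be sketched as follows, assuming Python 3. This is an illustration of the technique (extend the error's reason, then re-raise the same exception so the traceback is preserved), not the contents of the attached patch; the function name and message wording are assumptions.

```python
def ascii_header_values(*values):
    """Encode each header value as US-ASCII, explaining the constraint
    if encoding fails (hypothetical helper)."""
    encoded = []
    for value in values:
        try:
            encoded.append(value.encode("us-ascii"))
        except UnicodeEncodeError as exc:
            # Add the explanation to the original exception, then re-raise
            # it with a bare `raise` so the traceback is kept.
            exc.reason += "; HTTP response headers must be in US-ASCII format"
            raise
    return encoded


error = None
try:
    ascii_header_values("X-Name", "\u011f")
except UnicodeEncodeError as exc:
    error = exc

print(error)
```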

12/17/07 01:58:38 changed by mtredinnick

I suspect we're still not implementing RFC 2616 correctly, but we can fix that in another ticket. For the record, though: section 4.2 says header values are text, tokens or quoted strings. Section 2.2 says text (and hence, quoted-strings) can contain octets, with some low-value exclusions, which means 8-bit values outside the ASCII range are permitted. Not sure of any case where this is used in practice, but it appears to be permissible.
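A simplified sketch of the TEXT rule described above, assuming Python 3: any octet is allowed except control characters (0-31 and 127), with horizontal tab permitted as whitespace. Header folding (CRLF followed by a space or tab) is deliberately ignored, and is_rfc2616_text is a hypothetical name.

```python
def is_rfc2616_text(value):
    """Check a bytes value against a simplified RFC 2616 TEXT rule:
    no control octets, except horizontal tab. Folding is not handled."""
    HT, DEL = 9, 127
    return all(
        octet == HT or (octet >= 32 and octet != DEL)
        for octet in value
    )


# An 8-bit octet outside the ASCII range passes this check,
# while a NUL control octet does not.
print(is_rfc2616_text(b"caf\xe9"), is_rfc2616_text(b"a\x00b"))
```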

I'm going to apply this patch and close this ticket, though, since it only reinforces our current constraints: we'll be no more wrong after this patch is applied, but the errors will be clearer.

12/17/07 02:02:35 changed by mtredinnick

I've opened #6219 for the problem in the last comment.

12/17/07 02:05:52 changed by mtredinnick

  • status changed from new to closed.
  • resolution set to fixed.

(In [6927]) Fixed #5956 -- Added a better error description for non-ASCII HTTP headers. Patch from jvloothuis.

