Code

Opened 7 years ago

Closed 7 years ago

#5956 closed (fixed)

unicode character in response headers

Reported by: Can Burak Cilingir <canburak@…> Owned by: jvloothuis
Component: HTTP handling Version: master
Severity: Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: UI/UX:

Description

when you put a non-7-bit character to a header, django dies in a bad way.

when you have DEBUG=True, you even don't see a django exception.

Traceback (most recent call last):

  File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 278, in run
    self.result = application(self.environ, self.start_response)

  File "/usr/lib/python2.5/site-packages/django/core/servers/basehttp.py", line 620, in __call__
    return self.application(environ, start_response)

  File "/usr/lib/python2.5/site-packages/django/core/handlers/wsgi.py", line 218, in __call__
    response_headers = [(str(k), str(v)) for k, v in response.items()]

UnicodeEncodeError: 'ascii' codec can't encode character u'\u011f' in position 9: ordinal not in range(128)

we need to display a better error desription or just ignore unicode objects for the headers

Attachments (3)

unicode_http_headers.diff (2.0 KB) - added by shanx 7 years ago.
Patch for unicode support for response headers
unicode_http_headers.2.diff (1.8 KB) - added by jvloothuis 7 years ago.
better_header_encoding_error.diff (2.3 KB) - added by jvloothuis 7 years ago.

Download all attachments as: .zip

Change History (14)

comment:1 Changed 7 years ago by Can Burak Cilingir <canburak@…>

  • Component changed from Uncategorized to HTTP handling
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 7 years ago by shanx

  • Owner changed from nobody to shanx
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

Changed 7 years ago by shanx

Patch for unicode support for response headers

comment:3 Changed 7 years ago by shanx

  • Has patch set
  • Triage Stage changed from Accepted to Ready for checkin

Fixed the problem by converting both keys and values to str objects based on the charset of the HTTPResponse. The patch to this issue includes both a test and the patches to HTTPResponse to make this work.

comment:4 Changed 7 years ago by mtredinnick

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

Probably better (for consistency) to call smart_str(value, self._encoding) instead of writing a new function here. That's what smart_str() is for.

Changed 7 years ago by jvloothuis

comment:5 Changed 7 years ago by jvloothuis

  • Owner changed from shanx to jvloothuis
  • Status changed from assigned to new
  • Summary changed from unicode character in response headars to unicode character in response headers
  • Triage Stage changed from Accepted to Ready for checkin

Using smart_str is better indeed. A response object does not have the _encoding. That is why the _charset is used (which boils down to the same thing).

comment:6 Changed 7 years ago by mtredinnick

  • Triage Stage changed from Ready for checkin to Accepted

This ticket was originally asking for better error handling, but the attached patches actually try to encode the headers. I'm more in favour of better error handling, but if somebody is trying to encode things, the current code doesn't work. We MUST comply with sections 4.2 and 2.2 of RFC 2616 (the HTTP/1.1 spec) with respect to encoding. You can't just arbitrarily encode things using the character set given in the HttpResponse, since that's an encoding for the content, whereas headers need to be understood and manipulated by encoding-unaware machinery. So somebody needs to read the appropriate parts of RFC 2616 very carefully (and possibly portions of RFC 2047 as well) and implement the handling required by the specs here if you want to try and handle all data (without a huge performance impact on the common "things that are strings" case).

Or just raise better errors and ask the user to encode the headers appropriately and insert a string type, which is quite reasonable, since non-ASCII headers are very rare (cookies aside).

comment:7 Changed 7 years ago by jvloothuis

After reading the RFC's I agree with you in that my solution is the wrong one. From what I have read all headers should be printable ASCII (including cookies). I will create a new patch which will at least verify that all headers are ASCII (like you suggested).

Changed 7 years ago by jvloothuis

comment:8 Changed 7 years ago by jvloothuis

  • Triage Stage changed from Accepted to Ready for checkin

The new patch (better_header_encoding_error.diff) tells the developer about the reasons for the unicode error while still keeping the traceback.

comment:9 Changed 7 years ago by mtredinnick

I suspect we're still not implementing RFC 2616 correctly, but we can fix that in another ticket. For the record, though: section 4.2 says header values are text, tokens or quoted strings. Section 2.2 says text (and hence, quoted-strings) can contain octets, with some low-value exclusions, which means 8-bit values outside the ASCII range are permitted. Not sure of any case where this is used in practice, but it appears to be permissible.

I'm going to apply this patch and close this ticket, though, since it only reinforces our current constraints, so if we'll be no more wrong after this patch is applied, but the errors will be clearer.

comment:10 Changed 7 years ago by mtredinnick

I've opened #6219 for the problem in the last comment.

comment:11 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from new to closed

(In [6927]) Fixed #5956 -- Added a better error description for non-ASCII HTTP headers. Patch from jvloothuis.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.