Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#20889 closed Bug (fixed)

HttpResponseBase._convert_to_charset complains about newline it inserted itself

Reported by: mjl@… Owned by: Rik
Component: HTTP handling Version: dev
Severity: Normal Keywords: nlsprint14
Cc: poly, gitaarik@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following snippet:

# coding=utf-8
from django.http import HttpResponse
h = HttpResponse()
f = 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz a\xcc\x88'.decode('utf-8')
h['Content-Disposition'] = u'attachment; filename="%s"' % f

This contains a "OSX-Style" Umlaut (a + ¨), which triggers the "convert to mime encoding" path of _convert_to_charset. However, this string is so long that email.header.Header, which is used to mime-encode, will split the line. Later _convert_to_charset explodes because the header now contains a newline.

$ ./manage.py shell
Python 2.7.5 (default, Aug  1 2013, 01:01:17)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import ttt
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/mjl/Projects/cms_sites/cc.intranet/cms/ttt.py", line 5, in <module>
    h['Content-Disposition'] = u'attachment; filename="%s"' % f
  File "/Users/mjl/Projects/cms_sites/cc.intranet/lib/python2.7/site-packages/django/http/response.py", line 110, in __setitem__
    value = self._convert_to_charset(value, 'latin1', mime_encode=True)
  File "/Users/mjl/Projects/cms_sites/cc.intranet/lib/python2.7/site-packages/django/http/response.py", line 105, in _convert_to_charset
    raise BadHeaderError("Header values can't contain newlines (got %r)" % value)
BadHeaderError: Header values can't contain newlines (got '=?utf-8?q?attachment=3B_filename=3D=22zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz_a?=\n =?utf-8?b?zIgi?=')

Solution: django/http/response.py#L163 should probably use

Header(value, 'utf-8', maxlinelen=1000)

to avoid generating continuation lines. Agreed, the "1000" is somewhat arbitrary, but realistically this fixes the problem.

Change History (17)

comment:1 by Daniel Boeve, 11 years ago

Owner: changed from nobody to Daniel Boeve
Status: newassigned

comment:2 by Tim Graham <timograham@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 6dca603abb0eb164ba87657caf5cc65bca449719:

Fixed #20889 -- Prevented email.Header from inserting newlines

Passed large maxlinelen to email.Header to prevent newlines from being
inserted into value returned by _convert_to_charset

Thanks mjl at laubach.at for the report.

comment:3 by Tim Graham <timograham@…>, 11 years ago

In 789d8f07487f80c5b90ec9cbb1fcf99837ef3ace:

Fixed syntax error on Python 3.2; refs #20889.

comment:4 by poly, 11 years ago

Cc: poly added
Resolution: fixed
Status: closednew

I'm afraid this annoying bug hasn't gone, see my report on #21620.

comment:5 by Rik, 11 years ago

Owner: changed from Daniel Boeve to Rik
Status: newassigned

comment:6 by Rik, 11 years ago

Keywords: nlsprint14 added

comment:7 by Rik, 11 years ago

The bug you are encountering poly, is (I think) because of a bug in the Python standard library that has never been solved.

The bug was reported over here:
http://bugs.python.org/issue12649

However, it was closed for some reason. I can still reproduce the bug with this snippet:

from email.header import Header
import sys


print Header(
    u'attachment; filename="ajdsklfj klasdjfkl asdjfkl jadsfja sdflkads fad fads adsf dasjfkl jadslkfj dlasf asd \u6211\u6211\u6211 jo \u6211\u6211 jo \u6211\u6211"',
    charset='utf-8',
    maxlinelen=sys.maxsize
).encode()

which output is:

=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9ImFqZHNrbGZqIGtsYXNkamZrbCBhc2RqZmts?=
 =?utf-8?b?IGphZHNmamEgc2RmbGthZHMgZmFkIGZhZHMgYWRzZiBkYXNqZmtsIGphZHNsa2Zq?=
 =?utf-8?b?IGRsYXNmIGFzZCDmiJHmiJHmiJEgam8g5oiR5oiRIGpvIOaIkeaIkSI=?=

So it's because there's one character in the header that can't be encoded to latin-1 which raises a UnicodeError, which will be catched and will be encoded using the Header class from the python standard library in email.header.

So this happens over here:
https://github.com/django/django/blob/master/django/http/response.py#L169

Well, it correctly gives the maxlinelen. Now, what goes wrong in there?

If you dive in to the code, I tackled it down to the _encode_chunks the Header class. This method takes the maxlinelen argument (which is ultimately based on the value initially provided in the init method in the code I posted above). However, it doesn't give it to the header_encode method that is used in this piece of code in this _encode_chunks method:

  if charset is None or charset.header_encoding is None:
      s = header
  else:
      s = charset.header_encode(header)

So if charset.header_encoding is set it uses this method to encode the header, but then the maxlinelen is lost. This is where it goes wrong.

I'll keep this ticket assigned to me as I'm probably going on with it tomorrow.

Version 0, edited 11 years ago by Rik (next)

comment:8 by Rik, 11 years ago

I just created a ticket in the Python bugtracker about this:
http://bugs.python.org/issue20747

comment:9 by Rik, 11 years ago

Easy pickings: unset

comment:10 by mjl, 11 years ago

Actually, I'm not sure that the check for newline is actually correct. There is no problem with a newline in a header as long as it's followed by white space, which means it's a continuation line and will be folded back at the other end.

I understand where that check comes from though, there have been too many header injections out there.

comment:11 by Tim Graham, 11 years ago

Has patch: unset
Type: UncategorizedBug
Version: 1.51.7-beta-2

Is there any harm in leaving the original fix in 1.7? Should we leave this ticket open or is it something that should by solved in Python itself?

comment:12 by Tim Graham, 10 years ago

Resolution: needsinfo
Status: assignedclosed

Closing as needsinfo as it's not clear to me what we can do in Django to solve this issue and I haven't been able to get anyway to follow up in 3 months.

comment:13 by Rik, 10 years ago

Cc: gitaarik@… added

comment:14 by Claude Paroz, 10 years ago

Has patch: set
Resolution: needsinfo
Status: closednew
Triage Stage: UnreviewedAccepted
Version: 1.7-beta-2master

I've encountered this bug on a production site. I'm suggesting to check for newlines in user-provided values, before formatting the header through Python API to workaround this issue.

https://github.com/django/django/pull/4163

comment:15 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

comment:16 by Claude Paroz <claude@…>, 10 years ago

Resolution: fixed
Status: newclosed

In efb1f99f943b99624c78b91ff58502125490191e:

Fixed #20889 -- Prevented BadHeaderError when Python inserts newline

Workaround for http://bugs.python.org/issue20747.
In some corner cases, Python 2 inserts a newline in a header value
despite maxlinelen passed in Header constructor.
Thanks Tim Graham for the review.

comment:17 by Claude Paroz <claude@…>, 10 years ago

In 84e7fec88ddfc4178500a80d640728226d77317a:

[1.8.x] Fixed #20889 -- Prevented BadHeaderError when Python inserts newline

Workaround for http://bugs.python.org/issue20747.
In some corner cases, Python 2 inserts a newline in a header value
despite maxlinelen passed in Header constructor.
Thanks Tim Graham for the review.
Backport of efb1f99f94 from master.

Note: See TracTickets for help on using tickets.
Back to Top