Opened 2 years ago

Closed 5 months ago

Last modified 5 months ago

#20889 closed Bug (fixed)

HttpResponseBase._convert_to_charset complains about newline it inserted itself

Reported by: mjl@… Owned by: rednaw
Component: HTTP handling Version: master
Severity: Normal Keywords: nlsprint14
Cc: poly, gitaarik@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Consider the following snippet:

# coding=utf-8
from django.http import HttpResponse
h = HttpResponse()
f = 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz a\xcc\x88'.decode('utf-8')
h['Content-Disposition'] = u'attachment; filename="%s"' % f

This contains a "OSX-Style" Umlaut (a + ¨), which triggers the "convert to mime encoding" path of _convert_to_charset. However, this string is so long that email.header.Header, which is used to mime-encode, will split the line. Later _convert_to_charset explodes because the header now contains a newline.

$ ./manage.py shell
Python 2.7.5 (default, Aug  1 2013, 01:01:17)
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import ttt
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/mjl/Projects/cms_sites/cc.intranet/cms/ttt.py", line 5, in <module>
    h['Content-Disposition'] = u'attachment; filename="%s"' % f
  File "/Users/mjl/Projects/cms_sites/cc.intranet/lib/python2.7/site-packages/django/http/response.py", line 110, in __setitem__
    value = self._convert_to_charset(value, 'latin1', mime_encode=True)
  File "/Users/mjl/Projects/cms_sites/cc.intranet/lib/python2.7/site-packages/django/http/response.py", line 105, in _convert_to_charset
    raise BadHeaderError("Header values can't contain newlines (got %r)" % value)
BadHeaderError: Header values can't contain newlines (got '=?utf-8?q?attachment=3B_filename=3D=22zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz_a?=\n =?utf-8?b?zIgi?=')

Solution: django/http/response.py#L163 should probably use

Header(value, 'utf-8', maxlinelen=1000)

to avoid generating continuation lines. Agreed, the "1000" is somewhat arbitrary, but realistically this fixes the problem.

Change History (17)

comment:1 Changed 23 months ago by animan1

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to animan1
  • Patch needs improvement unset
  • Status changed from new to assigned

comment:2 Changed 23 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 6dca603abb0eb164ba87657caf5cc65bca449719:

Fixed #20889 -- Prevented email.Header from inserting newlines

Passed large maxlinelen to email.Header to prevent newlines from being
inserted into value returned by _convert_to_charset

Thanks mjl at laubach.at for the report.

comment:3 Changed 23 months ago by Tim Graham <timograham@…>

In 789d8f07487f80c5b90ec9cbb1fcf99837ef3ace:

Fixed syntax error on Python 3.2; refs #20889.

comment:4 Changed 20 months ago by poly

  • Cc poly added
  • Resolution fixed deleted
  • Status changed from closed to new

I'm afraid this annoying bug hasn't gone, see my report on #21620.

comment:5 Changed 17 months ago by rednaw

  • Owner changed from animan1 to rednaw
  • Status changed from new to assigned

comment:6 Changed 17 months ago by rednaw

  • Keywords nlsprint14 added

comment:7 Changed 17 months ago by rednaw

The bug you are encountering poly, is (I think) because of a bug in the Python standard library that has never been solved.

The bug was reported over here:
http://bugs.python.org/issue12649

However, it was closed for some reason. I can still reproduce the bug with this snippet:

from email.header import Header
import sys


print Header(
    u'attachment; filename="ajdsklfj klasdjfkl asdjfkl jadsfja sdflkads fad fads adsf dasjfkl jadslkfj dlasf asd \u6211\u6211\u6211 jo \u6211\u6211 jo \u6211\u6211"',
    charset='utf-8',
    maxlinelen=sys.maxsize
).encode()

which output is:

=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9ImFqZHNrbGZqIGtsYXNkamZrbCBhc2RqZmts?=
 =?utf-8?b?IGphZHNmamEgc2RmbGthZHMgZmFkIGZhZHMgYWRzZiBkYXNqZmtsIGphZHNsa2Zq?=
 =?utf-8?b?IGRsYXNmIGFzZCDmiJHmiJHmiJEgam8g5oiR5oiRIGpvIOaIkeaIkSI=?=

So it's because there's one character in the header that can't be encoded to latin-1 which raises a UnicodeError, which will be catched and will be encoded using the Header class from the python standard library in email.header.

So this happens over here:
https://github.com/django/django/blob/master/django/http/response.py#L169

Well, it correctly gives the maxlinelen. Now, what goes wrong in there?

If you dive in to the code, I tackled it down to the _encode_chunks the Header class. This method takes the maxlinelen argument (which is ultimately based on the value initially provided in the init method in the code I posted above). However, it doesn't give it to the header_encode method that is used in this piece of code in this _encode_chunks method:

  if charset is None or charset.header_encoding is None:
      s = header
  else:
      s = charset.header_encode(header)

So if charset.header_encoding is set it uses this method to encode the header, but then the maxlinelen is lost. This is where it goes wrong.

I'll keep this ticket assigned to me as I'm probably going on with it tomorrow.

Version 0, edited 17 months ago by rednaw (next)

comment:8 Changed 17 months ago by rednaw

I just created a ticket in the Python bugtracker about this:
http://bugs.python.org/issue20747

comment:9 Changed 17 months ago by rednaw

  • Easy pickings unset

comment:10 Changed 16 months ago by mjl

Actually, I'm not sure that the check for newline is actually correct. There is no problem with a newline in a header as long as it's followed by white space, which means it's a continuation line and will be folded back at the other end.

I understand where that check comes from though, there have been too many header injections out there.

comment:11 Changed 15 months ago by timo

  • Has patch unset
  • Type changed from Uncategorized to Bug
  • Version changed from 1.5 to 1.7-beta-2

Is there any harm in leaving the original fix in 1.7? Should we leave this ticket open or is it something that should by solved in Python itself?

comment:12 Changed 12 months ago by timo

  • Resolution set to needsinfo
  • Status changed from assigned to closed

Closing as needsinfo as it's not clear to me what we can do in Django to solve this issue and I haven't been able to get anyway to follow up in 3 months.

comment:13 Changed 9 months ago by rednaw

  • Cc gitaarik@… added

comment:14 Changed 5 months ago by claudep

  • Has patch set
  • Resolution needsinfo deleted
  • Status changed from closed to new
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.7-beta-2 to master

I've encountered this bug on a production site. I'm suggesting to check for newlines in user-provided values, before formatting the header through Python API to workaround this issue.

https://github.com/django/django/pull/4163

comment:15 Changed 5 months ago by timgraham

  • Triage Stage changed from Accepted to Ready for checkin

comment:16 Changed 5 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In efb1f99f943b99624c78b91ff58502125490191e:

Fixed #20889 -- Prevented BadHeaderError when Python inserts newline

Workaround for http://bugs.python.org/issue20747.
In some corner cases, Python 2 inserts a newline in a header value
despite maxlinelen passed in Header constructor.
Thanks Tim Graham for the review.

comment:17 Changed 5 months ago by Claude Paroz <claude@…>

In 84e7fec88ddfc4178500a80d640728226d77317a:

[1.8.x] Fixed #20889 -- Prevented BadHeaderError when Python inserts newline

Workaround for http://bugs.python.org/issue20747.
In some corner cases, Python 2 inserts a newline in a header value
despite maxlinelen passed in Header constructor.
Thanks Tim Graham for the review.
Backport of efb1f99f94 from master.

Note: See TracTickets for help on using tickets.
Back to Top