#20889 closed Bug (fixed)
HttpResponseBase._convert_to_charset complains about newline it inserted itself
Reported by: | Owned by: | Rik | |
---|---|---|---|
Component: | HTTP handling | Version: | dev |
Severity: | Normal | Keywords: | nlsprint14 |
Cc: | poly, gitaarik@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Consider the following snippet:
# coding=utf-8 from django.http import HttpResponse h = HttpResponse() f = 'zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz a\xcc\x88'.decode('utf-8') h['Content-Disposition'] = u'attachment; filename="%s"' % f
This contains a "OSX-Style" Umlaut (a + ¨), which triggers the "convert to mime encoding" path of _convert_to_charset. However, this string is so long that email.header.Header, which is used to mime-encode, will split the line. Later _convert_to_charset explodes because the header now contains a newline.
$ ./manage.py shell Python 2.7.5 (default, Aug 1 2013, 01:01:17) [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>> import ttt Traceback (most recent call last): File "<console>", line 1, in <module> File "/Users/mjl/Projects/cms_sites/cc.intranet/cms/ttt.py", line 5, in <module> h['Content-Disposition'] = u'attachment; filename="%s"' % f File "/Users/mjl/Projects/cms_sites/cc.intranet/lib/python2.7/site-packages/django/http/response.py", line 110, in __setitem__ value = self._convert_to_charset(value, 'latin1', mime_encode=True) File "/Users/mjl/Projects/cms_sites/cc.intranet/lib/python2.7/site-packages/django/http/response.py", line 105, in _convert_to_charset raise BadHeaderError("Header values can't contain newlines (got %r)" % value) BadHeaderError: Header values can't contain newlines (got '=?utf-8?q?attachment=3B_filename=3D=22zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz_a?=\n =?utf-8?b?zIgi?=')
Solution: django/http/response.py#L163 should probably use
Header(value, 'utf-8', maxlinelen=1000)
to avoid generating continuation lines. Agreed, the "1000" is somewhat arbitrary, but realistically this fixes the problem.
Change History (17)
comment:1 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:4 by , 11 years ago
Cc: | added |
---|---|
Resolution: | fixed |
Status: | closed → new |
I'm afraid this annoying bug hasn't gone, see my report on #21620.
comment:5 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:6 by , 11 years ago
Keywords: | nlsprint14 added |
---|
comment:7 by , 11 years ago
The bug you are encountering poly, is (I think) because of a bug in the Python standard library that has never been solved.
The bug was reported over here:
http://bugs.python.org/issue12649
However, it was closed for some reason. I can still reproduce the bug with this snippet:
from email.header import Header import sys print Header( u'attachment; filename="ajdsklfj klasdjfkl asdjfkl jadsfja sdflkads fad fads adsf dasjfkl jadslkfj dlasf asd \u6211\u6211\u6211 jo \u6211\u6211 jo \u6211\u6211"', charset='utf-8', maxlinelen=sys.maxsize ).encode()
which output is:
=?utf-8?b?YXR0YWNobWVudDsgZmlsZW5hbWU9ImFqZHNrbGZqIGtsYXNkamZrbCBhc2RqZmts?= =?utf-8?b?IGphZHNmamEgc2RmbGthZHMgZmFkIGZhZHMgYWRzZiBkYXNqZmtsIGphZHNsa2Zq?= =?utf-8?b?IGRsYXNmIGFzZCDmiJHmiJHmiJEgam8g5oiR5oiRIGpvIOaIkeaIkSI=?=
So it's because there's one character in the header that can't be encoded to latin-1
which raises a UnicodeError
, which will be catched and will be encoded using the Header
class from the python standard library in email.header
.
So this happens over here:
https://github.com/django/django/blob/master/django/http/response.py#L169
Well, it correctly gives the maxlinelen
. Now, what goes wrong in there?
If you dive in to the code, I tackled it down to the _encode_chunks
the Header
class. This method takes the maxlinelen
argument (which is ultimately based on the value initially provided in the init method in the code I posted above). However, it doesn't give it to the header_encode
method that is used in this piece of code in this _encode_chunks
method:
if charset is None or charset.header_encoding is None: s = header else: s = charset.header_encode(header)
So if charset.header_encoding
is set it uses this method to encode the header, but then the maxlinelen
is lost. This is where it goes wrong.
I'll keep this ticket assigned to me as I'm probably going on with it tomorrow.
comment:8 by , 11 years ago
I just created a ticket in the Python bugtracker about this:
http://bugs.python.org/issue20747
comment:9 by , 11 years ago
Easy pickings: | unset |
---|
comment:10 by , 11 years ago
Actually, I'm not sure that the check for newline is actually correct. There is no problem with a newline in a header as long as it's followed by white space, which means it's a continuation line and will be folded back at the other end.
I understand where that check comes from though, there have been too many header injections out there.
comment:11 by , 11 years ago
Has patch: | unset |
---|---|
Type: | Uncategorized → Bug |
Version: | 1.5 → 1.7-beta-2 |
Is there any harm in leaving the original fix in 1.7? Should we leave this ticket open or is it something that should by solved in Python itself?
comment:12 by , 10 years ago
Resolution: | → needsinfo |
---|---|
Status: | assigned → closed |
Closing as needsinfo as it's not clear to me what we can do in Django to solve this issue and I haven't been able to get anyway to follow up in 3 months.
comment:13 by , 10 years ago
Cc: | added |
---|
comment:14 by , 10 years ago
Has patch: | set |
---|---|
Resolution: | needsinfo |
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Version: | 1.7-beta-2 → master |
I've encountered this bug on a production site. I'm suggesting to check for newlines in user-provided values, before formatting the header through Python API to workaround this issue.
comment:15 by , 10 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:16 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
In 6dca603abb0eb164ba87657caf5cc65bca449719: