id summary reporter owner description type status component version severity resolution keywords cc stage has_patch needs_docs needs_tests needs_better_patch easy ui_ux
22561 EmailMessage should respect RFC2822 on max line length notsqrt Henrik Levkowetz "Follow-up of thread [[https://groups.google.com/forum/#!topic/django-users/BdjFVVdX7QU|Email encoding (DKIM, long lines, etc..)]] on django-users

== RFC ==

The [[https://tools.ietf.org/html/rfc2822#section-2.1.1|RFC2822]] states that:

""Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.""

This statement was left unchanged in the 2008 update, [[https://tools.ietf.org/html/rfc5322#section-2.1.1|RFC5322]].

== History ==

For utf-8 encoded emails, Python uses:
 * the shortest of ""quoted-printable"" and ""base64"" for the email subject
 * ""base64"" for the body

{{{#!python
# stdlib, identical in python2.7 and python3.3 : email/charset.py
CHARSETS = {
    'utf-8': (SHORTEST, BASE64, 'utf-8'),
}
}}}

The historical reason seems to be that support for 8bit characters in emails was not widely adopted, hence the need to encode them into ASCII.

Back in 2007, in ticket [[ticket:3472]] (changeset [changeset:5143]), it was decided to always use ""quoted-printable"", because using base64 seems to negatively affect spam scores.

{{{#!python
# Don't BASE64-encode UTF-8 messages so that we avoid unwanted attention from
# some spam filters.
Charset.add_charset('utf-8', Charset.SHORTEST, Charset.QP, 'utf-8')
}}}

In 2011, in ticket [ticket:11212] (changeset [changeset:16178], django 1.4), it was decided to remove ""quoted-printable"" and let python automatically switch between 7-bit and 8-bit encodings, on the grounds that 8-bit emails were widely supported and that MTAs were in charge of downgrading to 7-bit if necessary.

{{{#!python
Charset.add_charset('utf-8', Charset.SHORTEST, None, 'utf-8')
}}}

The (unintended?)
side-effect of using base64 or ""quoted-printable"" was in fact a guarantee of short lines in emails (for instance, the RFC for quoted-printable, [[http://tools.ietf.org/html/rfc2045#page-20|rfc2045]], states that the maximum line length is 76 characters).

=== Summary of the reasons invoked for these choices ===

 * base64 is too big (bandwidth)
 * base64 is not supported by all clients
 * base64 has a negative effect on spam scores (cf [[http://wiki.apache.org/spamassassin/Rules/MIME_BASE64_TEXT|SpamAssassin's rule]] on **unnecessarily** using base64 encoding to disguise text, but this rule also states that ""This does not apply to text in the UTF-8 or big5 character sets."")
 * quoted-printable is no longer necessary, since MTAs and email clients have adopted 8bit support

== Current state ==

=== Django ===

There was an additional ticket, [[ticket:12422]], but it is not relevant to this one.

The current code in django/core/mail/message.py looks like:

{{{#!python
# Don't BASE64-encode UTF-8 messages so that we avoid unwanted attention from
# some spam filters.
utf8_charset = Charset.Charset('utf-8')
utf8_charset.body_encoding = None  # Python defaults to BASE64
}}}

=== Clients ===

Email clients like Gmail seem to wrap lines at 80 characters for text/plain, and switch to ""Content-Transfer-Encoding: quoted-printable"" for text/html and text/plain if there are non-ascii characters.

== Importance ==

Mail Transfer Agents like Postfix often split lines that do not respect the RFC by inserting ""\r\n "" at the 998-th position of the line. DKIM signatures are computed over the unmodified body, but receivers validate the signature against the modified body, so the check fails.

Apart from my own django projects, I have seen long lines in html emails sent by Sentry, for instance.
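
The effect of the body encoding on line length can be checked with a minimal sketch, assuming Python 3's stdlib email package. The helper name encoded_body_stats is hypothetical and is not part of Django or the stdlib; it only mirrors the body_encoding setting shown above.

```python
from email import charset
from email.mime.text import MIMEText

# Hard limit from RFC 2822 section 2.1.1 (excluding the CRLF).
RFC_HARD_LIMIT = 998


def encoded_body_stats(body, body_encoding):
    # Hypothetical helper: build a text/plain message with the given
    # body encoding and report (Content-Transfer-Encoding, longest line).
    cs = charset.Charset('utf-8')
    # None lets Python choose 7bit or 8bit and leaves the lines untouched;
    # charset.QP and charset.BASE64 re-wrap the body into short lines.
    cs.body_encoding = body_encoding
    msg = MIMEText(body, 'plain', cs)
    payload = msg.get_payload()  # the body as it would be transmitted
    longest = max((len(line) for line in payload.splitlines()), default=0)
    return msg['Content-Transfer-Encoding'], longest


if __name__ == '__main__':
    long_body = 'x' * 2000  # a single line well over the 998 limit
    for enc in (None, charset.QP, charset.BASE64):
        cte, longest = encoded_body_stats(long_body, enc)
        print(cte, longest, 'ok' if longest <= RFC_HARD_LIMIT else 'TOO LONG')
```

With body_encoding left at None, the 2000-character line is passed through as-is, which is the situation where an MTA may re-wrap the line and invalidate a DKIM signature.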
== Choices ==

For reference, [[http://search.cpan.org/~rjbs/MIME-Lite-3.030/lib/MIME/Lite.pm#Construction|Perl library MIME-Lite]] recommends:

{{{
Use encoding:     | If your message contains:
------------------------------------------------------------
7bit              | Only 7-bit text, all lines <1000 characters
8bit              | 8-bit text, all lines <1000 characters
quoted-printable  | 8-bit text or long lines (more reliable than ""8bit"")
base64            | Largely non-textual data: a GIF, a tar file, etc.
}}}

One way or another, we have to guarantee that email lines are <1000 characters. base64 and quoted-printable do that for us. Not using them means that we have to find a reliable way to split long lines into shorter ones, at the risk of breaking html code in text/html emails.

I am not aware of other encodings that can be used for this, nor of reliable ways to split long lines.

On django-users, Russ Magee warned about possible downstream consequences.

== Other references ==

 * [[http://www.w3.org/Protocols/rfc1341/5_Content-Transfer-Encoding.html]]
 * [[http://trac.edgewall.org/ticket/1754|relevant discussion on trac's trac]]
 * [[http://wiki.apache.org/spamassassin/Rules/MIME_QP_LONG_LINE|SpamAssassin's rule]] on quoted-printable messages not respecting the 76-character maximum line length.
" Bug closed Core (Mail) dev Normal fixed petr.hroudny@… bugs@… michal@… Ready for checkin 1 0 0 0 0 0