Opened 6 years ago

Closed 4 years ago

Last modified 21 months ago

#26227 closed Bug (invalid)

Unicode attachment filename displays incorrectly in some clients

Reported by: Sergey Gornostaev
Owned by: nobody
Component: Core (Mail)
Version: 1.9
Severity: Normal
Keywords: email attachment, filenames, i18n
Cc: Thomi Richards, milosu, Pablo Castellano
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

When attaching a file whose name contains non-ASCII characters, GMail displays the attachment as "noname" and Zimbra 8.0.2 displays the name percent-encoded.

from django.template.loader import get_template
from django.core.mail import EmailMultiAlternatives
txt_msg_body = get_template('email.txt').render({})
html_msg_body = get_template('email.html').render({})
msg = EmailMultiAlternatives('Test', txt_msg_body, 'robot@somedomain.ru', ['sputterspark@gmail.com'])
msg.attach_alternative(html_msg_body, "text/html")
with open('test.pdf', 'rb') as fh:
    data = fh.read()
msg.attach(u'Имя файла', data, 'application/pdf')
msg.send()

Attachments (3)

GMailAndZimbra.png (32.5 KB) - added by Sergey Gornostaev 6 years ago.
Emails screenshot
Screenshot_20160915_125209.png (69.7 KB) - added by Filippe LeMarchand 5 years ago.
Django unicode filenames in Kmail
gmail-screenshot.png (5.0 KB) - added by Tim Graham 4 years ago.


Change History (18)

Changed 6 years ago by Sergey Gornostaev

Attachment: GMailAndZimbra.png added

Emails screenshot

comment:1 Changed 6 years ago by Moritz Sichert

Resolution: → invalid
Status: new → closed

Support for Unicode in attachment file names was originally added in ticket #14964.

I tested this:

from django.core.mail import EmailMultiAlternatives
msg = EmailMultiAlternatives('Test', 'email body\nend', 'from@example.com', ['to@example.com'])
msg.attach_alternative('<html><body>email body<br>end</body></html>', 'text/html')
msg.attach(u'fíle_with_ünicöde_çhårs', b'foobar', 'application/octet-stream')
msg.send()

and got the following message:

Content-Type: multipart/mixed; boundary="===============5134186686965449755=="
MIME-Version: 1.0
Subject: Test
From: from@example.com
To: to@example.com
Date: Wed, 17 Feb 2016 07:17:41 -0000
Message-ID: <some_number@myhost>

--===============5134186686965449755==
Content-Type: multipart/alternative;
 boundary="===============0773237926637752706=="
MIME-Version: 1.0

--===============0773237926637752706==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

email body
end
--===============0773237926637752706==
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 7bit

<html><body>email body<br>end</body></html>
--===============0773237926637752706==--

--===============5134186686965449755==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*="utf-8''f%C3%ADle_with_%C3%BCnic%C3%B6de_%C3%A7h%C3%A5rs"

Zm9vYmFy
--===============5134186686965449755==--

According to RFC 2231 that is encoded correctly. My MUA also displays it correctly, so it seems to be an error with Google and Zimbra.
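
For reference, the parameter value in the Content-Disposition header above can be reproduced with the Python 3 standard library alone. This is only a sketch: the variable names are illustrative and nothing here is Django API.

```python
from email.utils import encode_rfc2231
from urllib.parse import unquote

name = 'fíle_with_ünicöde_çhårs'

# RFC 2231 extended value: charset, an (empty) language tag, then the
# percent-encoded UTF-8 bytes of the name, separated by single quotes.
encoded = encode_rfc2231(name, charset='utf-8', language='')
print(encoded)

# Reverse the encoding by hand: split off charset and language, then
# percent-decode the remainder using the declared charset.
charset, language, value = encoded.split("'", 2)
decoded = unquote(value, encoding=charset)
print(decoded)
```

The encoded string matches the value of the filename* parameter shown in the message, without the surrounding double quotes.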

comment:2 in reply to:  1 Changed 5 years ago by Filippe LeMarchand

Resolution: invalid →
Status: closed → new

Is this filename encoding really correct? I have the same problem, and none of the MUAs I tried (including Gmail, Outlook and Kmail) shows filenames correctly.

Last edited 5 years ago by Tim Graham

Changed 5 years ago by Filippe LeMarchand

Django unicode filenames in Kmail

comment:3 Changed 5 years ago by Claude Paroz

Just tested now on Thunderbird (Linux), Gmail and Roundcube, and the filename is displaying fine.

comment:4 Changed 5 years ago by Tim Graham

Resolution: → invalid
Status: new → closed
Summary: Unicode attachment filename → Unicode attachment filename displays incorrectly in some clients

@gasinvein, please tell us where the bug is in Django if it's an issue.

comment:5 in reply to:  3 Changed 5 years ago by Filippe LeMarchand

> Just tested now on Thunderbird (Linux), Gmail and Roundcube, and the filename is displaying fine.

So what am I doing wrong? I tried your example from comment:1 on Django 1.10.1.

Last edited 5 years ago by Filippe LeMarchand

comment:6 Changed 4 years ago by Thomi Richards

Resolution: invalid →
Status: closed → new

Hi,

I came across this issue in Django 1.11.11: when using the EmailMessage class, attachments with non-ASCII characters in their filenames render as 'noname' in GMail.

I'm no expert in MIME. I've read RFC2231 and RFC2047, which seem to be on-topic for this case, but the exact "correct" behaviour here isn't obvious to me. Nevertheless, I was able to fix the issue like so:

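from email.header import Header

from django.core.mail import EmailMessage


class EmailMessageWithAttachmentEncoding(EmailMessage):
    def _create_attachment(self, filename, content, mimetype=None):
        attachment = self._create_mime_attachment(content, mimetype)
        if filename:
            try:
                # Only test whether the name is pure ASCII. Keep it as text,
                # because add_header() on Python 3 expects str, not bytes.
                filename.encode('ascii')
                parameters = {
                    'filename': filename,
                }
            except UnicodeEncodeError:
                # Include both parameters manually because Python's implementation
                # only adheres to RFC2231 and not RFC2047 which breaks some clients
                # such as GMail.
                # Note: both parameters carry the same RFC2047 encoded-word, so
                # 'filename*' is not a true RFC2231 extended value here.
                filename = Header(filename, 'utf-8').encode()
                parameters = {
                    'filename*': filename,  # RFC2231 parameter name
                    'filename': filename,  # RFC2047
                }
            attachment.add_header('Content-Disposition', 'attachment', **parameters)
        return attachment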

I'm not sure if the django project would accept this as a patch, especially since it seems to me like the correct behaviour here is somewhat undefined (perhaps there's a MIME expert willing to testify?). In any case, this solution has worked for me, and might help others who stumble across this page while trying to debug the same issue.

I've re-opened the issue, since it seems like we probably want django's email features to work with GMail, even if the fix differs from what I've pasted above.
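
As a rough, standard-library-only illustration of the "include both parameters" idea, the sketch below builds a Content-Disposition value that carries an RFC 2047 encoded-word in filename and a real RFC 2231 extended value in filename*. The helper name build_content_disposition is hypothetical, and whether any particular client prefers one parameter over the other is based solely on the reports in this ticket. Strictly speaking, RFC 2047 does not allow encoded-words inside quoted strings, so this is a compatibility workaround rather than standards-conformant output.

```python
from email.header import Header
from email.utils import encode_rfc2231


def build_content_disposition(filename):
    """Return a Content-Disposition value for an attachment (hypothetical helper)."""
    try:
        filename.encode('ascii')
    except UnicodeEncodeError:
        # RFC 2047 encoded-word for clients that ignore filename*.
        # Very long names may be folded into several encoded-words.
        encoded_word = Header(filename, 'utf-8').encode()
        # RFC 2231 extended value, deliberately left unquoted.
        extended = encode_rfc2231(filename, charset='utf-8', language='')
        return 'attachment; filename="%s"; filename*=%s' % (encoded_word, extended)
    # Plain ASCII name: escape backslashes and double quotes, then quote it.
    escaped = filename.replace('\\', '\\\\').replace('"', '\\"')
    return 'attachment; filename="%s"' % escaped


print(build_content_disposition('report.pdf'))
print(build_content_disposition('Имя файла'))
```

The returned string could then be assigned to the Content-Disposition header of a MIME part in place of the add_header() call.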

comment:7 Changed 4 years ago by Thomi Richards

Cc: Thomi Richards added

comment:8 Changed 4 years ago by Tim Graham

Resolution: → needsinfo
Status: new → closed

What are the steps to reproduce the issue? I tried the steps in the ticket description and the attachment name looks fine. Also, please test with Django master (or at least Django 2.1 beta) rather than Django 1.11 which is quite old at this point.

comment:9 Changed 4 years ago by Simon Charette

Resolution: needsinfo →
Status: closed → new

I managed to reproduce by sending an email to a @gmail.com address with an attachment containing non-ASCII characters on master.

from django.core.mail import EmailMultiAlternatives
msg = EmailMultiAlternatives('Subject', 'Body', '...@gmail.com', ['...@gmail.com'])
msg.attach('Имя файла', b'data', 'text/plain')
msg.send()

The issue seems to be that GMail ignores RFC2231 header parameters (e.g. filename*=) and only accepts RFC2047 ones (filename=?UTF...).

The code changes suggested by Thomi include both parameters if the attachment name is not ASCII encodable.
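
To make the RFC 2047 form concrete, here is a small sketch using only the Python 3 standard library. It encodes the filename from the example as an encoded-word and decodes it again; the variable names are illustrative.

```python
from email.header import Header, decode_header

name = 'Имя файла'

# RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
encoded_word = Header(name, 'utf-8').encode()
print(encoded_word)

# decode_header() yields (bytes, charset) pairs for encoded-words;
# joining the decoded pieces restores the original text.
round_tripped = ''.join(
    raw.decode(charset) for raw, charset in decode_header(encoded_word)
)
print(round_tripped)
```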

Last edited 4 years ago by Simon Charette

comment:10 Changed 4 years ago by Tim Graham

The name appears fine on the web version of gmail I'm using. I'll attach a screenshot with what I see.

Changed 4 years ago by Tim Graham

Attachment: gmail-screenshot.png added

comment:11 Changed 4 years ago by Simon Charette

Resolution: → invalid
Status: new → closed

It looks like I can't reproduce against master anymore, as the issue only manifests itself on Python 2. Sorry for the false alarm, Tim.

Here's how the attachment is sent on Python 2:

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*="utf-8''%D0%98%D0%BC%D1%8F%20%D1%84%D0%B0%D0%B9%D0%BB%D0%B0"

data

And on Python 3:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*=utf-8''%D0%98%D0%BC%D1%8F%20%D1%84%D0%B0%D0%B9%D0%BB%D0%B0

data

Notice that both use an RFC 2231 filename*= parameter, but the value is within double quotes on Python 2 while it isn't on Python 3. That seems to be the reason why GMail rejects the encoded value.

This was changed in Python 3.1 (dfd7eb) and is detailed in CPython#1693546.
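
The Python 3 behaviour can be checked without Django or a mail server. The sketch below, assuming a current Python 3 and using only the standard library email package, adds a non-ASCII filename to a bare MIME part and inspects the resulting header.

```python
from email.mime.base import MIMEBase

part = MIMEBase('text', 'plain')
# A non-ASCII value makes add_header() switch to the RFC 2231
# filename*= form automatically.
part.add_header('Content-Disposition', 'attachment', filename='Имя файла')

header = part['Content-Disposition']
print(header)

# The standard library parser decodes the extended value again.
print(part.get_filename())
```

On Python 3 the printed header carries the extended value without surrounding double quotes, which matches the second message shown above.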

comment:12 Changed 22 months ago by milosu

To be honest, I'm still having the "noname" problem in GMail, when some utf-8 characters are present in the filename.

I can see that the e-mail I'm sending does not have the double quotes around filename.

But my application is behind two Microsoft SMTP Servers (internal and outbound) and it looks like one of them will silently add the double quotes before sending the message to Google Gmail.

That being said, what really works for me is the patch to the _create_attachment method proposed by Thomi in comment:6.

With his patch applied, the raw e-mail when received by GMail looks like:

--===============0390081516==
Content-Type: application/pdf
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename="=?utf-8?b?Z2FydG5lcl/Em8WhxI3FmcWhxJvFmcWhxJvEjcOpxZnDrcWhOTA5LnBkZg==?="

and the filename will be displayed correctly.

Without the patch, GMail displays the filename as "noname" and the headers it receives look like:

--===============2001112103==
Content-Type: application/pdf
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename*="utf-8''gartner_%C4%9B%C5%A1%C4%8D%C5%99%C5%A1%C4%9B%C5%99%C5%A1%C4%9B%C4%8D%C3%A9%C5%99%C3%AD%C5%A1909.pdf"

The raw e-mail as generated by Django without the patch, however, looks like:

--===============0135089781==
Content-Type: application/pdf
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*=utf-8''gartner_%C4%9B%C5%A1%C4%8D%C5%99%C5%A1%C4%9B%C5%99%C5%A1%C4%9B%C4%8D%C3%A9%C5%99%C3%AD%C5%A1909.pdf

So somehow Microsoft SMTP Server or some third-party filter adds the double quotes during processing.

The error does not happen when sending the same e-mail via Postfix. It looks like a tricky interoperability problem indeed.
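
When debugging this kind of relay behaviour, it can help to check the raw headers of a received message for a quoted extended parameter. The following is a minimal sketch with a hypothetical helper name; it looks only for a double quote directly after filename*= and does not attempt full header parsing.

```python
import re

# Matches filename*= followed (after optional whitespace) by a double quote.
QUOTED_EXTENDED = re.compile(r'filename\*\s*=\s*"', re.IGNORECASE)


def has_quoted_extended_filename(header_value):
    """Return True if the filename* parameter value starts with a quote."""
    return bool(QUOTED_EXTENDED.search(header_value))


as_received = 'attachment; filename*="utf-8\'\'gartner_%C4%9B.pdf"'
as_generated = "attachment;\n filename*=utf-8''gartner_%C4%9B.pdf"

print(has_quoted_extended_filename(as_received))
print(has_quoted_extended_filename(as_generated))
```

The two sample values are shortened versions of the headers quoted above: the first is the form that arrived at GMail, the second the form Django generated.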

Thank you Thomi anyway..

Last edited 22 months ago by milosu

comment:13 Changed 22 months ago by milosu

Cc: milosu added

comment:14 Changed 21 months ago by Pablo Castellano

For what it's worth, K-9 email app in Android also displays it wrong when the filename contains any non-ascii character.
https://i.imgur.com/f4vzKPJ.jpg

In my case I worked around it by removing accents, like so:

import unidecode
filename = unidecode.unidecode(filename)
Last edited 21 months ago by Pablo Castellano
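
If adding the third-party unidecode package is not an option, a similar accent-stripping effect for Latin-based names can be sketched with the standard library. The helper name strip_accents is hypothetical. Unlike unidecode, this approach does not transliterate: characters with no ASCII base form, such as Cyrillic letters, are dropped entirely.

```python
import unicodedata


def strip_accents(filename):
    """Drop combining marks and any remaining non-ASCII characters."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', filename)
    # Encoding to ASCII with errors='ignore' discards the combining marks.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(strip_accents('fíle_with_ünicöde_çhårs'))
```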

comment:15 Changed 21 months ago by Pablo Castellano

Cc: Pablo Castellano added