Opened 9 years ago

Closed 6 years ago

Last modified 5 years ago

#26227 closed Bug (invalid)

Unicode attachment filename displays incorrectly in some clients

Reported by: Sergey Gornostaev
Owned by: nobody
Component: Core (Mail)
Version: 1.9
Severity: Normal
Keywords: email attachment, filenames, i18n
Cc: Thomi Richards, milosu, Pablo Castellano
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

When attaching a file whose name contains non-ASCII characters, Gmail displays the attachment as "noname" and Zimbra 8.0.2 shows the name percent-encoded.

from django.core.mail import EmailMultiAlternatives
from django.template.loader import get_template
# Assumes configured Django settings and email.txt / email.html templates.
txt_msg_body = get_template('email.txt').render({})
html_msg_body = get_template('email.html').render({})
msg = EmailMultiAlternatives('Test', txt_msg_body, 'robot@somedomain.ru', ['sputterspark@gmail.com'])
msg.attach_alternative(html_msg_body, "text/html")
with open('test.pdf', 'rb') as fh:
    data = fh.read()
# The attachment name consists entirely of non-ASCII (Cyrillic) characters.
msg.attach(u'Имя файла', data, 'application/pdf')
msg.send()

Attachments (3)

GMailAndZimbra.png (32.5 KB) - added by Sergey Gornostaev 9 years ago.
Emails screenshot
Screenshot_20160915_125209.png (69.7 KB) - added by Filippe LeMarchand 8 years ago.
Django unicode filenames in Kmail
gmail-screenshot.png (5.0 KB) - added by Tim Graham 6 years ago.


Change History (18)

by Sergey Gornostaev, 9 years ago

Attachment: GMailAndZimbra.png added

Emails screenshot

comment:1 by Moritz Sichert, 9 years ago

Resolution: invalid
Status: new → closed

Support for Unicode in attachment filenames was originally added in ticket #14964.

I tested this:

from django.core.mail import EmailMultiAlternatives
msg = EmailMultiAlternatives('Test', 'email body\nend', 'from@example.com', ['to@example.com'])
msg.attach_alternative('<html><body>email body<br>end</body></html>', 'text/html')
msg.attach(u'fíle_with_ünicöde_çhårs', b'foobar', 'application/octet-stream')
msg.send()

and got the following raw message:

Content-Type: multipart/mixed; boundary="===============5134186686965449755=="
MIME-Version: 1.0
Subject: Test
From: from@example.com
To: to@example.com
Date: Wed, 17 Feb 2016 07:17:41 -0000
Message-ID: <some_number@myhost>

--===============5134186686965449755==
Content-Type: multipart/alternative;
 boundary="===============0773237926637752706=="
MIME-Version: 1.0

--===============0773237926637752706==
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit

email body
end
--===============0773237926637752706==
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: 7bit

<html><body>email body<br>end</body></html>
--===============0773237926637752706==--

--===============5134186686965449755==
Content-Type: application/octet-stream
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*="utf-8''f%C3%ADle_with_%C3%BCnic%C3%B6de_%C3%A7h%C3%A5rs"

Zm9vYmFy
--===============5134186686965449755==--

According to RFC 2231, this is encoded correctly. My MUA also displays it correctly, so this appears to be a bug in Gmail and Zimbra.
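For reference, the filename*= value in the message above can be reproduced with Python's standard library. This is a minimal sketch assuming Python 3; rfc2231_filename_param and parse_filename are hypothetical helper names (not Django API). It encodes the name with email.utils.encode_rfc2231 and decodes it again with the stdlib parser's get_filename():

```python
from email import message_from_string
from email.utils import encode_rfc2231


def rfc2231_filename_param(filename):
    """Build an RFC 2231 extended value: charset'language'percent-encoded-bytes."""
    # The language field is left empty, matching the header shown above.
    return encode_rfc2231(filename, "utf-8")


def parse_filename(disposition_value):
    """Decode the filename from a Content-Disposition value with the stdlib parser."""
    msg = message_from_string("Content-Disposition: " + disposition_value + "\n\n")
    return msg.get_filename()


name = "fíle_with_ünicöde_çhårs"
value = rfc2231_filename_param(name)
print(value)  # utf-8''f%C3%ADle_with_%C3%BCnic%C3%B6de_%C3%A7h%C3%A5rs
print(parse_filename("attachment; filename*=" + value))  # fíle_with_ünicöde_çhårs
```

A parser that implements RFC 2231 recovers the original name; one that does not will only see the literal percent-encoded string.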

comment:2 (in reply to comment:1) by Filippe LeMarchand, 8 years ago

Resolution: invalid (removed)
Status: closed → new

Is this filename encoding really correct? I have the same problem, and none of the MUAs I tried (including Gmail, Outlook and KMail) show the filenames correctly.

Last edited 8 years ago by Tim Graham

by Filippe LeMarchand, 8 years ago

Attachment: Screenshot_20160915_125209.png added

Django unicode filenames in Kmail

comment:3 by Claude Paroz, 8 years ago

I just tested this on Thunderbird (Linux), Gmail and Roundcube, and the filename displays fine.

comment:4 by Tim Graham, 8 years ago

Resolution: invalid
Status: new → closed
Summary: Unicode attachment filename → Unicode attachment filename displays incorrectly in some clients

@gasinvein, if this is an issue in Django, please tell us where the bug is.

comment:5 (in reply to comment:3) by Filippe LeMarchand, 8 years ago

> Just tested now on Thunderbird (Linux), Gmail and Roundcube, and the filename is displaying fine.

So what am I doing wrong? I tried your example from comment:1 with Django 1.10.1.

Last edited 8 years ago by Filippe LeMarchand

comment:6 by Thomi Richards, 6 years ago

Resolution: invalid (removed)
Status: closed → new

Hi,

I came across this issue in Django 1.11.11: when using the EmailMessage class, attachments with non-ASCII characters in their filenames render as 'noname' in Gmail.

I'm no MIME expert. I've read RFC 2231 and RFC 2047, which seem relevant to this case, but the exact "correct" behaviour here isn't obvious to me. However, I was able to fix the issue like so:

from email.header import Header

from django.core.mail import EmailMessage


class EmailMessageWithAttachmentEncoding(EmailMessage):
    def _create_attachment(self, filename, content, mimetype=None):
        attachment = self._create_mime_attachment(content, mimetype)
        if filename:
            try:
                # Only check that the name is ASCII and keep it as text: on Python 3,
                # .encode() returns bytes, which add_header() cannot format.
                filename.encode('ascii')
                parameters = {
                    'filename': filename,
                }
            except UnicodeEncodeError:
                # Include both parameters manually because Python's implementation
                # only adheres to RFC2231 and not RFC2047 which breaks some clients
                # such as GMail.
                filename = Header(filename, 'utf-8').encode()
                parameters = {
                    'filename*': filename,  # RFC 2231 parameter name, RFC 2047 value
                    'filename': filename,  # RFC 2047 encoded-word
                }
            attachment.add_header('Content-Disposition', 'attachment', **parameters)
        return attachment

I'm not sure whether the Django project would accept this as a patch, especially since the correct behaviour here seems somewhat undefined to me (perhaps a MIME expert could weigh in?). In any case, this solution has worked for me, and it might help others who stumble across this page while debugging the same issue.

I've reopened the issue, since we probably want Django's email features to work with Gmail, even if the fix differs from what I've pasted above.
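The Content-Disposition header that this subclass produces can be previewed without Django or a mail server. The following is a minimal stdlib-only sketch, assuming Python 3, that mirrors the patch's parameter logic on a bare email.message.Message; build_disposition is a hypothetical name and does not call Django's _create_attachment:

```python
from email.header import Header
from email.message import Message


def build_disposition(filename):
    """Mirror the patch's Content-Disposition logic on a plain stdlib Message."""
    msg = Message()
    try:
        filename.encode("ascii")  # ASCII names are passed through unchanged
        parameters = {"filename": filename}
    except UnicodeEncodeError:
        # RFC 2047 encoded-word, e.g. =?utf-8?b?...?=
        encoded = Header(filename, "utf-8").encode()
        # Same value under both parameter names, as in the patch.
        parameters = {"filename*": encoded, "filename": encoded}
    msg.add_header("Content-Disposition", "attachment", **parameters)
    return msg["Content-Disposition"]


print(build_disposition("report.pdf"))  # attachment; filename="report.pdf"
print(build_disposition("Имя файла"))
```

For a non-ASCII name, both filename*= and filename= end up carrying the same quoted encoded-word, so the patch effectively relies on clients that decode RFC 2047 words inside parameters.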

comment:7 by Thomi Richards, 6 years ago

Cc: Thomi Richards added

comment:8 by Tim Graham, 6 years ago

Resolution: needsinfo
Status: new → closed

What are the steps to reproduce the issue? I tried the steps in the ticket description and the attachment name looks fine. Also, please test with Django master (or at least Django 2.1 beta) rather than Django 1.11, which is quite old at this point.

comment:9 by Simon Charette, 6 years ago

Resolution: needsinfo
Status: closednew

I managed to reproduce this on master by sending an email to a @gmail.com address with an attachment whose name contains non-ASCII characters.

from django.core.mail import EmailMultiAlternatives
msg = EmailMultiAlternatives('Subject', 'Body', '...@gmail.com', ['...@gmail.com'])
msg.attach('Имя файла', b'data', 'text/plain')
msg.send()

The issue seems to be that Gmail ignores RFC 2231 header parameters (e.g. filename*=) and only accepts RFC 2047 ones (filename=?UTF...).

The code changes suggested by Thomi include both parameters if the attachment name is not ASCII encodable.

Last edited 6 years ago by Simon Charette
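To make the difference between the two forms concrete: an RFC 2047 encoded-word is self-describing inside the value, whereas an RFC 2231 value only means something to a parser that understands the filename*= convention. The sketch below (Python 3 stdlib only; decode_rfc2047 is a hypothetical helper) runs both strings through email.header.decode_header, which handles only RFC 2047 encoded-words, so the RFC 2231 value passes through untouched. It illustrates the encodings and is not a model of Gmail's actual parser:

```python
from email.header import Header, decode_header, make_header

NAME = "Имя файла"


def decode_rfc2047(value):
    """Decode RFC 2047 encoded-words in value; text without them passes through."""
    return str(make_header(decode_header(value)))


# RFC 2047 form, as carried by filename="..." in the patch from comment:6.
encoded_word = Header(NAME, "utf-8").encode()
# RFC 2231 form, as carried by filename*=... in Django's default output.
rfc2231_value = "utf-8''%D0%98%D0%BC%D1%8F%20%D1%84%D0%B0%D0%B9%D0%BB%D0%B0"

print(encoded_word)  # =?utf-8?b?0JjQvNGPINGE0LDQudC70LA=?=
print(decode_rfc2047(encoded_word))  # Имя файла
print(decode_rfc2047(rfc2231_value))  # unchanged: not an encoded-word
```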

comment:10 by Tim Graham, 6 years ago

The name appears fine in the web version of Gmail I'm using. I'll attach a screenshot of what I see.

by Tim Graham, 6 years ago

Attachment: gmail-screenshot.png added

comment:11 by Simon Charette, 6 years ago

Resolution: invalid
Status: new → closed

It looks like I can't reproduce this against master after all, as the issue manifests itself on Python 2. Sorry for the false alarm, Tim.

Here's how the attachment is sent on Python 2

MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*="utf-8''%D0%98%D0%BC%D1%8F%20%D1%84%D0%B0%D0%B9%D0%BB%D0%B0"

data

And on Python 3

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename*=utf-8''%D0%98%D0%BC%D1%8F%20%D1%84%D0%B0%D0%B9%D0%BB%D0%B0

data

Notice that both use an RFC 2231 filename*= parameter, but the value is enclosed in double quotes on Python 2 and not on Python 3. That seems to be why Gmail rejects the encoded value.

This was changed in Python 3.1 (dfd7eb) and is discussed in CPython#1693546.
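The quoting difference can be checked with Python 3's stdlib alone. In this minimal sketch (python3_disposition and parsed_name are hypothetical helpers), passing a (charset, language, value) tuple to Message.add_header yields the unquoted filename*= form shown above for Python 3. The stdlib parser then decodes both the unquoted form and a Python 2-style quoted variant to the same name. That leniency belongs to Python's parser; whether a given mail client accepts the quoted form is exactly what is in question here:

```python
from email import message_from_string
from email.message import Message

NAME = "Имя файла"


def python3_disposition(filename):
    """Let the stdlib encode a non-ASCII filename as an RFC 2231 parameter."""
    msg = Message()
    # A (charset, language, value) tuple requests an RFC 2231 filename*= parameter.
    msg.add_header("Content-Disposition", "attachment", filename=("utf-8", "", filename))
    return msg["Content-Disposition"]


def parsed_name(disposition_value):
    """Decode the filename from a Content-Disposition value with the stdlib parser."""
    msg = message_from_string("Content-Disposition: " + disposition_value + "\n\n")
    return msg.get_filename()


unquoted = python3_disposition(NAME)
# The same parameter wrapped in double quotes, as Python 2 emitted it.
quoted = unquoted.replace("filename*=", 'filename*="', 1) + '"'
print(unquoted)
print(quoted)
print(parsed_name(unquoted) == parsed_name(quoted) == NAME)  # True
```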

comment:12 by milosu, 5 years ago

To be honest, I'm still having the "noname" problem in Gmail when the filename contains non-ASCII UTF-8 characters.

I can see that the email I'm sending does not have double quotes around the filename.

However, my application sits behind two Microsoft SMTP servers (internal and outbound), and it looks like one of them silently adds the double quotes before relaying the message to Gmail.

That said, what does work for me is the patch to the _create_attachment() method proposed by Thomi in comment:6.

With his patch applied, the raw email as received by Google looks like this:

--===============0157380707==
Content-Type: application/pdf
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="gartner_ěščřšěřšěčéříš909.pdf"


--===============0157380707==--

and the filename is displayed correctly.

Without the patch, Gmail displays the attachment as "noname", and the headers look like this:

--===============2001112103==
Content-Type: application/pdf
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename*="utf-8''gartner_%C4%9B%C5%A1%C4%8D%C5%99%C5%A1%C4%9B%C5%99%C5%A1%C4%9B%C4%8D%C3%A9%C5%99%C3%AD%C5%A1909.pdf"

By contrast, the raw email as generated by Django without the patch looks like this:

--===============0135089781==
Content-Type: application/pdf
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
 filename*=utf-8''gartner_%C4%9B%C5%A1%C4%8D%C5%99%C5%A1%C4%9B%C5%99%C5%A1%C4%9B%C4%8D%C3%A9%C5%99%C3%AD%C5%A1909.pdf

So somehow the Microsoft SMTP server, or some third-party filter, adds the double quotes during processing.

The error does not happen when sending the same email via Postfix. It looks like a tricky interoperability problem indeed.

Thank you anyway, Thomi.

Version 0, edited 5 years ago by milosu
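If you suspect a relay of rewriting the header in this way, one option is to compare the raw headers of the message as sent and as received. This is a minimal sketch assuming you already have both raw header blocks as text; has_quoted_ext_filename is a hypothetical helper that only looks for a double quote right after filename*=, and the sample filenames are shortened from the headers above:

```python
import re

# A double quote immediately after filename*= (optional whitespace allowed).
QUOTED_EXT_FILENAME = re.compile(r'filename\*\s*=\s*"', re.IGNORECASE)


def has_quoted_ext_filename(raw_headers):
    """Return True if an RFC 2231 filename*= value in raw_headers is quoted."""
    return QUOTED_EXT_FILENAME.search(raw_headers) is not None


sent = "Content-Disposition: attachment;\n filename*=utf-8''gartner_%C4%9B.pdf"
received = "Content-Disposition: attachment; filename*=\"utf-8''gartner_%C4%9B.pdf\""
print(has_quoted_ext_filename(sent))  # False
print(has_quoted_ext_filename(received))  # True
```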

comment:13 by milosu, 5 years ago

Cc: milosu added

comment:14 by Pablo Castellano, 5 years ago

For what it's worth, the K-9 Mail app on Android also displays it incorrectly when the filename contains any non-ASCII character.
https://i.imgur.com/f4vzKPJ.jpg

In my case, I worked around it by removing accents:

import unidecode  # third-party package "Unidecode" on PyPI
filename = unidecode.unidecode(filename)  # transliterate the name to ASCII
Last edited 5 years ago by Pablo Castellano
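If adding a third-party dependency is not an option, a rough stdlib-only alternative (assuming Python 3.7+ for str.isascii) is to decompose the name with unicodedata and drop combining marks. Note that this is a different technique from unidecode: it only strips accents and does not transliterate, so letters with no ASCII base letter (for example Cyrillic) are replaced with a placeholder. ascii_fallback and its replacement parameter are hypothetical names for this sketch:

```python
import unicodedata


def ascii_fallback(filename, replacement="_"):
    """Strip accents from filename and replace any remaining non-ASCII character."""
    # NFKD splits e.g. "í" into "i" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", filename)
    chars = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent, keep the base letter
        chars.append(ch if ch.isascii() else replacement)
    return "".join(chars)


print(ascii_fallback("fíle_with_ünicöde_çhårs"))  # file_with_unicode_chars
print(ascii_fallback("Имя файла.pdf"))  # ___ _____.pdf
```

Either way the original name is lost, so this is only a workaround for clients that mishandle encoded filenames.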

comment:15 by Pablo Castellano, 5 years ago

Cc: Pablo Castellano added