#36119 closed Bug (fixed)
Attaching email file to email fails if the attachment is using 8bit Content-Transfer-Encoding
Reported by: | Trenton H | Owned by: | Gregory Mariani |
---|---|---|---|
Component: | Core (Mail) | Version: | 5.1 |
Severity: | Normal | Keywords: | compat32 |
Cc: | Mike Edmunds | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
If an email file (such as the attached problem.eml) is attached to an email, as in the code snippet below, sending fails because of the non-ASCII content in the attachment.
    email = EmailMessage(
        subject="subject",
        body="body",
        to=["someone@somewhere.com"],  # "to" must be a list or tuple
    )
    email.attach_file(original_file)
    n_messages = email.send()
    Traceback (most recent call last):
      File "/usr/src/paperless/src/documents/signals/handlers.py", line 989, in email_action
        n_messages = email.send()
                     ^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 301, in send
        return self.get_connection(fail_silently).send_messages([self])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 136, in send_messages
        sent = self._send(message)
               ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 156, in _send
        from_email, recipients, message.as_bytes(linesep="\r\n")
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 148, in as_bytes
        g.flatten(self, unixfrom=unixfrom, linesep=linesep)
      File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
        self._write(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
        meth(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 286, in _handle_multipart
        g.flatten(part, unixfrom=False, linesep=self._NL)
      File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
        self._write(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
        meth(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 372, in _handle_message
        g.flatten(msg.get_payload(0), unixfrom=False, linesep=self._NL)
      File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
        self._write(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
        meth(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 446, in _handle_text
        super(BytesGenerator,self)._handle_text(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 263, in _handle_text
        self._write_lines(payload)
      File "/usr/local/lib/python3.12/email/generator.py", line 156, in _write_lines
        self.write(line)
      File "/usr/local/lib/python3.12/email/generator.py", line 420, in write
        self._fp.write(s.encode('ascii', 'surrogateescape'))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)
Attachments (1)
Change History (17)
by , 5 weeks ago
Attachment: | problem.eml added |
---|
comment:1 by , 5 weeks ago
comment:2 by , 5 weeks ago
Assuming SMTP is configured for a project:
    from django.core.mail import EmailMessage

    email = EmailMessage(
        subject="subject",
        body="body",
        to=["someone@somewhere.com"],  # "to" must be a list or tuple
    )
    email.attach_file("problem.eml")
    # or email.attach_file("problem.eml", "message/rfc822")
    n_messages = email.send()
I would expect this to work without issue, but as shown above, Django appears either to assume ASCII or to use the standard library in a way that assumes ASCII. As a user, I would not expect to need extra steps or processing for this code to work; it should simply attach the file without crashing.
I don't know enough about the internals to say how it should be fixed. Perhaps by checking the headers of attached messages? Or by defaulting to utf-8 somewhere?
comment:3 by , 5 weeks ago
Cc: | added |
---|
comment:4 by , 5 weeks ago
Keywords: | compat32 added |
---|---|
Triage Stage: | Unreviewed → Accepted |
This came up in the forum. Here's a minimal case to reproduce:
    from django.core.mail import EmailMessage

    # Content of message to attach, using 8bit CTE with raw utf-8:
    att = """\
    Subject: attachment
    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: 8bit

    ¡8-bit content!
    """.encode()

    email = EmailMessage(to=["test@example.com"])
    email.attach("attachment.eml", att, "message/rfc822")
    email.message().as_bytes()
    # ...python3.12/email/generator.py", line 409, in write
    #     self._fp.write(s.encode('ascii', 'surrogateescape'))
    #                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # UnicodeEncodeError: 'ascii' codec can't encode character '\xa1' in position 0: ordinal not in range(128)
The problem is that EmailMessage._create_mime_attachment() uses message_from_string(force_str(content)) to convert the attachment content to a Python Message object. But message_from_string() doesn't properly handle Unicode characters. (See python/cpython#83565. This dates back to Python 2, when strings couldn't include Unicode.)
A working alternative is message_from_bytes():
    from email import message_from_string, message_from_bytes

    message_from_string(att.decode()).as_bytes()
    # UnicodeEncodeError: 'ascii' codec can't encode character '\xa1' in position 0: ordinal not in range(128)

    message_from_bytes(att).as_bytes()
    # b'Subject: attachment\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n\n\xc2\xa18-bit content!\n'
So the simplest fix is probably to change Django to use message_from_bytes(force_bytes(content)).
(This would also be fixed by upgrading Django to use Python's modern email APIs, #35581.)
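As far as I can tell, the difference between the two parsers comes down to how non-ASCII data is held in the intermediate str. The sketch below uses only the standard library and mimics what each path hands to the generator's final s.encode('ascii', 'surrogateescape') call. The variable names are illustrative, not taken from Django or the email package.

```python
# Raw UTF-8 bytes, as they would appear in an 8bit message body.
raw = "¡8-bit content!".encode("utf-8")

# Bytes path: non-ASCII bytes are decoded to lone surrogates, which the
# same error handler can turn back into the original bytes on output.
escaped = raw.decode("ascii", "surrogateescape")
round_tripped = escaped.encode("ascii", "surrogateescape")

# String path: the text holds a real U+00A1 character, which is neither
# ASCII nor a surrogate escape, so the ASCII encode step has nothing to map it to.
text = raw.decode("utf-8")
error = None
try:
    text.encode("ascii", "surrogateescape")
except UnicodeEncodeError as exc:
    error = exc

print(round_tripped == raw)  # the bytes path is lossless
print(type(error).__name__)  # the string path raises
```

This is why parsing from bytes avoids the crash without needing to know the attachment's charset: the original bytes are carried through untouched.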
comment:5 by , 5 weeks ago
I can give it a try, but #35581 looks close to being released. Any advice on where to start?
comment:6 by , 5 weeks ago
#35581 is stalled waiting for some Python core library fixes, so won't get merged before Django 6.0 at the earliest. I think it's worth fixing this bug before then.
Maybe start by adding a new, failing test case in django/tests/mail/tests.py. (That's a long file, and not particularly well organized, but most of the attachment-related tests are kind of grouped together in the middle, so maybe somewhere near them.) You could probably even use the minimal example from my earlier comment. (Or make it even more minimal: neither to nor Subject are relevant to the bug.)
Once that test fails, I'm pretty sure you can fix it in django/core/mail/message.py by changing message_from_string(force_str(content)) to message_from_bytes(force_bytes(content)) (and fixing up the imports). And then running all the tests to see if that breaks anything else. (I wouldn't expect it to, but I'm often wrong, so it's good we have tests.)
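A rough, standard-library-only sketch of the suggested change might look like the following. It is not Django's actual code: make_rfc822_attachment is a hypothetical helper, the str-to-bytes step is only a stand-in for force_bytes(), and MIMEMessage stands in for whatever MIME class Django wraps the attachment in.

```python
from email import message_from_bytes
from email.message import Message
from email.mime.message import MIMEMessage


def make_rfc822_attachment(content):
    """Hypothetical helper: wrap message content as a message/rfc822 part."""
    if not isinstance(content, Message):
        if isinstance(content, str):
            # Stand-in for force_bytes(): encode text as UTF-8.
            content = content.encode("utf-8")
        # Parse from bytes so 8bit bodies are kept as surrogate-escaped
        # text and can be written back out unchanged.
        content = message_from_bytes(content)
    return MIMEMessage(content)


# Same minimal attachment as in the earlier comment, built from explicit lines.
att = (
    "Subject: attachment\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "¡8-bit content!\n"
).encode("utf-8")

# Both bytes and str input should serialize without UnicodeEncodeError.
serialized_from_bytes = make_rfc822_attachment(att).as_bytes()
serialized_from_str = make_rfc822_attachment(att.decode("utf-8")).as_bytes()
```

A regression test along these lines would assert that the raw UTF-8 bytes of the body appear in the serialized output, for both bytes and str input.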
comment:7 by , 5 weeks ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:8 by , 5 weeks ago
The patch should be done. CI is running, and I can't stay awake to see it finish.
comment:9 by , 5 weeks ago
Has patch: | set |
---|
comment:10 by , 4 weeks ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:11 by , 3 weeks ago
Triage Stage: | Ready for checkin → Accepted |
---|
comment:12 by , 3 weeks ago
Patch needs improvement: | set |
---|
comment:14 by , 3 weeks ago
Triage Stage: | Accepted → Ready for checkin |
---|
Trenton, it would be helpful if you could provide some more information about this, because I'm not sure we have any triagers who are experts in this area. Where is Django at fault, and how can we fix it?