#36119 closed Bug (fixed)
Attaching email file to email fails if the attachment is using 8bit Content-Transfer-Encoding
| Reported by: | Trenton H | Owned by: | Gregory Mariani |
|---|---|---|---|
| Component: | Core (Mail) | Version: | 5.1 |
| Severity: | Normal | Keywords: | compat32 |
| Cc: | Mike Edmunds | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
If an email file is attached to an email, as in the code snippet below, sending fails due to the presence of non-ASCII content in the attachment.
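    from django.core.mail import EmailMessage

    # original_file is the path to the .eml file being attached.
    email = EmailMessage(
        subject="subject",
        body="body",
        to=["someone@somewhere.com"],  # "to" must be a list or tuple
    )
    email.attach_file(original_file)
    n_messages = email.send()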
    Traceback (most recent call last):
      File "/usr/src/paperless/src/documents/signals/handlers.py", line 989, in email_action
        n_messages = email.send()
                     ^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 301, in send
        return self.get_connection(fail_silently).send_messages([self])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 136, in send_messages
        sent = self._send(message)
               ^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/backends/smtp.py", line 156, in _send
        from_email, recipients, message.as_bytes(linesep="\r\n")
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.12/site-packages/django/core/mail/message.py", line 148, in as_bytes
        g.flatten(self, unixfrom=unixfrom, linesep=linesep)
      File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
        self._write(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
        meth(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 286, in _handle_multipart
        g.flatten(part, unixfrom=False, linesep=self._NL)
      File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
        self._write(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
        meth(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 372, in _handle_message
        g.flatten(msg.get_payload(0), unixfrom=False, linesep=self._NL)
      File "/usr/local/lib/python3.12/email/generator.py", line 117, in flatten
        self._write(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 182, in _write
        self._dispatch(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 219, in _dispatch
        meth(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 446, in _handle_text
        super(BytesGenerator,self)._handle_text(msg)
      File "/usr/local/lib/python3.12/email/generator.py", line 263, in _handle_text
        self._write_lines(payload)
      File "/usr/local/lib/python3.12/email/generator.py", line 156, in _write_lines
        self.write(line)
      File "/usr/local/lib/python3.12/email/generator.py", line 420, in write
        self._fp.write(s.encode('ascii', 'surrogateescape'))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-15: ordinal not in range(128)
Attachments (1)
Change History (17)
by , 10 months ago
| Attachment: | problem.eml added |
|---|
comment:1 by , 10 months ago
comment:2 by , 10 months ago
Assuming SMTP is configured for a project:
    from django.core.mail import EmailMessage

    email = EmailMessage(
        subject="subject",
        body="body",
        to=["someone@somewhere.com"],  # "to" must be a list or tuple
    )
    email.attach_file("problem.eml")
    # or email.attach_file("problem.eml", "message/rfc822")
    n_messages = email.send()
I would expect this to work without issue, but as shown above, Django appears either to assume ASCII or to use the standard library in a way that assumes ASCII. As a user, I would expect this code to simply attach the file, without a crash and without any extra steps or processing.
I don't know enough about the internals to say how it should be fixed. Perhaps checking the headers of attached messages? Defaulting to utf-8 somewhere?
comment:3 by , 10 months ago
| Cc: | added |
|---|
comment:4 by , 10 months ago
| Keywords: | compat32 added |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
This came up in the forum. Here's a minimal case to reproduce:
    from django.core.mail import EmailMessage

    # Content of message to attach, using 8bit CTE with raw utf-8:
    att = """\
    Subject: attachment
    Content-Type: text/plain; charset=utf-8
    Content-Transfer-Encoding: 8bit

    ¡8-bit content!
    """.encode()

    email = EmailMessage(to=["test@example.com"])
    email.attach("attachment.eml", att, "message/rfc822")
    email.message().as_bytes()
    # ...python3.12/email/generator.py", line 409, in write
    #     self._fp.write(s.encode('ascii', 'surrogateescape'))
    #                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    # UnicodeEncodeError: 'ascii' codec can't encode character '\xa1' in position 0: ordinal not in range(128)
The problem is that EmailMessage._create_mime_attachment() uses message_from_string(force_str(content)) to convert the attachment content to a Python Message object. But message_from_string() doesn't properly handle Unicode characters. (See python/cpython#83565. This dates back to Python 2 when strings couldn't include Unicode.)
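To illustrate why the str path fails, here is a rough standard-library sketch. The traceback ends in `s.encode('ascii', 'surrogateescape')`, which accepts only ASCII characters and lone surrogates. Text that came from bytes through an ascii/surrogateescape decode (which is how the bytes parser reads its input) round-trips; text that already holds a real non-ASCII character such as `¡` does not. The variable names are illustrative only.

```python
raw = "¡8-bit content!".encode("utf-8")  # b'\xc2\xa18-bit content!'

# Bytes path: every non-ASCII byte becomes a lone surrogate code point,
# which encodes back to exactly the same byte.
escaped = raw.decode("ascii", "surrogateescape")
round_tripped = escaped.encode("ascii", "surrogateescape")

# Str path: '¡' (U+00A1) is a real character, not a surrogate escape,
# so the ASCII encoder has no way to represent it.
error = None
try:
    "¡8-bit content!".encode("ascii", "surrogateescape")
except UnicodeEncodeError as exc:
    error = exc

print(round_tripped == raw)
print(type(error).__name__)
```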
A working alternative is message_from_bytes():
    from email import message_from_string, message_from_bytes

    message_from_string(att.decode()).as_bytes()
    # UnicodeEncodeError: 'ascii' codec can't encode character '\xa1' in position 0: ordinal not in range(128)

    message_from_bytes(att).as_bytes()
    # b'Subject: attachment\nContent-Type: text/plain; charset=utf-8\nContent-Transfer-Encoding: 8bit\n\n\xc2\xa18-bit content!\n'
So the simplest fix is probably to change Django to use message_from_bytes(force_bytes(content)).
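A minimal sketch of that change, using only the standard library so it runs without a Django project. `to_bytes()` and `rfc822_attachment()` are hypothetical stand-ins for `force_bytes()` and the relevant branch of `_create_mime_attachment()`; they are not Django's actual code. The nesting (multipart, then message/rfc822, then text) follows the path seen in the traceback.

```python
from email import message_from_bytes
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart


def to_bytes(content, encoding="utf-8"):
    # Hypothetical stand-in for force_bytes(): encode str, pass bytes through.
    return content.encode(encoding) if isinstance(content, str) else bytes(content)


def rfc822_attachment(content):
    # Parse from bytes so that 8bit body data is kept as surrogate escapes,
    # which the bytes generator can write back out unchanged.
    parsed = message_from_bytes(to_bytes(content))
    return MIMEMessage(parsed, "rfc822")


att = (
    "Subject: attachment\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "¡8-bit content!\n"
)

outer = MIMEMultipart()
outer.attach(rfc822_attachment(att))

# Serializing goes through the multipart -> message -> text handlers.
serialized = outer.as_bytes()
print("¡8-bit content!".encode("utf-8") in serialized)
```

The helper accepts either str or bytes, since attachment content can arrive as either.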
(This would also be fixed by upgrading Django to use Python's modern email APIs, #35581.)
comment:5 by , 10 months ago
I can give it a try, but #35581 looks close to being released. Any advice on where to start?
comment:6 by , 10 months ago
#35581 is stalled waiting for some Python core library fixes, so won't get merged before Django 6.0 at the earliest. I think it's worth fixing this bug before then.
Maybe start by adding a new, failing test case in django/tests/mail/tests.py. (That's a long file, and not particularly well organized, but most of the attachment-related tests are kind of grouped together in the middle, so maybe somewhere near them.) You could probably even use the minimal example from my earlier comment. (Or make it even more minimal: neither to nor Subject are relevant to the bug.)
Once that test fails, I'm pretty sure you can fix it in django/core/mail/message.py by changing message_from_string(force_str(content)) to message_from_bytes(force_bytes(content)) (and fixing up the imports). And then running all the tests to see if that breaks anything else. (I wouldn't expect it to, but I'm often wrong, so it's good we have tests.)
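As a rough idea of the shape such a regression test could take, here is a standard-library-only sketch. It is not the test that was added to Django: `build_rfc822_part()` is a hypothetical helper that mirrors the proposed `message_from_bytes(force_bytes(content))` call, and the class and method names are made up. It checks that str and bytes content serialize to the same output, which is the main thing the switch to bytes could plausibly break.

```python
import unittest
from email import message_from_bytes
from email.mime.message import MIMEMessage

# Neither "to" nor "Subject" matters for the bug, so the message is minimal.
ATTACHMENT = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Transfer-Encoding: 8bit\n"
    "\n"
    "¡8-bit content!\n"
)


def build_rfc822_part(content):
    # Hypothetical helper: normalize to bytes, then parse with the bytes parser.
    if isinstance(content, str):
        content = content.encode("utf-8")
    return MIMEMessage(message_from_bytes(content), "rfc822")


class EightBitAttachmentTests(unittest.TestCase):
    def test_str_and_bytes_content_serialize_identically(self):
        from_str = build_rfc822_part(ATTACHMENT).as_bytes()
        from_bytes = build_rfc822_part(ATTACHMENT.encode("utf-8")).as_bytes()
        self.assertEqual(from_str, from_bytes)

    def test_8bit_body_survives_serialization(self):
        expected = "¡8-bit content!".encode("utf-8")
        for content in (ATTACHMENT, ATTACHMENT.encode("utf-8")):
            with self.subTest(content_type=type(content).__name__):
                self.assertIn(expected, build_rfc822_part(content).as_bytes())
```

Saved as a module, this can be run with `python -m unittest`. A real Django test would go through `EmailMessage.attach()` and `message().as_bytes()` instead of the helper.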
comment:7 by , 10 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:8 by , 10 months ago
The patch should be done. CI is running, and I can't stay awake to see it finish.
comment:9 by , 10 months ago
| Has patch: | set |
|---|
comment:10 by , 9 months ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
comment:11 by , 9 months ago
| Triage Stage: | Ready for checkin → Accepted |
|---|
comment:12 by , 9 months ago
| Patch needs improvement: | set |
|---|
comment:14 by , 9 months ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
Trenton, it would be helpful if you could provide some more information about this, because I'm not sure we have any triagers who are experts in this area. Where is Django at fault, and how can we fix it?