Opened 8 years ago

Closed 8 years ago

#27007 closed Cleanup/optimization (fixed)

Handle non-UTF-8 bytes objects for text/* attachments

Reported by: Michael Schwarz
Owned by: nobody
Component: Core (Mail)
Version: dev
Severity: Normal
Keywords:
Cc:
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I've noticed that attach() from django.core.mail.EmailMessage allows a bytes object to be passed as content even when the MIME type is set to text/plain. SafeMIMEText will then try to decode the text in the bytes object as UTF-8, which is necessary because the MIME specifications require text/* parts to be decodable with the encoding specified in the Content-Type header. But decoding the bytes object will fail if it does not contain valid UTF-8 encoded text.
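To make the failure mode concrete, here is a minimal sketch in plain Python that does not use Django at all. The helper name try_decode_utf8 is hypothetical and only illustrates the decoding step described above; it is not part of Django's API.

```python
# Latin-1 encoded text: the single byte 0xE9 ("é") is not a complete,
# valid UTF-8 sequence, so decoding these bytes as UTF-8 fails.
latin1_payload = "café".encode("latin-1")
utf8_payload = "café".encode("utf-8")


def try_decode_utf8(data: bytes):
    """Return the decoded text, or None if data is not valid UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


print(try_decode_utf8(utf8_payload))    # decodes cleanly
print(try_decode_utf8(latin1_payload))  # decoding fails, helper returns None
```

Without the try/except, the second call would propagate the UnicodeDecodeError to the caller, which is the situation this ticket describes for attach().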

OTOH, attach_file() will in that case first try to decode the file's content but fall back on treating the attachment as binary, setting the Content-Type to application/octet-stream.

I think the fallback provided by attach_file() is useful, and I would like to extend attach() with that same behavior.
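The fallback described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: the function name resolve_attachment and the constant FALLBACK_MIME_TYPE are hypothetical, and the sketch only covers the decision of which content and MIME type to use.

```python
# Hypothetical constant for this sketch: the generic binary MIME type
# that the ticket names as the fallback.
FALLBACK_MIME_TYPE = "application/octet-stream"


def resolve_attachment(content, mimetype):
    """Return (content, mimetype) after applying the text/* fallback.

    If a text/* attachment is given as bytes, try to decode it as UTF-8.
    If that fails, keep the raw bytes and treat the attachment as binary.
    """
    basetype = mimetype.partition("/")[0]
    if basetype == "text" and isinstance(content, bytes):
        try:
            content = content.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: fall back to a binary attachment.
            mimetype = FALLBACK_MIME_TYPE
    return content, mimetype


print(resolve_attachment(b"caf\xc3\xa9", "text/plain"))  # valid UTF-8 stays text
print(resolve_attachment(b"caf\xe9", "text/plain"))      # invalid UTF-8 becomes binary
```

Non-text MIME types and content that is already a str pass through unchanged in this sketch.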

Change History (9)

comment:1 by Michael Schwarz, 8 years ago

I will shortly create a pull request with my proposed changes.

comment:2 by Michael Schwarz, 8 years ago

Summary: Allow bytes objects as text/* attachments → Handle non-UTF-8 bytes objects for text/* attachments

comment:3 by Tim Graham, 8 years ago

Might be a duplicate of #26802 which was committed just a couple weeks ago.

comment:5 by Michael Schwarz, 8 years ago

Replying to timgraham:

Sorry for not mentioning #26802. That change did improve the situation by allowing bytes objects, but it unconditionally tries to decode them as UTF-8 (unlike the logic in attach_file(), which falls back to treating the content as a binary file). This means that if a bytes object that does not contain valid UTF-8 is passed, a UnicodeDecodeError is raised.

With the changes in the pull request, the behavior is the same as when attaching a non-UTF-8 file.

comment:6 by Claude Paroz, 8 years ago

Has patch: set
Patch needs improvement: set
Triage Stage: Unreviewed → Accepted

comment:7 by Claude Paroz <claude@…>, 8 years ago

In 6fe391d4:

Refs #27007 -- Enhanced mail text attachment test

The test now also checks whether the sent message's attachment has the expected
name, content and mime type.

comment:8 by Tim Graham, 8 years ago

Patch needs improvement: unset
Triage Stage: Accepted → Ready for checkin

Pending some small documentation edits.

comment:9 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: new → closed

In 72d541b:

Fixed #27007 -- Handled non-UTF-8 bytes objects for text/* attachments.

The fallback logic which allows non-UTF-8 encoded files to be passed to
attach_file() even when a text/* mime type has been specified is
moved to attach(). Both functions now fall back to a content type of
application/octet-stream.

A side effect is that a file's content is decoded in memory instead of
opening it in text mode and reading it into a string.

Some mimetype-related logic in _create_attachment() has become
obsolete as the code moved from attach_file() to attach() already
handles this.
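The side effect mentioned in the commit message (reading the file as bytes and decoding in memory, rather than opening it in text mode) might look like the sketch below. The function load_attachment and the constant FALLBACK_MIME_TYPE are hypothetical names for illustration; this is not the code from the commit.

```python
import mimetypes
import tempfile
from pathlib import Path

# Hypothetical constant for this sketch.
FALLBACK_MIME_TYPE = "application/octet-stream"


def load_attachment(path, mimetype=None):
    """Return (filename, content, mimetype) for the file at path.

    The file is always read in binary mode; text/* content is then
    decoded in memory, falling back to binary if it is not valid UTF-8.
    """
    path = Path(path)
    if mimetype is None:
        guessed, _ = mimetypes.guess_type(path.name)
        mimetype = guessed or FALLBACK_MIME_TYPE
    content = path.read_bytes()  # binary read, no text-mode open
    if mimetype.startswith("text/"):
        try:
            content = content.decode("utf-8")
        except UnicodeDecodeError:
            mimetype = FALLBACK_MIME_TYPE
    return path.name, content, mimetype


# Usage: a Latin-1 encoded .txt file is attached as binary.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "notes.txt"
    sample.write_bytes("café".encode("latin-1"))
    result = load_attachment(sample, "text/plain")
    print(result)
```

Reading bytes first means a single code path can serve both in-memory content and files, which matches the commit's note that the fallback logic now lives in attach().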
