Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#5778 closed (fixed)

Email subjects not encoded properly

Reported by: Thomas Petazzoni <thomas.petazzoni@…>
Owned by: nobody
Component: Core (Other)
Version: dev
Severity:
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

When a UTF-8 encoded subject is passed to the EmailMessage class constructor, the subject is sent as raw UTF-8, without being encoded in Quoted-Printable or Base64. When the recipient's MUA runs on a machine using UTF-8, it "works", but for recipients whose machines use ISO-8859-x or another non-UTF-8 charset, the subject appears broken.

I think the problem comes from the implementation of the __setitem__ method of the SafeMIMEText class. It only uses the Header() class when str(force_unicode(val)) raises an exception, which it doesn't do in my case (I suppose because my subject is properly UTF-8 encoded). However, I'd say it should *always* use Header(), which properly turns a UTF-8 string into a quoted-printable string.
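For illustration, here is a minimal sketch of what Header() does with a non-ASCII subject. It is written for Python 3 with only the standard library, whereas the ticket was reported against Python 2.4, and the sample subject is an assumption chosen for brevity. The exact encoded form (quoted-printable or Base64) is picked by the email package, so the sketch only prints it and decodes it back.

```python
from email.header import Header, decode_header, make_header

subject = "\u00c5ngstr\u00f6m"  # "Ångström", a sample non-ASCII subject

# Header.encode() returns an RFC 2047 encoded-word built from ASCII
# characters only, which is safe to place in a mail header.
encoded = Header(subject, "utf-8").encode()
print(encoded)

# Decoding the encoded-word recovers the original text, whatever the
# local charset of the reader's machine is.
decoded = str(make_header(decode_header(encoded)))
print(decoded)
```

A mail client that understands RFC 2047 performs the second step on its own, which is why the encoded subject displays correctly regardless of the recipient's locale.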

I'm running Django trunk at r6526.

Don't hesitate to ask for further details if needed.

Attachments (1)

django-mail-encoding-header-fix (2.7 KB ) - added by Thomas Petazzoni <thomas.petazzoni@…> 16 years ago.
Ugly patch that fixes the problem for me


Change History (8)

by Thomas Petazzoni <thomas.petazzoni@…>, 16 years ago

Ugly patch that fixes the problem for me

comment:1 by Malcolm Tredinnick, 16 years ago

Triage Stage: Unreviewed → Accepted

Yes. Good catch.

I'll have to check the behaviour of Header. From memory, this might need some tweaking. The point is that when I was writing the current code, there were times when headers were being pointlessly encoded even when they could be represented directly (particularly ASCII text). So, provided Header() doesn't try to wrap ASCII up in anything fancy, this is the right fix. Otherwise, we need to check that the data really is non-ASCII before making a Header() out of it.

comment:2 by tpetazzoni, 16 years ago

You're right, it does some pointless encoding when the string is pure ASCII, for example:

From: =?utf-8?q?Trivialibre?= <trivialibre@…>

In that case, the =?utf-8?q? stuff is useless.

So, instead of verifying whether the string is Unicode (with force_unicode), the code should probably check whether the string is 7-bit ASCII only. Do you want me to provide an improved fix, or are you going to do it?
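The check suggested here could look roughly like the sketch below. It is a Python 3 illustration only, not the code from the attached patch or from Django: encode_header_value is a hypothetical helper name, and the default charset of "utf-8" is an assumption.

```python
from email.header import Header


def encode_header_value(val, charset="utf-8"):
    """Return val unchanged if it is 7-bit ASCII, else RFC 2047 encode it.

    Hypothetical helper for illustration; not part of Django.
    """
    try:
        # Ask for the ASCII codec by name; this fails for any character
        # outside the 7-bit range.
        val.encode("ascii")
    except UnicodeEncodeError:
        # Only non-ASCII values are wrapped in an encoded-word.
        return Header(val, charset).encode()
    return val


print(encode_header_value("Trivialibre"))
print(encode_header_value("Et \u00e7a marche bien \u00e9 ?"))
```

With this shape, a plain ASCII name such as "Trivialibre" is passed through untouched, so the useless =?utf-8?q? wrapper shown above does not appear.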

comment:3 by Malcolm Tredinnick, 16 years ago

I was thinking about this some more and I'm not sure I completely understand the problem any longer. force_unicode() forces the input to a unicode object and str() will raise an error for any data that isn't ASCII.

So, for example

>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))    # A UTF-8 bytestring
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)

The only way I could see this failing is if somebody had changed Python's default encoding, which is well advertised as being something that shouldn't be done, for exactly this sort of reason.

What is an example of a header string that is causing the problem? And what does sys.getdefaultencoding() return?

comment:4 by tpetazzoni, 16 years ago

From a raw Python shell, with PYTHONPATH=/path/to/django:

thomas@toulibre:/srv/www/trivialibre.humanoidz.org$ python
Python 2.4.4 (#2, Apr  5 2007, 20:11:18) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print sys.getdefaultencoding()
ascii
>>> from django.utils.encoding import force_unicode
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc5' in position 0: ordinal not in range(128)
>>> str(force_unicode('Nouvelle question "Et ça marche bien é ?"'))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 22: ordinal not in range(128)

So here, it works properly. Now, from a Python shell run using "manage.py shell", still with PYTHONPATH=/path/to/django/:

thomas@toulibre:/srv/www/trivialibre.humanoidz.org$ ./trivialibre/tvl/manage.py shell
Python 2.4.4 (#2, Apr  5 2007, 20:11:18) 
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import sys
>>> print sys.getdefaultencoding()
utf-8
>>> from django.utils.encoding import force_unicode
>>> str(force_unicode('\xc3\x85ngstr\xc3\xb6m'))
'\xc3\x85ngstr\xc3\xb6m'
>>> str(force_unicode('Nouvelle question "Et ça marche bien é ?"'))
'Nouvelle question "Et \xc3\xa7a marche bien \xc3\xa9 ?"'
>>> 

The second string tested above is the one I was using for my tests. But yours also perfectly shows the problem.
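The two sessions differ only in the default encoding, which suggests the ASCII test should name its codec instead of relying on an implicit conversion. The following is a Python 3 analogue of that idea, given as a sketch: on Python 3, str.encode() with no argument uses UTF-8, which plays the role of the "utf-8" default seen in the manage.py shell.

```python
text = "\u00c5ngstr\u00f6m"  # "Ångström"

# An implicit conversion with a UTF-8 default succeeds silently, so it
# cannot serve as an "is this ASCII?" test.
implicit = text.encode()  # str.encode() defaults to UTF-8 on Python 3
print(implicit)

# Naming the codec makes the check behave the same on every system.
try:
    text.encode("ascii")
    is_ascii = True
except UnicodeEncodeError:
    is_ascii = False
print(is_ascii)
```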

comment:5 by Malcolm Tredinnick, 16 years ago

Oh, that's tricky and not very nice behaviour. :-(

Okay, now I'm convinced. Will fix it in a minute.

comment:6 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: new → closed

(In [6551]) Fixed #5778 -- Changed the way we detect if a string is non-ASCII when creating
email headers. This fixes a problem that was showing up on some (but not all)
systems.

comment:7 by tpetazzoni, 16 years ago

Tested, works perfectly. Thanks!
