id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
13831	UTF-8 in models.__repr__ causes hard to track down unicode errors.	Walter Doekes	nobody	"UTF-8 in models.__repr__ causes hard to track down unicode errors.

Normally you need an explicit conversion to UTF-8 when you pipe the
output of a Python command to another program.
{{{
$ echo ""print u'\\u20ac'"" | ./manage.py shell
€
$ echo ""print u'\\u20ac'"" | PYTHONPATH=/opt/django12 ./manage.py shell | cat
Traceback (most recent call last):
  File ""<ipython console>"", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
}}}

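This is expected: when stdout is a pipe rather than a terminal, Python 2 cannot detect an output encoding and falls back to ASCII.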
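
The explicit conversion mentioned above can be sketched as follows. This is only an illustration: write_text is a hypothetical helper, and the BytesIO object stands in for a piped, byte-oriented stdout.

```python
import io


def write_text(stream, text, encoding='utf-8'):
    # Encode explicitly so the result does not depend on whether
    # the stream is a terminal or a pipe.
    stream.write(text.encode(encoding))


pipe = io.BytesIO()  # stands in for a piped stdout
write_text(pipe, u'\u20ac\n')
```

With the encoding chosen by the caller, the euro sign reaches the pipe as three UTF-8 bytes regardless of what Python detects for the stream.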

So, to ""fix"" that, I include a recoder on stdout. I do this for every
call to a django.core.management.base.BaseCommand:
{{{
import codecs
import locale
import os
import sys

# Replace stdout with a recoder that uses the default locale
lang, encoding = locale.getdefaultlocale()
if encoding:
    if sys.stdout.name == '<stdout>': # only mess with the original
        sys.stdout = codecs.getwriter(encoding)(
            # Reopen stdout in unbuffered mode
            os.fdopen(sys.stdout.fileno(), 'w', 0),
            'replace'
        )
}}}

That works fine too. But now things start to break when using repr:
{{{
$ cat models.py
from django.db import models

class A(models.Model):
    def __unicode__(self):
        return u'\u20ac' # EUR
}}}
{{{
>>> import sys, codecs
>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout, 'replace')
>>> from myproject.models import A
>>> a = A()
>>> repr(a)
'<A: \xe2\x82\xac>'
>>> print repr(a)
------------------------------------------------------------
Traceback (most recent call last):
  File ""<ipython console>"", line 1, in <module>
  File ""/usr/lib/python2.6/codecs.py"", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)

}}}
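The traceback points at the writer's encode step. On Python 2, a byte string handed to a Unicode encoder is first decoded implicitly with the default ASCII codec. A minimal sketch that reproduces just that decode step, using the byte string from the session above (failed_at and bad_byte are names chosen for illustration):

```python
data = b'<A: \xe2\x82\xac>'  # the byte string repr(a) returned above

try:
    data.decode('ascii')
    failed_at = None
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first undecodable byte
    bad_byte = data[exc.start:exc.start + 1]
```

The failing index and byte correspond to the position 4 and 0xe2 reported in the traceback.
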
If this A instance is part of a larger collection (e.g. a dictionary) on which you never call repr() explicitly, it becomes very hard to see where the encoding errors come from.
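
One possible workaround, sketched here under assumptions rather than offered as a fix, is a stdout wrapper that encodes text but passes byte strings through unchanged. TolerantWriter is a hypothetical class, not part of Django or the codecs module, and the pass-through assumes any byte string is already in the target encoding.

```python
import io


class TolerantWriter(object):
    # Hypothetical wrapper: encodes text, but writes byte strings as-is
    # instead of letting them be implicitly decoded as ASCII.
    def __init__(self, stream, encoding='utf-8', errors='replace'):
        self.stream = stream
        self.encoding = encoding
        self.errors = errors

    def write(self, data):
        if isinstance(data, bytes):
            # Already encoded, e.g. the byte string returned by repr(a).
            self.stream.write(data)
        else:
            self.stream.write(data.encode(self.encoding, self.errors))

    def __getattr__(self, name):
        # Delegate flush(), fileno(), etc. to the wrapped stream.
        return getattr(self.stream, name)


buf = io.BytesIO()  # stands in for a byte-oriented stdout
out = TolerantWriter(buf)
out.write(u'\u20ac ')            # text is encoded to UTF-8
out.write(b'<A: \xe2\x82\xac>')  # bytes are written unchanged
```

The trade-off is that a byte string in some other encoding would be written unchanged and could produce mixed output, so this only helps when all byte strings are known to be UTF-8.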

Is there something wrong with my codecs.getwriter replacement, or is it
wrong that Django returns a non-ASCII (UTF-8) bytestring for repr()?


Regards,[[br]]
Walter Doekes[[br]]
OSSO B.V."		closed	Uncategorized	1.2		invalid		Walter Doekes	Unreviewed	0	0	0	0	0	0