Opened 14 years ago

Closed 14 years ago

Last modified 14 years ago

#13831 closed (invalid)

UTF-8 in models.__repr__ causes hard to track down unicode errors.

Reported by: Walter Doekes
Owned by: nobody
Component: Uncategorized
Version: 1.2
Severity:
Keywords:
Cc: Walter Doekes
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

UTF-8 in models.__repr__ causes hard to track down unicode errors.

Normally you require an explicit conversion to UTF-8 if you pipe the
output of a python command to a different program.

$ echo "print u'\\u20ac'" | ./manage.py shell
€
$ echo "print u'\\u20ac'" | PYTHONPATH=/opt/django12 ./manage.py shell | cat
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

This is expected.
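
The piped failure can be reproduced without a shell. The following is a minimal sketch, written for Python 3 rather than the Python 2 used in this report, and the helper name `encode_for_pipe` is made up for illustration. It assumes what the traceback above suggests: with no terminal attached, the output falls back to the ASCII codec, which has no representation for the euro sign.

```python
def encode_for_pipe(text, encoding="ascii"):
    """Encode text the way an output stream with this encoding would."""
    return text.encode(encoding)


euro = "\u20ac"

# ASCII cannot represent the euro sign, so encoding fails at position 0.
try:
    encode_for_pipe(euro)
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# An explicit UTF-8 conversion succeeds and yields three bytes.
utf8_bytes = encode_for_pipe(euro, "utf-8")
```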

So, to "fix" that, I include a recoder on stdout. I do this for every
call to a django.core.management.base.BaseCommand:

import codecs
import locale
import os
import sys

# Replace stdout with a recoder that uses the default locale
lang, encoding = locale.getdefaultlocale()
if encoding:
    if sys.stdout.name == '<stdout>':  # only mess with the original
        sys.stdout = codecs.getwriter(encoding)(
            # Reopen stdout in unbuffered mode
            os.fdopen(sys.stdout.fileno(), 'w', 0),
            'replace'
        )
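
To see what such a recoder does in isolation, here is a rough Python 3 sketch that wraps an in-memory byte buffer instead of the real stdout. `make_recoder` and the `sink` buffers are illustrative names, not part of the code above; only `codecs.getwriter` and the `'replace'` error handler are taken from it.

```python
import codecs
import io


def make_recoder(byte_stream, encoding):
    """Wrap a byte stream so that text written to it is encoded on the way out."""
    return codecs.getwriter(encoding)(byte_stream, "replace")


# An in-memory buffer stands in for the reopened stdout file descriptor.
sink = io.BytesIO()
out = make_recoder(sink, "utf-8")
out.write("\u20ac\n")

# With an ASCII locale, 'replace' substitutes '?' instead of raising.
ascii_sink = io.BytesIO()
make_recoder(ascii_sink, "ascii").write("\u20ac\n")
```

With a UTF-8 locale the euro sign arrives as three bytes; with an ASCII locale the `'replace'` handler turns it into a question mark rather than raising.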

That works fine too. But now things start to break when using repr:

$ cat models.py
from django.db import models

class A(models.Model):
    def __unicode__(self):
        return u'\u20ac' # EUR

>>> import sys, codecs
>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout, 'replace')
>>> from myproject.models import A
>>> a = A()
>>> repr(a)
'<A: \xe2\x82\xac>'
>>> print repr(a)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
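
Note that this is a decode error raised while writing. As far as I can tell, in Python 2 the writer's encode step, when handed a byte string, first converts it to unicode using the default ASCII codec. The Python 3 sketch below makes that implicit step explicit; the byte string is the `repr(a)` value from the session above, and the variable names are illustrative.

```python
# What repr(a) returned in the session above: UTF-8 bytes, not unicode.
repr_bytes = b"<A: \xe2\x82\xac>"

# The step Python 2 performs implicitly before it can encode the value.
try:
    repr_bytes.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the codec the bytes were actually written in works fine.
decoded = repr_bytes.decode("utf-8")
```

The error lands on byte 0xe2 at position 4, the first byte of the encoded euro sign, which matches the traceback.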

If this A instance is part of a larger collection (e.g. a dictionary) on which you never call repr() explicitly, it becomes much harder to see why you are getting encoding errors.
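
The container case is easy to demonstrate. In this small Python 3 sketch, a plain class `A` stands in for the Django model (an assumption made to keep the example self-contained): converting a dict or list to text calls `__repr__` on every element, even though the calling code never mentions repr().

```python
class A:
    """Plain stand-in for the model; only __repr__ matters here."""

    def __repr__(self):
        return "<A: \u20ac>"


container = {"item": A()}

# str() on a dict (which is what print uses) delegates to repr() per element.
text = str(container)

# The same happens when a list is formatted with %s.
listed = "%s" % ([A()],)
```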

Is there something wrong with my codecs.getwriter replacement or is it
wrong that django returns a non-ascii (utf-8) bytestring for repr()?

Regards,
Walter Doekes
OSSO B.V.

Change History (2)

comment:1 by Luke Plant, 14 years ago

Resolution: invalid
Status: new → closed

I cannot find anywhere that says it is incorrect to return non-ASCII from __repr__, and this is not the place to discuss any problems with your stdout recoder. As there is nothing in this bug report that is specific to Django, I'm going to have to close it as INVALID unless you can show that Django is doing something wrong. If there are some management commands that print unicode to sys.stdout, that probably needs to be fixed; please open another bug.

Thanks!

comment:2 by Walter Doekes, 14 years ago

I suppose you're right. Thanks :)

For those experiencing the same issue, switch to this:

        # Replace stdout with a recoder that uses UTF-8 (like all of django uses)
        if sys.stdout.name == '<stdout>': # only mess with the original
            from encodings.utf_8 import StreamWriter
            class LaxStreamWriter(StreamWriter):
                def encode(self, string, errors):
                    # Byte strings are assumed to be UTF-8 already: pass them through
                    if isinstance(string, str):
                        return (string, len(string))
                    return StreamWriter.encode(string, errors)
            sys.stdout = LaxStreamWriter(os.fdopen(sys.stdout.fileno(), 'w', 0))
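
For readers on Python 3, the same idea might look roughly like the sketch below: text is encoded as UTF-8, while byte strings are assumed to be UTF-8 already and pass through untouched. This is an adaptation, not the code above; it subclasses `codecs.StreamWriter` directly and writes to an in-memory buffer (`sink`, an illustrative name) instead of replacing `sys.stdout`.

```python
import codecs
import io


class LaxStreamWriter(codecs.StreamWriter):
    """UTF-8 writer that lets already-encoded byte strings through untouched."""

    def encode(self, data, errors="strict"):
        if isinstance(data, bytes):
            # Assumed to be UTF-8 already, so do not re-encode.
            return data, len(data)
        return codecs.utf_8_encode(data, errors)


# An in-memory buffer stands in for the unbuffered stdout file object.
sink = io.BytesIO()
out = LaxStreamWriter(sink)
out.write("\u20ac ")             # text: encoded to UTF-8
out.write(b"<A: \xe2\x82\xac>")  # bytes: passed through as-is
```

The trade-off is that byte strings in any other encoding are written unchanged, so the output is only valid UTF-8 if every byte string written to it is.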