
Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#13831 closed (invalid)

UTF-8 in models.__repr__ causes hard to track down unicode errors.

Reported by: wdoekes
Owned by: nobody
Component: Uncategorized
Version: 1.2
Severity:
Keywords:
Cc: wdoekes
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

UTF-8 in models.__repr__ causes hard-to-track-down unicode errors.

Normally you need an explicit conversion to UTF-8 if you pipe the
output of a Python command to a different program, because Python 2
falls back to the ASCII codec when stdout is not a terminal.

$ echo "print u'\\u20ac'" | ./manage.py shell
€
$ echo "print u'\\u20ac'" | PYTHONPATH=/opt/django12 ./manage.py shell | cat
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

This is expected.
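The difference between the two runs can be reproduced without a shell pipe. The following is only a minimal sketch: io.BytesIO stands in for the piped stdout, and write_encoded is a hypothetical helper that is not part of Django or of this report.

```python
import codecs
import io


def write_encoded(text, encoding):
    """Push `text` through a codecs writer and return the bytes written."""
    sink = io.BytesIO()  # stands in for a piped, byte-oriented stdout
    writer = codecs.getwriter(encoding)(sink, 'strict')
    writer.write(text)
    return sink.getvalue()


# A UTF-8 writer can represent the euro sign.
utf8_bytes = write_encoded(u'\u20ac', 'utf-8')

# An ASCII writer cannot, which is what the piped run falls back to.
try:
    write_encoded(u'\u20ac', 'ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

Only the encoding handed to the writer changes between the two calls; the text is the same.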

So, to "fix" that, I include a recoder on stdout. I do this for every
call to a django.core.management.base.BaseCommand:

import codecs, locale, os, sys

# Replace stdout with a recoder that uses the default locale
lang, encoding = locale.getdefaultlocale()
if encoding:
    if sys.stdout.name == '<stdout>': # only mess with the original
        sys.stdout = codecs.getwriter(encoding)(
            # Reopen stdout in unbuffered mode
            os.fdopen(sys.stdout.fileno(), 'w', 0),
            'replace'
        )

That works fine too. But now things start to break when using repr:

$ cat models.py
from django.db import models

class A(models.Model):
    def __unicode__(self):
        return u'\u20ac' # EUR

>>> import sys, codecs
>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout, 'replace')
>>> from myproject.models import A
>>> a = A()
>>> repr(a)
'<A: \xe2\x82\xac>'
>>> print repr(a)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)

If this A instance is part of a larger set of items (e.g. a dictionary) on which you're not even explicitly calling repr(), it becomes increasingly difficult to see why on earth one is getting encoding errors.
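The traceback is a decode error even though the writer is supposed to encode. On Python 2, a codecs writer that receives a byte string first decodes it implicitly with the default ASCII codec, and that step fails on the first UTF-8 byte. The sketch below spells out that implicit step explicitly; it uses a bytes literal so it also runs on Python 3, and the variable names are illustrative only.

```python
# Bytes as returned by repr(a) in the session above (UTF-8 encoded).
repr_bytes = b'<A: \xe2\x82\xac>'

# The step the Python 2 writer performs implicitly on a byte string:
# decode it as ASCII before re-encoding it.
try:
    repr_bytes.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding with the actual encoding first makes the round trip safe.
round_trip = repr_bytes.decode('utf-8').encode('utf-8')
```

The failing offset in decode_error.start is the position of the 0xe2 byte, the same position the traceback reports.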

Is there something wrong with my codecs.getwriter replacement or is it
wrong that django returns a non-ascii (utf-8) bytestring for repr()?

Regards,
Walter Doekes
OSSO B.V.

Attachments (0)

Change History (2)

comment:1 Changed 4 years ago by lukeplant

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

I cannot find anywhere that says it is incorrect to return non-ASCII from __repr__, and this is not the place to discuss any problems with your stdout recoder. There is nothing in this bug report that is specific to Django, so I'm going to have to close it as INVALID unless you can show that Django is doing something wrong. If there are some management commands that are printing unicode to sys.stdout, that probably needs to be fixed; please open another bug.

Thanks!

comment:2 Changed 4 years ago by wdoekes

I suppose you're right. Thanks :)

For those experiencing the same issue, switch to this:

        # Replace stdout with a recoder that uses UTF-8 (like all of django uses)
        if sys.stdout.name == '<stdout>': # only mess with the original
            from encodings.utf_8 import StreamWriter
            class LaxStreamWriter(StreamWriter):
                def encode(self, string, errors='strict'):
                    # Byte strings are assumed to be UTF-8 already; pass them through
                    if isinstance(string, str):
                        return (string, len(string))
                    # StreamWriter.encode is the built-in codecs.utf_8_encode (no self)
                    return StreamWriter.encode(string, errors)
            sys.stdout = LaxStreamWriter(os.fdopen(sys.stdout.fileno(), 'w', 0))
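The same pass-through idea can be sketched in a form that also runs on Python 3, where text and bytes are distinct types. This is an illustration under assumptions, not the code from the comment above: LaxUtf8Writer is a hypothetical name, and io.BytesIO stands in for the reopened stdout.

```python
import io


class LaxUtf8Writer(object):
    """Hypothetical wrapper: pass bytes through, encode text as UTF-8."""

    def __init__(self, raw):
        self.raw = raw  # any binary stream with a write() method

    def write(self, data):
        if isinstance(data, bytes):
            # Already encoded (e.g. a UTF-8 repr); write it unchanged.
            self.raw.write(data)
        else:
            # Text is encoded on the way out; unencodable input is replaced.
            self.raw.write(data.encode('utf-8', 'replace'))


sink = io.BytesIO()
out = LaxUtf8Writer(sink)
out.write(b'<A: \xe2\x82\xac>')  # byte string, passed through
out.write(u' \u20ac')            # text, encoded as UTF-8
result = sink.getvalue()
```

The design choice is the same as in the snippet above: byte strings are trusted to be UTF-8 already, so no implicit ASCII decode ever takes place.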
