#13831 closed (invalid)
UTF-8 in models.__repr__ causes hard to track down unicode errors.
Reported by: | Walter Doekes | Owned by: | nobody |
---|---|---|---|
Component: | Uncategorized | Version: | 1.2 |
Severity: | | Keywords: | |
Cc: | Walter Doekes | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
UTF-8 in models.__repr__ causes hard-to-track-down unicode errors.
Normally, you need an explicit conversion to UTF-8 when you pipe the
output of a Python command into another program:
$ echo "print u'\\u20ac'" | ./manage.py shell
€
$ echo "print u'\\u20ac'" | PYTHONPATH=/opt/django12 ./manage.py shell | cat
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
This is expected.
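The pipe failure boils down to encoding U+20AC with the ASCII codec: when stdout is not a terminal, Python 2 does not know the output encoding and falls back to ASCII. A minimal sketch in Python 3 syntax that reproduces the same error explicitly; the helper `encode_for_pipe` is hypothetical and only exists to make that fallback visible:

```python
# Hypothetical helper: encode text the way a pipe with no declared
# encoding would (Python 2 fell back to ASCII in that case).
def encode_for_pipe(text, encoding="ascii"):
    return text.encode(encoding)


euro = "\u20ac"
print(encode_for_pipe(euro, "utf-8"))  # b'\xe2\x82\xac'

try:
    encode_for_pipe(euro)  # ASCII cannot represent U+20AC
except UnicodeEncodeError as exc:
    print(exc.reason)  # ordinal not in range(128)
```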
So, to "fix" that, I include a recoder on stdout. I do this for every
call to a django.core.management.base.BaseCommand:
import codecs
import locale
import os
import sys

# Replace stdout with a recoder that uses the default locale (Python 2)
lang, encoding = locale.getdefaultlocale()
if encoding:
    if sys.stdout.name == '<stdout>':  # only mess with the original
        sys.stdout = codecs.getwriter(encoding)(
            # Reopen stdout in unbuffered mode
            os.fdopen(sys.stdout.fileno(), 'w', 0),
            'replace'
        )
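To see what such a recoder does without touching the real stdout, here is a sketch in Python 3 syntax that wraps an in-memory `io.BytesIO` buffer instead of the reopened file descriptor (a stand-in chosen for illustration, not part of the original setup). With the `'replace'` error handler, characters the target encoding cannot represent become `?` instead of raising:

```python
import codecs
import io

# An in-memory byte stream stands in for the reopened stdout descriptor.
raw = io.BytesIO()

# Same construction as the recoder: a StreamWriter for the target
# encoding, with 'replace' so unencodable characters become '?'.
writer = codecs.getwriter("ascii")(raw, "replace")
writer.write("price: \u20ac")
print(raw.getvalue())  # b'price: ?'

# With a UTF-8 locale the euro sign survives intact.
utf8_raw = io.BytesIO()
codecs.getwriter("utf-8")(utf8_raw, "replace").write("\u20ac")
print(utf8_raw.getvalue())  # b'\xe2\x82\xac'
```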
That works fine too. But now things start to break when using repr:
$ cat models.py
from django.db import models

class A(models.Model):
    def __unicode__(self):
        return u'\u20ac'  # EUR
>>> import sys, codecs
>>> sys.stdout = codecs.getwriter('utf-8')(sys.stdout, 'replace')
>>> from myproject.models import A
>>> a = A()
>>> repr(a)
'<A: \xe2\x82\xac>'
>>> print repr(a)
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "/usr/lib/python2.6/codecs.py", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
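Note that the traceback is a UnicodeDecodeError, not an encode error: Python 2's UTF-8 StreamWriter only encodes unicode, so when it is handed the UTF-8 bytestring that repr() returned, it first decodes that bytestring with the default ASCII codec, which chokes on the first non-ASCII byte. Python 3 no longer performs that implicit step, so this sketch performs the decode explicitly; the bytes are copied from the session above:

```python
import codecs
import io

# The UTF-8 bytestring that repr(a) returned in the session above.
repr_bytes = b"<A: \xe2\x82\xac>"

# Emulate Python 2's implicit str -> unicode coercion, which used ASCII.
try:
    repr_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    # Same failure as the traceback: byte 0xe2 at position 4.
    print(hex(repr_bytes[exc.start]), exc.start)  # 0xe2 4

# Python 3's UTF-8 StreamWriter refuses bytes outright instead.
try:
    codecs.getwriter("utf-8")(io.BytesIO()).write(repr_bytes)
except TypeError as exc:
    print("TypeError:", exc)
```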
If this A instance is part of a larger structure (e.g. a dictionary, whose string form calls repr() on every item) on which you are not even calling repr() explicitly, it becomes much harder to see why on earth you are getting encoding errors.
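Containers format their elements with repr(), so merely printing a dict that holds a model instance calls that instance's __repr__. A small Python 3 sketch with a hypothetical stand-in class (not a Django model, and returning plain ASCII to keep the example portable) makes the hidden call visible:

```python
class A:
    """Hypothetical stand-in for the model; counts repr() calls."""

    repr_calls = 0

    def __repr__(self):
        A.repr_calls += 1
        return "<A: EUR>"


items = {"first": A()}
text = str(items)    # no explicit repr() anywhere on this line...
print(A.repr_calls)  # 1: str(dict) called A.__repr__ behind the scenes
print(text)          # {'first': <A: EUR>}
```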
Is there something wrong with my codecs.getwriter replacement, or is it
wrong that Django returns a non-ASCII (UTF-8) bytestring from repr()?
Regards,
Walter Doekes
OSSO B.V.
Change History (2)
comment:1 by , 14 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I cannot find anywhere that says it is incorrect to return non-ASCII from
__repr__, and this is not the place to discuss problems with your stdout
recoder. As there is nothing in this bug report that is specific to Django,
I'm going to have to close it as INVALID unless you can show that Django is
doing something wrong. If some management commands print unicode to
sys.stdout, that probably needs to be fixed; please open another bug. Thanks!
comment:2 by , 14 years ago
I suppose you're right. Thanks :)
For those experiencing the same issue, switch to this:
import os
import sys

# Replace stdout with a recoder that uses UTF-8 (as all of Django does)
if sys.stdout.name == '<stdout>':  # only mess with the original
    from encodings.utf_8 import StreamWriter

    class LaxStreamWriter(StreamWriter):
        def encode(self, string, errors='strict'):
            # Pass already-encoded bytestrings (such as Django's repr())
            # through untouched; encode unicode to UTF-8 as usual.
            if isinstance(string, str):
                return (string, len(string))
            return StreamWriter.encode(string, errors)

    sys.stdout = LaxStreamWriter(os.fdopen(sys.stdout.fileno(), 'w', 0))
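On Python 3, where `bytes` and `str` are distinct types, the same "pass bytes through, encode text" idea behind the LaxStreamWriter workaround can be sketched as follows. This is an illustrative port, not code from the ticket, and it writes to an `io.BytesIO` buffer rather than the real stdout:

```python
import io
from encodings.utf_8 import StreamWriter


class LaxStreamWriter(StreamWriter):
    """UTF-8 writer that passes already-encoded bytes through unchanged."""

    def encode(self, data, errors="strict"):
        if isinstance(data, bytes):
            # Assume the bytes are already UTF-8 (as Django 1.2's repr() was).
            return (data, len(data))
        # StreamWriter.encode is codecs.utf_8_encode: returns (bytes, length).
        return StreamWriter.encode(data, errors)


buf = io.BytesIO()
writer = LaxStreamWriter(buf)
writer.write("\u20ac ")             # text is encoded to UTF-8
writer.write(b"<A: \xe2\x82\xac>")  # bytes are written as-is
print(buf.getvalue())  # b'\xe2\x82\xac <A: \xe2\x82\xac>'
```

The trade-off is the same as in the original: bytes are trusted to already be UTF-8, so a bytestring in some other encoding would be written through unchanged and show up garbled.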