Opened 8 years ago
Closed 8 years ago
#26731 closed Bug (wontfix)
UnicodeDecodeError when writing unicode to stdout of management command
Reported by: | Darren Hobbs | Owned by: | nobody |
---|---|---|---|
Component: | Core (Management commands) | Version: | 1.8 |
Severity: | Normal | Keywords: | py2 |
Cc: | | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
In a management command on Python 2.7, if you include Unicode characters when writing to stdout (with self.stdout.write), you will get a UnicodeDecodeError:

    # coding=utf-8
    from __future__ import absolute_import, unicode_literals

    import sys

    import pytest
    from django.core.management.base import OutputWrapper
    from django.utils.encoding import smart_bytes


    def test_bad_unicode_names():
        bad_name = smart_bytes(u'£')
        ow = OutputWrapper(sys.stdout)
        with pytest.raises(UnicodeDecodeError):
            ow.write(bad_name)
Change History (17)
comment:1 by , 8 years ago
Description: | modified (diff) |
---|
comment:2 by , 8 years ago
comment:3 by , 8 years ago
The string came from the db. The actual error came from django/core/management/base.py, line 111, in write.
I fixed my specific issue by importing unicode_literals and using self.stdout.write('{}'.format(unicode_string)). I'm afraid my understanding of Python's Unicode string handling isn't great. Perhaps the answer is to update the documentation to suggest using unicode_literals in management commands; the alternative is a nasty surprise waiting to happen in production (as it did to me!)
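The effect of that workaround can be sketched without the `__future__` import by marking the template as text explicitly. This is only an illustration, assuming a hypothetical helper named `render_line`; the `u''` prefix gives the same template type that unicode_literals would on Python 2 and is accepted unchanged on Python 3.3+.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not part of Django): build the output line from a
# text template so that no implicit ASCII coercion of the value is needed.
def render_line(value):
    # u'{}' is a text (unicode) template on Python 2; with a plain '{}'
    # bytestring template, Python 2 would try to encode `value` as ASCII.
    return u'{}'.format(value)


line = render_line(u'caf\xe9')
```

The resulting `line` is a text string, so passing it to self.stdout.write does not mix bytestrings with the wrapper's Unicode line ending.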
comment:4 by , 8 years ago
So the broken code is self.stdout.write('{}'.format(possibly_unicode_string_from_db)) without unicode_literals?
comment:5 by , 8 years ago
Apart from the content of a BinaryField, I don't see how any non-ASCII bytestring can come from the database.
comment:6 by , 8 years ago
The issue is that the non-ASCII Unicode string from the database is coerced into the bytestring '{}' (basically the same situation as #21933).
comment:7 by , 8 years ago
It's also compounded by the fact that sys.stdout.write copes with it but self.stdout.write doesn't.
comment:8 by , 8 years ago
It's because OutputWrapper's default ending is u'\n', so we end up comparing a bytestring to Unicode in msg.endswith(ending). I'll leave the correct resolution to Claude or another Unicode expert.
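One way to avoid the mismatch is to normalize the message to text before the endswith check. The following is a minimal sketch under stated assumptions: `TextOutputWrapper` is a hypothetical, simplified stand-in (not Django's OutputWrapper and not the patch discussed in this ticket), and incoming bytestrings are assumed to be UTF-8 encoded.

```python
import io


class TextOutputWrapper(object):
    """Hypothetical simplified wrapper; NOT Django's OutputWrapper."""

    def __init__(self, out, ending=u'\n'):
        self._out = out
        self.ending = ending

    def write(self, msg, ending=None):
        ending = self.ending if ending is None else ending
        # Assumption: bytestrings are UTF-8. Decoding explicitly avoids the
        # implicit ASCII decode that comparing bytes to text triggers on
        # Python 2 (and the TypeError the same comparison raises on Python 3).
        if isinstance(msg, bytes):
            msg = msg.decode('utf-8')
        if ending and not msg.endswith(ending):
            msg += ending
        self._out.write(msg)


buf = io.StringIO()
wrapper = TextOutputWrapper(buf)
wrapper.write(u'\xa3'.encode('utf-8'))  # bytestring input, decoded first
wrapper.write(u'done\n')                # already ends with the ending
```

Decoding with a fixed codec is a design choice that may not suit every command; bytes in another encoding would still fail, just with a clearer error at the decode step.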
comment:9 by , 8 years ago
@dhobbs It's still a bit mysterious to us how you got the non-ASCII bytestring; that *might* be the bug in the first place. Could you elaborate a bit on your use case?
comment:11 by , 8 years ago
    >>> print('{}'.format(u'un café ?'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)
comment:12 by , 8 years ago
I'm using this management command:
    # -*- coding: utf-8 -*-
    from django.core.management.base import BaseCommand

    from polls.models import Question


    class Command(BaseCommand):
        def handle(self, *args, **options):
            v = 'Output: {}'.format(Question.objects.latest('id'))
            print(type(v))
            print(v)
            self.stdout.write(v)
with a question with some non-ASCII chars in the name.
comment:13 by , 8 years ago
Component: | Uncategorized → Core (Management commands) |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
Wow, I realize now that format or % (mod) are calling the __str__ of the model. Please, Python 3, come soon!
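The call path can be observed with a small sketch. The `Question` class below is a plain hypothetical class that counts calls, not the polls model. On Python 3, `__str__` returns text, so the result is harmless; on Python 2, `__str__` returns a bytestring, which is how a non-ASCII bytestring can reach self.stdout.write.

```python
class Question(object):
    """Hypothetical stand-in for a model; counts how often __str__ runs."""

    def __init__(self, text):
        self.text = text
        self.str_calls = 0

    def __str__(self):
        self.str_calls += 1
        return self.text


q = Question('un caf\xe9 ?')
formatted = 'Output: {}'.format(q)   # str.format falls back to __str__
interpolated = 'Output: %s' % q      # %s interpolation also calls __str__
```

Both expressions go through `__str__`, so any encoding decision made there carries over to the formatted string.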
comment:16 by , 8 years ago
Keywords: | py2 added |
---|
If someone is interested in the fix that Claude proposed, they'll need to debug the Windows test failures and propose an updated patch.
comment:17 by , 8 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Closing due to the end of Python 2 support in master in a couple of weeks.
How do you end up with a situation where you cast a unicode string with non-ASCII characters to bytes?