Opened 9 years ago
Closed 9 years ago
#26731 closed Bug (wontfix)
UnicodeDecodeError when writing unicode to stdout of management command
| Reported by: | Darren Hobbs | Owned by: | nobody | 
|---|---|---|---|
| Component: | Core (Management commands) | Version: | 1.8 | 
| Severity: | Normal | Keywords: | py2 | 
| Cc: | | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | yes | 
| Easy pickings: | no | UI/UX: | no | 
Description (last modified by )
In a management command on Python 2.7, writing non-ASCII characters to stdout with self.stdout.write raises a UnicodeDecodeError. This test reproduces it with a UTF-8 bytestring:
# coding=utf-8
from __future__ import absolute_import, unicode_literals
import sys
import pytest
from django.core.management.base import OutputWrapper
from django.utils.encoding import smart_bytes
def test_bad_unicode_names():
    bad_name = smart_bytes(u'£')
    ow = OutputWrapper(sys.stdout)
    with pytest.raises(UnicodeDecodeError):
        ow.write(bad_name)
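The failure comes down to comparing a bytestring against a text line ending. The sketch below isolates that comparison outside Django. It is Python 3, so the symptom differs: instead of implicitly decoding the bytes as ASCII (and raising UnicodeDecodeError, as Python 2 does), Python 3 rejects the bytes/str mix with a TypeError. `already_terminated` is a hypothetical helper that only mimics the `endswith` check; it is not Django code.

```python
def already_terminated(msg, ending="\n"):
    """Hypothetical helper mimicking a `msg.endswith(ending)` line-ending check."""
    return msg.endswith(ending)


bad_name = "£".encode("utf-8")  # b'\xc2\xa3', a non-ASCII bytestring

try:
    already_terminated(bad_name)  # bytes compared against a text ending
except TypeError as exc:
    # Python 3 refuses to mix bytes and str outright. Python 2 instead
    # implicitly decodes the bytes as ASCII, which raises UnicodeDecodeError.
    print("TypeError:", exc)
```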
Change History (17)
comment:1 by , 9 years ago
| Description: | modified (diff) | 
|---|
comment:2 by , 9 years ago
comment:3 by , 9 years ago
The string came from the DB. The actual error was raised in write() at django/core/management/base.py, line 111.
I fixed my specific issue by importing unicode_literals and using self.stdout.write('{}'.format(possibly_unicode_string_from_db)). I'm afraid my understanding of Python's Unicode string handling isn't great. Perhaps the answer is to update the documentation to suggest using unicode_literals in management commands; otherwise it's a nasty surprise waiting to happen in production (as it was for me!)
comment:4 by , 9 years ago
So the broken code is self.stdout.write('{}'.format(possibly_unicode_string_from_db)) without unicode_literals?
comment:5 by , 9 years ago
Apart from the content of a BinaryField, I don't see how any non-ASCII bytestring can come from the database.
comment:6 by , 9 years ago
The issue is that the non-ASCII Unicode string from the database is coerced into the bytestring '{}' (basically the same situation as #21933).
comment:7 by , 9 years ago
It's also compounded by the fact that sys.stdout.write copes with it but self.stdout.write doesn't.
comment:8 by , 9 years ago
It's because OutputWrapper's default ending is u'\n', so we end up comparing a bytestring to a Unicode string in msg.endswith(ending). I'll leave the correct resolution to Claude or another Unicode expert.
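One possible direction, sketched in Python 3 under stated assumptions: decode incoming bytestrings (assumed to be UTF-8) before the line-ending check, so `endswith` always compares text with text. `TextOnlyOutputWrapper` is a hypothetical, simplified class; it is neither Django's OutputWrapper nor the patch Claude proposed on this ticket, and it ignores styling and encoding of the output stream.

```python
import io


class TextOnlyOutputWrapper:
    """Hypothetical, simplified stand-in for a management-command stdout wrapper.

    Illustrates only one idea: normalise bytes to text before comparing
    the message against a text line ending.
    """

    def __init__(self, out, ending="\n", encoding="utf-8"):
        self._out = out
        self.ending = ending
        self.encoding = encoding  # assumed encoding of incoming bytestrings

    def write(self, msg="", ending=None):
        ending = self.ending if ending is None else ending
        # Decode bytestrings first so endswith() compares text with text.
        if isinstance(msg, bytes):
            msg = msg.decode(self.encoding)
        if ending and not msg.endswith(ending):
            msg += ending
        self._out.write(msg)


buf = io.StringIO()
ow = TextOnlyOutputWrapper(buf)
ow.write("£".encode("utf-8"))  # non-ASCII bytestring, as in the ticket's test
ow.write("un café\n")          # text that already ends with the line ending
print(repr(buf.getvalue()))    # '£\nun café\n'
```

The trade-off is that the wrapper must guess the encoding of any bytes it receives; a bytestring in another encoding would raise UnicodeDecodeError at the decode step instead.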
comment:9 by , 9 years ago
@dhobbs It's still a bit mysterious to us how you got the non-ASCII bytestring; that *might* be the bug in the first place. Could you elaborate a bit on your use case?
comment:11 by , 9 years ago
>>> print('{}'.format(u'un café ?'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 6: ordinal not in range(128)
comment:12 by , 9 years ago
I'm using this management command:
# -*- coding: utf-8 -*-
from django.core.management.base import BaseCommand
from polls.models import Question
class Command(BaseCommand):
    def handle(self, *args, **options):
        v = 'Output: {}'.format(Question.objects.latest('id'))
        print(type(v))
        print(v)
        self.stdout.write(v)
with a Question that has some non-ASCII characters in its name.
comment:13 by , 9 years ago
| Component: | Uncategorized → Core (Management commands) | 
|---|---|
| Triage Stage: | Unreviewed → Accepted | 
| Type: | Uncategorized → Bug | 
Wow, I realize now that format and % (mod) call the model's __str__, which on Python 2 returns a UTF-8-encoded bytestring. Please, Python 3, come soon!
comment:16 by , 9 years ago
| Keywords: | py2 added | 
|---|
If someone is interested in the fix that Claude proposed, they'll need to debug the Windows test failures and propose an updated patch.
comment:17 by , 9 years ago
| Resolution: | → wontfix | 
|---|---|
| Status: | new → closed | 
Closing due to the end of Python 2 support in master in a couple of weeks.
How do you end up with a situation where you cast a unicode string with non-ASCII characters to bytes?