Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#28067 closed Cleanup/optimization (fixed)

Clarify __str__() return type when using python_2_unicode_compatible()

Reported by: Christophe Pettus Owned by: nobody
Component: Documentation Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The __str__ method (as implemented using six) strips off the "unicodeness" of a string. If __str__ calls are chained together, this can result in an error:

from django.db import models
from django.utils.encoding import python_2_unicode_compatible

@python_2_unicode_compatible
class A(models.Model):
    c = models.CharField(max_length=20)

    def __str__(self):
        return self.c


@python_2_unicode_compatible
class B(models.Model):
    a = models.ForeignKey(A)

    def __str__(self):
        return str(self.a)

>>> from test.models import A, B
>>> a = A(c=u'réparer')
>>> a.save()
>>> b = B(a=a)
>>> b.save()
>>> a
<A: réparer>
>>> b
<B: [Bad Unicode data]>
>>> print b
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Users/xof/Documents/Dev/environments/peep/lib/python2.7/site-packages/django/utils/six.py", line 842, in <lambda>
    klass.__str__ = lambda self: self.__unicode__().encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Note that the same behavior exists without @python_2_unicode_compatible, but the example uses it so as to follow the recommended path.

This is probably a bug in six (shouldn't str preserve its unicode type in this case?), but we might want to ameliorate it.
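For reference, the failing step in the traceback can be sketched in isolation. This is a minimal illustration written for Python 3, where the implicit Python 2 behaviour has to be spelled out explicitly; it assumes that on Python 2, calling .encode() on a byte string first decodes it with the default ASCII codec. The variable names are illustrative only.

```python
# Text as stored in the model field.
text = u'réparer'

# What str(self.a) yields on Python 2 once A.__str__ has been wrapped:
# UTF-8 encoded bytes rather than text.
as_bytes = text.encode('utf-8')

# B's wrapped __str__ then calls .encode('utf-8') on those bytes.
# Python 2 implicitly decodes a byte string with the ASCII codec before
# encoding it; that implicit step is written out explicitly here.
try:
    as_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    error = exc
else:
    error = None

print(error)
```

The byte at position 1 is the first byte of the UTF-8 encoding of "é", which is the position the traceback reports.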

Change History (9)

comment:1 Changed 3 years ago by Tim Graham

Resolution: → invalid
Status: new → closed

When using @python_2_unicode_compatible, __str__() must return "text" (unicode). Using str() returns bytes on Python 2. Use six.text_type(self.a) instead.

On Python 3, python_2_unicode_compatible does nothing. On Python 2, it aliases the __str__ method to __unicode__ and creates a new __str__ method that returns the result of __unicode__() encoded with UTF-8.
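The behaviour described above can be sketched roughly as follows. This is a simplified, hypothetical re-creation and not the actual implementation in django.utils.six: the name python_2_unicode_compatible_sketch and the Label class are made up for illustration, and any validation the real decorator performs is omitted.

```python
import sys

PY2 = sys.version_info[0] == 2


def python_2_unicode_compatible_sketch(klass):
    """Simplified stand-in for the decorator discussed in this ticket."""
    if PY2:
        # Keep the text-returning method available as __unicode__ ...
        klass.__unicode__ = klass.__str__
        # ... and replace __str__ with a version returning UTF-8 bytes.
        klass.__str__ = lambda self: self.__unicode__().encode('utf-8')
    # On Python 3 the class is returned unchanged.
    return klass


@python_2_unicode_compatible_sketch
class Label(object):  # hypothetical example class
    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Must return text (unicode on Python 2, str on Python 3).
        return self.text


obj = Label(u'réparer')
print(str(obj))
```

Under this sketch, a __str__() that returns bytes on Python 2 would end up stored as __unicode__, which is why the return value has to be text.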

comment:2 Changed 3 years ago by Christophe Pettus

Then I'm confused in this case how to write code that runs on both Python 2 and Python 3 where you are retrieving text from the database. It comes back as type unicode (on Python 2); how should this be handled in a way that works on both 2 and 3?

comment:3 Changed 3 years ago by Simon Charette

As documented, you must return text and not bytes from __str__() when using @python_2_unicode_compatible.

from django.utils import six

@python_2_unicode_compatible
class B(models.Model):
    a = models.ForeignKey(A)

    def __str__(self):
        return six.text_type(self.a)

On Python 2, six.text_type is unicode; on Python 3, it is str.
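To make the alias concrete, here is a small sketch that defines an equivalent name without importing six. The local text_type and the Tag class are illustrative assumptions, not part of six or Django. The snippet runs as written on Python 3; on Python 2 the class would additionally need @python_2_unicode_compatible for text_type(tag) to behave the same way.

```python
import sys

# Hypothetical local equivalent of six.text_type. The conditional
# expression is lazy, so the name `unicode` is only looked up on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


class Tag(object):  # hypothetical example class
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


tag = Tag(u'réparer')

# Casting with text_type yields text on either Python version,
# whereas str() would yield bytes on Python 2.
label = text_type(tag)
print(label)
```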

comment:4 Changed 3 years ago by Christophe Pettus

Then I think we have a documentation bug, because the docs don't actually say that; they just say: "To support Python 2 and 3 with a single code base, define a __str__ method returning text and apply this decorator to the class," and don't discuss six.text_type() at all (at least, not in the linked-to section).


comment:5 Changed 3 years ago by Simon Charette

Component: Database layer (models, ORM) → Documentation
Resolution: invalid
Status: closed → new
Triage Stage: Unreviewed → Accepted
Type: Bug → Cleanup/optimization
Version: 1.11 → master

I think the documentation assumes the reader knows that returning text means returning unicode on Python 2 and str on Python 3. It wouldn't hurt to adjust the example to disambiguate that, as basing a __str__ method's output on a related object's __str__ output is a commonly used pattern.

comment:6 Changed 3 years ago by Christophe Pettus

It seems the most applicable advice is, "If you are using @python_2_unicode_compatible to support both Python 2 and Python 3, cast objects using six.text_type rather than str or unicode." Does that cover it?

comment:7 Changed 3 years ago by Tim Graham

Has patch: set
Summary: Encoding error when __str__ returns non-ASCII → Clarify __str__() return type when using python_2_unicode_compatible()

comment:8 Changed 3 years ago by GitHub <noreply@…>

Resolution: fixed
Status: new → closed

In 83cbb8d0:

Fixed #28067 -- Clarified __str__() return type when using python_2_unicode_compatible().

comment:9 Changed 3 years ago by Tim Graham <timograham@…>

In 9a93c1a3:

[1.11.x] Fixed #28067 -- Clarified __str__() return type when using python_2_unicode_compatible().

Backport of 83cbb8d080299669de3569941a40789e5d32b009 from master
