Opened 12 years ago

Closed 12 years ago

Last modified 6 years ago

#18063 closed Bug (wontfix)

repr() should return only ascii, not unicode

Reported by: Thomas Güttler Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: hv@…, real.human@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

repr() should only contain ascii, not unicode. You get strange errors like "unprintable Exception" if there are non ascii chars in the repr() of a model (e.g. raise Exception(u'Failed obj-repr=%r' % obj)).

The attached patch passes all unittests on up to date django 1.4 SVN.

Attachments (2)

no-unicode-in-repr.diff (1.5 KB ) - added by Thomas Güttler 12 years ago.
18063-no-unicode-repr.diff (2.5 KB ) - added by Tai Lee 12 years ago.

Download all attachments as: .zip

Change History (19)

by Thomas Güttler, 12 years ago

Attachment: no-unicode-in-repr.diff added

comment:1 by Thomas Güttler, 12 years ago

Reference to the Python Doc: http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr

The return value must be a string object

It must not be a unicode object in Python 2.x

comment:2 by Luke Plant, 12 years ago

Component: UncategorizedDatabase layer (models, ORM)
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

comment:3 by Thomas Güttler, 12 years ago

Summary: repr() should count only ascii, not unicoderepr() should return only ascii, not unicode

comment:4 by Tai Lee, 12 years ago

Cc: real.human@… added

Updated patch takes into account Python 3.x compatibility.

comment:5 by Tai Lee, 12 years ago

Version: 1.4master

by Tai Lee, 12 years ago

Attachment: 18063-no-unicode-repr.diff added

comment:6 by Tai Lee, 12 years ago

Moved encode() out of try/except block.

comment:7 by Simon Meers <simon@…>, 12 years ago

Resolution: fixed
Status: newclosed

In [3fce0d2a9162cf6e749a6de0b18890dea8955e89]:

Fixed #18063 -- Avoid unicode in Model.repr in python 2

Thanks guettli and mrmachine.

comment:8 by Karen Tracey, 12 years ago

Resolution: fixed
Severity: NormalRelease blocker
Status: closedreopened
Triage Stage: AcceptedUnreviewed

I don't believe this fix is correct. More broadly, the problem description is not correct. Nowhere in the referenced Python documentation (http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__) does it say the (byte)string returned by __repr__ must contain only ASCII characters. Django was not returning unicode from __repr__, it was returning a utf-8 encoded bytestring. That's perfectly legal Python. The fact that some other bits of Python tools are unhelpful and deal with non-ascii data by throwing up "unprintable exception object" when an exception is raised involving a model instance with non-ASCII data in its repr indicates a bug somewhere else. That bug should be fixed at its source, not by removing non-ASCII characters from model instances' repr.

The referenced doc also states "This is typically used for debugging, so it is important that the representation is information-rich and unambiguous." This change has moved in the wrong direction on that score. Consider before the change:

--> ./manage.py shell
Python 2.7.2+ (default, Oct  4 2011, 20:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120819181059'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Amélie)>]
>>> quit()

After the change:

--> ./manage.py shell
Python 2.7.2+ (default, Oct  4 2011, 20:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120821213816'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Am?lie)>]
>>> 

That second output would have me concerned the my data been corrupted, when in fact it has not. It is just __repr__ that is now corrupting it on output.

I believe this change should be reverted and the ticket closed either wontfix or needsinfo. needsinfo would be to investigate under what conditions the real problem (unprintable exception objects) occurs and to see if there is anything that Django is doing to cause it (though I rather suspect is a base Python problem).

Marking release blocker because this has introduced a regression in functionality from previous release.

comment:9 by Simon Meers, 12 years ago

Hmm, I think you are correct here Karen, thanks for investigating this further.

comment:10 by Simon Meers <simon@…>, 12 years ago

Resolution: fixed
Status: reopenedclosed

In [dfe63a52effab2c8b5f72a6aceb8646f03d490bb]:

Revert "Fixed #18063 -- Avoid unicode in Model.repr in python 2"

This reverts commit 3fce0d2a9162cf6e749a6de0b18890dea8955e89.

comment:11 by Simon Meers, 12 years ago

Resolution: fixed
Severity: Release blockerNormal
Status: closedreopened

Auto-"fixed" by git commit; reopening to mark as wontfix/needsinfo

comment:12 by Simon Meers, 12 years ago

Resolution: needsinfo
Status: reopenedclosed

See Karen's notes above.

comment:13 by Thomas Güttler, 12 years ago

Resolution: needsinfo
Status: closedreopened

The Python documentation says about repr:

... so it is important that the representation is information-rich and unambiguous

It should be unambiguous. For me a utf8 byte string is ambiguous. It could by a pure binary string, or it could be an other encoding like latin1. You get strange UnicodeErrors if you pass around utf8 byte strings in Python 2.x. It is hard to get to the root of the problem, especially if you are new to python.

I understand Karen that she is worried about [<Cat: Skittle (now Am?lie)>] looking strange and broken. But this output is much better than a unicode exception without any usable output.

comment:14 by Thomas Güttler, 12 years ago

I read the documentation again: http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr

I admit that this is not a good solution:

[<Cat: Skittle (now Am?lie)>]

What do you think about this solution? It can be used to recreate the object like suggested in the above link.

[<Cat: Skittle (now  Am\xc3\xa9lie)>]
Last edited 12 years ago by Thomas Güttler (previous) (diff)

comment:15 by Karen Tracey, 12 years ago

Resolution: wontfix
Status: reopenedclosed

Django code that generates an exception when handed a bytestring which contains non-ASCII characters is broken. That is the code that should be fixed. I strongly believe changing Model __repr__ to avoid putting non-ASCII characters in it is the wrong approach. It is sidestepping a problem rather than fixing the source (or sources) of the problem.

Bytestrings are ambiguous, yes. You need to rely on some other information to know how to properly decode bytestrings. However, in Python 2 __repr__ must return a bytestring, therefore Django must encode to something. The overall approach taken by Django, consistently, when unicode support was added, was to "assume utf-8" wherever a bytestring has to be decoded/encoded and the "correct" encoding is unknown. Returning a utf-8 encoded bytestring from __repr__ is consistent with this overall approach. I don't believe it should be changed.

I'm closing this wontfix. If you can lay out a case (or cases) where non-ASCII data in __repr__ leads to exceptions caused by Django code, then we should fix those issues. But I don't believe the fixes will require changing the behavior of __repr__, and the original description and discussion so far here has focused solely on __repr__, which is the wrong place to look. Therefore please open a new ticket (or tickets) to address issues found where Django code cannot correctly handle non-ASCII data in __repr__.

comment:16 by Claude Paroz, 12 years ago

Completely agree with Karen.

However, now that we tend to generalize unicode literals, we should take care not to use repr in other constructed strings (as in #17566), unless we can easily obtain UnicodeDecodeError. Python 3, here we come!

comment:17 by Thomas Güttler, 6 years ago

Note: See TracTickets for help on using tickets.
Back to Top