Code

Opened 2 years ago

Closed 23 months ago

Last modified 23 months ago

#18063 closed Bug (wontfix)

repr() should return only ascii, not unicode

Reported by: guettli Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: hv@…, real.human@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

repr() should only contain ascii, not unicode. You get strange errors like "unprintable Exception" if there are non ascii chars in the repr() of a model (e.g. raise Exception(u'Failed obj-repr=%r' % obj)).

The attached patch passes all unittests on up to date django 1.4 SVN.

Attachments (2)

no-unicode-in-repr.diff (1.5 KB) - added by guettli 2 years ago.
18063-no-unicode-repr.diff (2.5 KB) - added by mrmachine 23 months ago.

Download all attachments as: .zip

Change History (18)

Changed 2 years ago by guettli

comment:1 Changed 2 years ago by guettli

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Reference to the Python Doc: http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr

The return value must be a string object

It must not be a unicode object in Python 2.x

comment:2 Changed 2 years ago by lukeplant

  • Component changed from Uncategorized to Database layer (models, ORM)
  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Uncategorized to Bug

comment:3 Changed 2 years ago by guettli

  • Summary changed from repr() should count only ascii, not unicode to repr() should return only ascii, not unicode

comment:4 Changed 23 months ago by mrmachine

  • Cc real.human@… added

Updated patch takes into account Python 3.x compatibility.

comment:5 Changed 23 months ago by mrmachine

  • Version changed from 1.4 to master

Changed 23 months ago by mrmachine

comment:6 Changed 23 months ago by mrmachine

Moved encode() out of try/except block.

comment:7 Changed 23 months ago by Simon Meers <simon@…>

  • Resolution set to fixed
  • Status changed from new to closed

In [3fce0d2a9162cf6e749a6de0b18890dea8955e89]:

Fixed #18063 -- Avoid unicode in Model.repr in python 2

Thanks guettli and mrmachine.

comment:8 Changed 23 months ago by kmtracey

  • Resolution fixed deleted
  • Severity changed from Normal to Release blocker
  • Status changed from closed to reopened
  • Triage Stage changed from Accepted to Unreviewed

I don't believe this fix is correct. More broadly, the problem description is not correct. Nowhere in the referenced Python documentation (http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__) does it say the (byte)string returned by __repr__ must contain only ASCII characters. Django was not returning unicode from __repr__, it was returning a utf-8 encoded bytestring. That's perfectly legal Python. The fact that some other bits of Python tools are unhelpful and deal with non-ascii data by throwing up "unprintable exception object" when an exception is raised involving a model instance with non-ASCII data in its repr indicates a bug somewhere else. That bug should be fixed at its source, not by removing non-ASCII characters from model instances' repr.

The referenced doc also states "This is typically used for debugging, so it is important that the representation is information-rich and unambiguous." This change has moved in the wrong direction on that score. Consider before the change:

--> ./manage.py shell
Python 2.7.2+ (default, Oct  4 2011, 20:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120819181059'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Amélie)>]
>>> quit()

After the change:

--> ./manage.py shell
Python 2.7.2+ (default, Oct  4 2011, 20:03:08) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import django
>>> django.get_version()
u'1.5.dev20120821213816'
>>> from ctrac.models import Cat
>>> Cat.objects.filter(adopted_name__startswith='Am')
[<Cat: Skittle (now Am?lie)>]
>>> 

That second output would have me concerned the my data been corrupted, when in fact it has not. It is just __repr__ that is now corrupting it on output.

I believe this change should be reverted and the ticket closed either wontfix or needsinfo. needsinfo would be to investigate under what conditions the real problem (unprintable exception objects) occurs and to see if there is anything that Django is doing to cause it (though I rather suspect is a base Python problem).

Marking release blocker because this has introduced a regression in functionality from previous release.

comment:9 Changed 23 months ago by DrMeers

Hmm, I think you are correct here Karen, thanks for investigating this further.

comment:10 Changed 23 months ago by Simon Meers <simon@…>

  • Resolution set to fixed
  • Status changed from reopened to closed

In [dfe63a52effab2c8b5f72a6aceb8646f03d490bb]:

Revert "Fixed #18063 -- Avoid unicode in Model.repr in python 2"

This reverts commit 3fce0d2a9162cf6e749a6de0b18890dea8955e89.

comment:11 Changed 23 months ago by DrMeers

  • Resolution fixed deleted
  • Severity changed from Release blocker to Normal
  • Status changed from closed to reopened

Auto-"fixed" by git commit; reopening to mark as wontfix/needsinfo

comment:12 Changed 23 months ago by DrMeers

  • Resolution set to needsinfo
  • Status changed from reopened to closed

See Karen's notes above.

comment:13 Changed 23 months ago by guettli

  • Resolution needsinfo deleted
  • Status changed from closed to reopened

The Python documentation says about repr:

... so it is important that the representation is information-rich and unambiguous

It should be unambiguous. For me a utf8 byte string is ambiguous. It could by a pure binary string, or it could be an other encoding like latin1. You get strange UnicodeErrors if you pass around utf8 byte strings in Python 2.x. It is hard to get to the root of the problem, especially if you are new to python.

I understand Karen that she is worried about [<Cat: Skittle (now Am?lie)>] looking strange and broken. But this output is much better than a unicode exception without any usable output.

comment:14 Changed 23 months ago by guettli

I read the documentation again: http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr

I admit that this is not a good solution:

[<Cat: Skittle (now Am?lie)>]

What do you think about this solution? It can be used to recreate the object like suggested in the above link.

[<Cat: Skittle (now  Am\xc3\xa9lie)>]
Last edited 23 months ago by guettli (previous) (diff)

comment:15 Changed 23 months ago by kmtracey

  • Resolution set to wontfix
  • Status changed from reopened to closed

Django code that generates an exception when handed a bytestring which contains non-ASCII characters is broken. That is the code that should be fixed. I strongly believe changing Model __repr__ to avoid putting non-ASCII characters in it is the wrong approach. It is sidestepping a problem rather than fixing the source (or sources) of the problem.

Bytestrings are ambiguous, yes. You need to rely on some other information to know how to properly decode bytestrings. However, in Python 2 __repr__ must return a bytestring, therefore Django must encode to something. The overall approach taken by Django, consistently, when unicode support was added, was to "assume utf-8" wherever a bytestring has to be decoded/encoded and the "correct" encoding is unknown. Returning a utf-8 encoded bytestring from __repr__ is consistent with this overall approach. I don't believe it should be changed.

I'm closing this wontfix. If you can lay out a case (or cases) where non-ASCII data in __repr__ leads to exceptions caused by Django code, then we should fix those issues. But I don't believe the fixes will require changing the behavior of __repr__, and the original description and discussion so far here has focused solely on __repr__, which is the wrong place to look. Therefore please open a new ticket (or tickets) to address issues found where Django code cannot correctly handle non-ASCII data in __repr__.

comment:16 Changed 23 months ago by claudep

Completely agree with Karen.

However, now that we tend to generalize unicode literals, we should take care not to use repr in other constructed strings (as in #17566), unless we can easily obtain UnicodeDecodeError. Python 3, here we come!

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.