#18063 closed Bug (wontfix)
repr() should return only ascii, not unicode
Reported by: | Thomas Güttler | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | hv@…, real.human@… | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
repr() should only contain ascii, not unicode. You get strange errors like "unprintable Exception" if there are non ascii chars in the repr() of a model (e.g. raise Exception(u'Failed obj-repr=%r' % obj)).
The attached patch passes all unittests on up to date django 1.4 SVN.
Attachments (2)
Change History (19)
by , 13 years ago
Attachment: | no-unicode-in-repr.diff added |
---|
comment:1 by , 13 years ago
comment:2 by , 13 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
comment:3 by , 13 years ago
Summary: | repr() should count only ascii, not unicode → repr() should return only ascii, not unicode |
---|
comment:5 by , 12 years ago
Version: | 1.4 → master |
---|
by , 12 years ago
Attachment: | 18063-no-unicode-repr.diff added |
---|
comment:7 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:8 by , 12 years ago
Resolution: | fixed |
---|---|
Severity: | Normal → Release blocker |
Status: | closed → reopened |
Triage Stage: | Accepted → Unreviewed |
I don't believe this fix is correct. More broadly, the problem description is not correct. Nowhere in the referenced Python documentation (http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr__) does it say the (byte)string returned by __repr__
must contain only ASCII characters. Django was not returning unicode from __repr__
, it was returning a utf-8 encoded bytestring. That's perfectly legal Python. The fact that some other bits of Python tools are unhelpful and deal with non-ascii data by throwing up "unprintable exception object" when an exception is raised involving a model instance with non-ASCII data in its repr indicates a bug somewhere else. That bug should be fixed at its source, not by removing non-ASCII characters from model instances' repr.
The referenced doc also states "This is typically used for debugging, so it is important that the representation is information-rich and unambiguous." This change has moved in the wrong direction on that score. Consider before the change:
--> ./manage.py shell Python 2.7.2+ (default, Oct 4 2011, 20:03:08) [GCC 4.6.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>> import django >>> django.get_version() u'1.5.dev20120819181059' >>> from ctrac.models import Cat >>> Cat.objects.filter(adopted_name__startswith='Am') [<Cat: Skittle (now Amélie)>] >>> quit()
After the change:
--> ./manage.py shell Python 2.7.2+ (default, Oct 4 2011, 20:03:08) [GCC 4.6.1] on linux2 Type "help", "copyright", "credits" or "license" for more information. (InteractiveConsole) >>> import django >>> django.get_version() u'1.5.dev20120821213816' >>> from ctrac.models import Cat >>> Cat.objects.filter(adopted_name__startswith='Am') [<Cat: Skittle (now Am?lie)>] >>>
That second output would have me concerned the my data been corrupted, when in fact it has not. It is just __repr__
that is now corrupting it on output.
I believe this change should be reverted and the ticket closed either wontfix or needsinfo. needsinfo would be to investigate under what conditions the real problem (unprintable exception objects) occurs and to see if there is anything that Django is doing to cause it (though I rather suspect is a base Python problem).
Marking release blocker because this has introduced a regression in functionality from previous release.
comment:9 by , 12 years ago
Hmm, I think you are correct here Karen, thanks for investigating this further.
comment:10 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
comment:11 by , 12 years ago
Resolution: | fixed |
---|---|
Severity: | Release blocker → Normal |
Status: | closed → reopened |
Auto-"fixed" by git commit; reopening to mark as wontfix/needsinfo
comment:12 by , 12 years ago
Resolution: | → needsinfo |
---|---|
Status: | reopened → closed |
See Karen's notes above.
comment:13 by , 12 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → reopened |
The Python documentation says about repr:
... so it is important that the representation is information-rich and unambiguous
It should be unambiguous. For me a utf8 byte string is ambiguous. It could by a pure binary string, or it could be an other encoding like latin1. You get strange UnicodeErrors if you pass around utf8 byte strings in Python 2.x. It is hard to get to the root of the problem, especially if you are new to python.
I understand Karen that she is worried about [<Cat: Skittle (now Am?lie)>] looking strange and broken. But this output is much better than a unicode exception without any usable output.
comment:14 by , 12 years ago
I read the documentation again: http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr
I admit that this is not a good solution:
[<Cat: Skittle (now Am?lie)>]
What do you think about this solution? I can be used to recreate the object like suggested in the above link.
[<Cat: Skittle (now Am\xc3\xa9lie)>]
comment:15 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
Django code that generates an exception when handed a bytestring which contains non-ASCII characters is broken. That is the code that should be fixed. I strongly believe changing Model __repr__
to avoid putting non-ASCII characters in it is the wrong approach. It is sidestepping a problem rather than fixing the source (or sources) of the problem.
Bytestrings are ambiguous, yes. You need to rely on some other information to know how to properly decode bytestrings. However, in Python 2 __repr__
must return a bytestring, therefore Django must encode to something. The overall approach taken by Django, consistently, when unicode support was added, was to "assume utf-8" wherever a bytestring has to be decoded/encoded and the "correct" encoding is unknown. Returning a utf-8 encoded bytestring from __repr__
is consistent with this overall approach. I don't believe it should be changed.
I'm closing this wontfix. If you can lay out a case (or cases) where non-ASCII data in __repr__
leads to exceptions caused by Django code, then we should fix those issues. But I don't believe the fixes will require changing the behavior of __repr__
, and the original description and discussion so far here has focused solely on __repr__
, which is the wrong place to look. Therefore please open a new ticket (or tickets) to address issues found where Django code cannot correctly handle non-ASCII data in __repr__
.
comment:16 by , 12 years ago
Completely agree with Karen.
However, now that we tend to generalize unicode literals, we should take care not to use repr in other constructed strings (as in #17566), unless we can easily obtain UnicodeDecodeError. Python 3, here we come!
comment:17 by , 7 years ago
Just for the records, there is a corresponding question: https://stackoverflow.com/questions/46726926/unicodedecodeerror-using-django-and-format-strings
Reference to the Python Doc: http://docs.python.org/reference/datamodel.html?highlight=repr#object.__repr
It must not be a unicode object in Python 2.x