Opened 12 years ago

Closed 12 years ago

#17566 closed Uncategorized (fixed)

RegexURLResolver.__repr__(self) is not unicode safe.

Reported by: otto Owned by:
Component: Core (URLs) Version: 1.4-alpha-1
Severity: Normal Keywords: unicode
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is a sister to the (fixed) tickets #15795, #16158.

RegexURLResolver.repr(self) can throw a UnicodeDecodeError. The problem comes from the parameter self.urlconf_name. The exception is "UnicodeDecodeError: 'ascii' codec can't decode byte... (etc)."

A simple (and possibly naive) solution is to replace the occurrence of self.urlconf_name with str(self.urlconf_name).decode('utf-8') in repr(self). This seems to fix it for now, but someone who is working on this module might want to take a closer look.

Attachments (3)

app_name_unicode_test.tar.gz (9.3 KB ) - added by vitalije 12 years ago.
test django project that demonstrates this bug
17566-1.diff (663 bytes ) - added by Claude Paroz 12 years ago.
Apply force_unicode to self.urlconf_name
17566-2.diff (3.1 KB ) - added by Claude Paroz 12 years ago.
Patch for master with tests

Download all attachments as: .zip

Change History (15)

comment:1 by Aymeric Augustin, 12 years ago

Resolution: needsinfo
Status: newclosed

How did you encounter this error?

urlconf_name is the import path of the URLconf module, and Python doesn't support unicode module names (at least until Python 3.2), so it should always be an ASCII string.

in reply to:  1 comment:2 by vitalije, 12 years ago

Keywords: unicode added
Resolution: needsinfo
Status: closedreopened

I have noticed the same bug but I believe it is consequence of having app_name that has non ascii characters in it. May be that it is not supposed to have, but then, I couldn't find other way to display localised app_name in django's admin page. It is triggered only when I try to open missing link during the generation of technical_404_response. Here is a log:

Django version 1.4, using settings 'eprod.settings'
Development server is running at http://192.168.1.195:8100/
Quit the server with CONTROL-C.
Traceback (most recent call last):
  File "/usr/lib/python2.7/wsgiref/handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/staticfiles/handlers.py", line 67, in __call__
    return self.application(environ, start_response)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/wsgi.py", line 241, in __call__
    response = self.get_response(request)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "/usr/local/lib/python2.7/dist-packages/django/views/debug.py", line 432, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "/usr/local/lib/python2.7/dist-packages/django/utils/encoding.py", line 116, in smart_str
    return str(s)
  File "/usr/local/lib/python2.7/dist-packages/django/core/urlresolvers.py", line 235, in __repr__
    return smart_str(u'<%s %s (%s:%s) %s>' % (self.__class__.__name__, self.urlconf_name, self.app_name, self.namespace, self.regex.pattern))
  File "/usr/local/lib/python2.7/dist-packages/django/core/urlresolvers.py", line 235, in __repr__
    return smart_str(u'<%s %s (%s:%s) %s>' % (self.__class__.__name__, self.urlconf_name, self.app_name, self.namespace, self.regex.pattern))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 18: ordinal not in range(128) 

I hope this would help to solve the problem. In case that app_name has to be ascii then what is the proper way to force django admin application to show non-ascii applications names? In fact I would prefer to have plain ascii urls but display name or verbose_name or similar displayed in admin pages.
Vitalije

comment:3 by Claude Paroz, 12 years ago

Resolution: invalid
Status: reopenedclosed

If app_name contains non-ascii characters, it has to be a Unicode string, which I suspect was not the case in your app.

by vitalije, 12 years ago

test django project that demonstrates this bug

in reply to:  3 ; comment:4 by vitalije, 12 years ago

Resolution: invalid
Status: closedreopened

Replying to claudep:

If app_name contains non-ascii characters, it has to be a Unicode string, which I suspect was not the case in your app.

Well you suspect wrong. It is indeed unicode here is one of my models:

class Dobavljac(models.Model):
    class Meta:
        app_label = u'Продавница'
        db_table = u'prodavnica_dobavljac'
        verbose_name = u'Добављач'
        verbose_name_plural = u'Добављачи'
    ...

As matter of fact I just have started a new django project and a new application in it using django-admin.py script. Then I have added two very simple models with a unicode app_name like the one above, created admin.py and registered those two models and tested it produces same error. I attached code if you like to test it yourself. Try to open some url that is not in urlpatterns like "/admin/Продавница/rwerew" and django will try to produce technical_404_response but will throw an exception like mentioned before.
HTH Vitalije

in reply to:  4 ; comment:5 by Preston Holmes, 12 years ago

Replying to vitalije:

app_label in the Meta options is only used to connect the model to the right app instance. In this case, the only valid value is "test_app_name" - and here it is not required because the models are located in a models.py file at the first level of the test_app_name package.

This is largely unrelated to the original ticket - but the app_name is derived from the import path of the app - assigning a different Meta.app_label does not actually change the app_name, but in this case the resolver isn't even getting that far.

in reply to:  5 comment:6 by vitalije, 12 years ago

Replying to ptone:

This is largely unrelated to the original ticket - but the app_name is derived from the import path of the app - assigning a different Meta.app_label does not actually change the app_name, but in this case the resolver isn't even getting that far.

No I dont think it is unrelated to the original ticket. But I also must admit that I was wrong to believe that app_name is problem. After thorough investigation I have found that app_name is plain ascii = 'admin'. But I have found that self.regex.pattern is unicode. When I replaced definition of method repr

def __repr__(self):
    return return smart_str(u'<%s %s (%s:%s) %s>' %(self.__class__.__name__, self.urlconf_name, self.app_name, self.namespace, self.regex.pattern))

with the following definition

def __repr__(self):
    return '<%s %s (%s:%s) %s>' %tuple(map(smart_str, (self.__class__.__name__, self.urlconf_name, self.app_name, self.namespace, self.regex.pattern)))

the problem was solved.

PS: it is repr method of the django.core.urlresolvers.RegexURLResolver class.

Last edited 12 years ago by vitalije (previous) (diff)

comment:7 by Claude Paroz, 12 years ago

If you have an URL pattern with Unicode characters, I think it should also be prefixed with 'u'. I still think that current Django code is correct.

in reply to:  7 ; comment:8 by vitalije, 12 years ago

Replying to claudep:

If you have an URL pattern with Unicode characters, I think it should also be prefixed with 'u'. I still think that current Django code is correct.

Have you tested given django project or look at the attached code, or you are just guessing that I forgot to put u before string literal?

I didn't add any unicode url pattern, but admin application did because it uses app_name as part of its url patterns. So the problem is generated inside django and IMHO should be fixed there. Besides I have placed print self.regex.pattern, type(self.regex.pattern) right before the original return statement and in console log it was written as type unicode. So it actually has proper leading "u" ie it IS really unicode, but other arguments are strings. I don't know why python has problem with coercing all those strings in unicode, but I have verified that in my test application the only source of reported error is the above method repr of the RegexURLResolver class, and inside it, the only problem is regex.pattern being a unicode.

I did change that method locally in my own installation of django and it works for me. If it is not important for other users / or developers to make that correction part of their own installations / repository that's fine with me. If it is, then here are my 2 cents contribution.

Vitalije

PS: I feel like my reporting a bug is not very welcomed. I am grateful for all the time and energy that django developers have put in it, and I want to give something back to the community. It took me some time and effort to register, to write report in English (that is not my primary language), to debug code and report back the solution I found, and I don't complain about that effort and time. But I would appreciate (and I do believe some other programmers like me would appreciate also) if our bug reports are investigated more thoroughly before being replied. The next time I (or someone like me) find a bug in django, it can happen very easily that we wouldn't care so much to report it. I am not being impatient while waiting response from developers, so let them take their time as much as they need. But when they decide to respond, please let them take our reports seriously. Let them read our reports, test them, test their own hypothesis before making a reply. It would make us more willing to report. If there is something wrong with my report then please educate me how to make better report and by doing it you would make me feel that I owe you more good reports. That way there will be soon great number of good reports for every single bug, we users find. It is very easy to turn people away (not me, I believe), but not so easy to inspire them to get involved. Having said that I do hope that we will be more and more understanding each other in the future.

in reply to:  8 comment:9 by Claude Paroz, 12 years ago

Owner: nobody removed
Status: reopenednew

Replying to vitalije:

PS: I feel like my reporting a bug is not very welcomed. (...)

I'm very sorry if I gave you the impression that your report was not welcome. It's not the case. In written communication, it's so easy to overlook the tone of responses and I probably did not address your issue with enough care. I will continue to investigate, but if any other reader feels like working on the issue, I have no monopol on it :-)

comment:10 by Claude Paroz, 12 years ago

Needs tests: set
Triage Stage: UnreviewedAccepted

I could (at last) reproduced the issue. The culprit is self.urlconf_name which __repr__ already returns an utf-8-encode string (when self.urlconf_name is a list of RegexURLPattern), which then cannot be re-inserted in an unicode string without being decoded first. It is even reproducible in master.

Could you please test the following patch?

by Claude Paroz, 12 years ago

Attachment: 17566-1.diff added

Apply force_unicode to self.urlconf_name

by Claude Paroz, 12 years ago

Attachment: 17566-2.diff added

Patch for master with tests

comment:11 by Claude Paroz, 12 years ago

Needs tests: unset

comment:12 by Claude Paroz, 12 years ago

Resolution: fixed
Status: newclosed

The fix committed for #17892 [28fd876bae15df747d164dcae229840d2cf135ca] should have fixed this issue also.

Note: See TracTickets for help on using tickets.
Back to Top