Opened 13 years ago
Closed 13 years ago
#19098 closed Bug (fixed)
UnicodeDecodeError when including URLs in Windows with non-ASCII paths
| Reported by: | artamoshin | Owned by: | nobody |
|---|---|---|---|
| Component: | Core (URLs) | Version: | 1.4 |
| Severity: | Normal | Keywords: | non-ascii unicode windows path UnicodeDecodeError url include |
| Cc: | | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Including app URLs raises UnicodeDecodeError when the project path contains non-ASCII characters (e.g. c:\Users\Александр\Documents\Projects).
```
Traceback (most recent call last):
  File "C:\Python27\lib\wsgiref\handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "C:\Python27\lib\site-packages\django\contrib\staticfiles\handlers.py", line 67, in __call__
    return self.application(environ, start_response)
  File "C:\Python27\lib\site-packages\django\core\handlers\wsgi.py", line 241, in __call__
    response = self.get_response(request)
  File "C:\Python27\lib\site-packages\django\core\handlers\base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "C:\Python27\lib\site-packages\django\views\debug.py", line 443, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "C:\Python27\lib\site-packages\django\utils\encoding.py", line 116, in smart_str
    return str(s)
  File "C:\Python27\lib\site-packages\django\core\urlresolvers.py", line 235, in __repr__
    return smart_str(u'<%s %s (%s:%s) %s>' % (self.__class__.__name__, self.urlconf_name, self.app_name, self.namespace, self.regex.pattern))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 37: ordinal not in range(128)
```
This happens because the module's string representation is a bytestring that contains the non-ASCII path, so formatting it with u'%s' % self.urlconf_name fails.
My solution: decode the string representation to Unicode using sys.getfilesystemencoding().
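A minimal sketch of that approach, written for Python 3 so it runs anywhere. Python 3 module reprs are already text, so the Python 2 bytestring repr is simulated with `.encode("cp1251")`; cp1251 is an assumed Windows code page for the Cyrillic path, and the helper name `repr_to_text` is hypothetical, not Django API.

```python
import sys
from typing import Optional


def repr_to_text(raw: bytes, encoding: Optional[str] = None) -> str:
    """Hypothetical helper: decode a bytestring repr to text.

    Defaults to the filesystem encoding, as proposed in the ticket;
    errors="replace" keeps a debug page from crashing if the guess is wrong.
    """
    codec = encoding or sys.getfilesystemencoding()
    return raw.decode(codec, errors="replace")


# Simulated Python 2 repr(module): a bytestring with a cp1251-encoded path (assumption).
raw_repr = "<module 'testproject.included_urls' from 'C:\\Тест\\testproject\\included_urls.pyc'>".encode("cp1251")

text = repr_to_text(raw_repr, encoding="cp1251")
# Same shape as the resolver's __repr__ format string; the values are illustrative.
message = "<%s %s (%s:%s) %s>" % ("RegexURLResolver", text, None, None, "^app/")
print(message)
```

On the reporter's machine the filesystem encoding would be the Windows ANSI code page (mbcs), so passing `encoding` explicitly is only needed in this cross-platform sketch.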
Attachments (1)
Change History (9)
by , 13 years ago
| Attachment: | regex-url-resolver_repr_unicode.diff added |
|---|
follow-up: 2 comment:1 by , 13 years ago
comment:2 by , 13 years ago
Replying to claudep:
> This looks very similar to #17566, which has been fixed when I committed the fix for #17892. Would it be possible for you to test on master?
Unfortunately, on master (commit c99ad64) the UnicodeDecodeError is still raised, because repr(self.urlconf_name) still contains non-ASCII bytes and fails when formatted into the unicode string u'<%s %s (%s:%s) %s>'.
I see three possible fixes:
- decode urlconf_name with sys.getfilesystemencoding();
- use a bytestring format string: b'<%s %s (%s:%s) %s>';
- drop the filename and use only the module name: self.urlconf_name.__name__.
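The three options can be compared side by side in a Python 3 sketch. As before, the Python 2 bytestring repr is simulated with `.encode("cp1251")`; the code page and the shortened `<%s %s>` format string are assumptions for illustration.

```python
import types

# Simulated Python 2 repr(module) bytes; cp1251 is an assumed Windows code page.
PATH = "C:\\Тест\\testproject\\included_urls.pyc"
raw_repr = ("<module 'testproject.included_urls' from '%s'>" % PATH).encode("cp1251")

# Option 1: decode with the filesystem encoding (cp1251 assumed here).
option1 = "<%s %s>" % ("RegexURLResolver", raw_repr.decode("cp1251"))

# Option 2: keep everything as bytes and use a bytestring format string.
option2 = b"<%s %s>" % (b"RegexURLResolver", raw_repr)

# Option 3: drop the filename and use only the module name.
module = types.ModuleType("testproject.included_urls")
module.__file__ = PATH
option3 = "<%s %s>" % ("RegexURLResolver", module.__name__)

print(option1)
print(option2)
print(option3)
```

Option 1 keeps the full path but depends on guessing the right encoding; option 2 avoids decoding but still yields bytes that must not be mixed with unicode later; option 3 drops the path entirely, so no filesystem bytes enter the string.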
follow-up: 4 comment:3 by , 13 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
Still in master, I see another way: do not call repr() on self.urlconf_name (which should be a proper unicode string). Can you test that?
comment:4 by , 13 years ago
Replying to claudep:
> Still in master, I see another way: do not call repr() on self.urlconf_name (which should be a proper unicode string). Can you test that?
Formatting u'%s' % self.urlconf_name calls repr() implicitly anyway. In Python (at least 2.7), repr(module) always returns a bytestring (not Unicode) such as <module 'modulename' from 'non/ascii/path/modulename.py'>, which may contain bytes above 127. The implicit decoding during formatting then uses the 'ascii' codec, which raises UnicodeDecodeError because it cannot handle those bytes.
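The implicit ASCII decode described above can be reproduced on Python 3 by performing that decode explicitly. This is a sketch: the bytestring stands in for Python 2's repr(module), and cp1251 is assumed as the Windows code page that produced it.

```python
# Simulated Python 2 repr(module): bytes, with the Cyrillic path in cp1251 (assumed).
raw_repr = "<module 'testproject.included_urls' from 'C:\\Тест\\testproject\\included_urls.pyc'>".encode("cp1251")

error = None
try:
    # Python 2's u'%s' % some_str implicitly did the equivalent of this ASCII decode.
    "<%s>" % raw_repr.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the codec that actually produced the bytes succeeds.
fixed = "<%s>" % raw_repr.decode("cp1251")
print(fixed)
```

This is the idea behind the `.decode('mbcs')` variant in the ticket; note that the `mbcs` codec itself is only available on Windows.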
follow-up: 6 comment:5 by , 13 years ago
What exactly is the value of self.urlconf_name at the start of the __repr__ method? If it is a proper unicode string, then it should not be a problem to include it in the format string (unicode on both sides). Currently, repr(self.urlconf_name) produces some encoded characters, which then cause the UnicodeDecodeError. Sorry if I'm missing the point; I'm trying to understand the issue, as I cannot reproduce it locally.
comment:6 by , 13 years ago
Replying to claudep:
> What exactly is the value of self.urlconf_name at the start of the __repr__ method? If it is a proper unicode string, then it should not be a problem to include it in the format string (unicode on both sides). Currently, repr(self.urlconf_name) produces some encoded characters, which then cause the UnicodeDecodeError. Sorry if I'm missing the point; I'm trying to understand the issue, as I cannot reproduce it locally.
No, __repr__ does NOT return Unicode!
```python
print self.urlconf_name
# <module 'testproject.included_urls' from 'C:\Тест\testproject\included_urls.pyc'>

# repr() returns a binary string that is not ASCII-safe:
print repr(self.urlconf_name)
# <module 'testproject.included_urls' from 'C:\Тест\testproject\included_urls.pyc'>
print type(repr(self.urlconf_name))
# <type 'str'>
print self.urlconf_name.__repr__()
# <module 'testproject.included_urls' from 'C:\Тест\testproject\included_urls.pyc'>
print type(self.urlconf_name.__repr__())
# <type 'str'>
print self.urlconf_name.__file__
# C:\Тест\testproject\included_urls.pyc
print type(self.urlconf_name.__file__)
# <type 'str'>

# ASCII-safe:
print repr(repr(self.urlconf_name))
# "<module 'testproject.included_urls' from 'C:\\\xd2\xe5\xf1\xf2\\testproject\\included_urls.pyc'>"

# Formatting:
b'%s' % repr(self.urlconf_name)                 # OK
u'%s' % repr(self.urlconf_name).decode('mbcs')  # OK
u'%s' % repr(self.urlconf_name)                 # UnicodeDecodeError
```
You can reproduce this by renaming the project path to use non-Latin characters, so that self.urlconf_name.__file__ (a binary string) contains non-ASCII bytes.
comment:7 by , 13 years ago
Sorry, I wrongly assumed that self.urlconf_name was a unicode string, which it is not.
Also, when I test with a non-ASCII character in the project path, Django breaks in several places. I'm not saying we shouldn't try to fix it, but currently it is probably not safe to do so...
comment:8 by , 13 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
We recently made progress in how we handle non-ascii paths (#19357). I've just tested technical_404_response and it ran fine. Reopen if you can reproduce on recent code.