Opened 12 years ago
Closed 12 years ago
#19098 closed Bug (fixed)
UnicodeDecodeError when including URLs in Windows with non-ASCII paths
Reported by: | artamoshin | Owned by: | nobody |
---|---|---|---|
Component: | Core (URLs) | Version: | 1.4 |
Severity: | Normal | Keywords: | non-ascii unicode windows path UnicodeDecodeError url include |
Cc: | | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Including app URLs raises UnicodeDecodeError when the project path contains non-ASCII characters (e.g. c:\Users\Александр\Documents\Projects).
Traceback (most recent call last):
  File "C:\Python27\lib\wsgiref\handlers.py", line 85, in run
    self.result = application(self.environ, self.start_response)
  File "C:\Python27\lib\site-packages\django\contrib\staticfiles\handlers.py", line 67, in __call__
    return self.application(environ, start_response)
  File "C:\Python27\lib\site-packages\django\core\handlers\wsgi.py", line 241, in __call__
    response = self.get_response(request)
  File "C:\Python27\lib\site-packages\django\core\handlers\base.py", line 146, in get_response
    response = debug.technical_404_response(request, e)
  File "C:\Python27\lib\site-packages\django\views\debug.py", line 443, in technical_404_response
    'reason': smart_str(exception, errors='replace'),
  File "C:\Python27\lib\site-packages\django\utils\encoding.py", line 116, in smart_str
    return str(s)
  File "C:\Python27\lib\site-packages\django\core\urlresolvers.py", line 235, in __repr__
    return smart_str(u'<%s %s (%s:%s) %s>' % (self.__class__.__name__, self.urlconf_name, self.app_name, self.namespace, self.regex.pattern))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 37: ordinal not in range(128)
This happens because the module's string representation is a bytestring containing the non-ASCII path, and formatting it with u'%s' % self.urlconf_name fails.
My solution: decode the string representation to Unicode using sys.getfilesystemencoding().
Attachments (1)
Change History (9)
by , 12 years ago
Attachment: | regex-url-resolver_repr_unicode.diff added |
---|
follow-up: 2 comment:1 by , 12 years ago
comment:2 by , 12 years ago
Replying to claudep:
This looks very similar to #17566, which has been fixed when I committed the fix for #17892. Would it be possible for you to test on master?
Unfortunately, on master (commit c99ad64) UnicodeDecodeError is raised too, because repr(self.urlconf_name) still contains non-ASCII bytes and fails when formatted with the unicode string '<%s %s (%s:%s) %s>'.
I think there are 3 ways:
- decode urlconf_name with sys.getfilesystemencoding()
- use a bytestring format string: b'<%s %s (%s:%s) %s>'
- drop the filename and use only the module name: self.urlconf_name.__name__
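The three options above can be sketched roughly as follows. This is a Python 3 emulation of the Python 2 situation, not Django code: MODULE_REPR is a hypothetical byte string modeled on the module repr shown in this ticket, FS_ENCODING is an assumed stand-in for whatever sys.getfilesystemencoding() reports on the affected machine, and the three function names are made up for illustration.

```python
# Python 3 emulation: in Python 2, repr(module) is a byte string.
# MODULE_REPR is hypothetical, modeled on the example path in this ticket.
MODULE_REPR = (
    b"<module 'testproject.included_urls' from "
    b"'C:\\\xd2\xe5\xf1\xf2\\testproject\\included_urls.pyc'>"
)
MODULE_NAME = "testproject.included_urls"
# Assumption: stands in for sys.getfilesystemencoding() on the reporter's system.
FS_ENCODING = "cp1251"


def repr_decoded(raw=MODULE_REPR, encoding=FS_ENCODING):
    """Option 1: decode the byte string first, then format as text."""
    return "<%s %s>" % ("RegexURLResolver", raw.decode(encoding))


def repr_bytes(raw=MODULE_REPR):
    """Option 2: keep the format string and the arguments as bytes."""
    return b"<%s %s>" % (b"RegexURLResolver", raw)


def repr_name_only(name=MODULE_NAME):
    """Option 3: use only the module name, which carries no file path."""
    return "<%s %s>" % ("RegexURLResolver", name)
```

Option 3 avoids the encoding question entirely, at the cost of no longer showing where the module was loaded from.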
follow-up: 4 comment:3 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Still in master, I see another way: do not call repr() on self.urlconf_name (which should be a proper unicode string). Can you test that?
comment:4 by , 12 years ago
Replying to claudep:
Still in master, I see another way: do not call repr() on self.urlconf_name (which should be a proper unicode string). Can you test that?
Formatting with u'%s' % self.urlconf_name calls repr() implicitly anyway. In Python (at least 2.7), repr(module) always returns a bytestring (not Unicode) such as <module 'modulename' from 'non/ascii/path/modulename.py'>, which may contain bytes above 127. The implicit decoding during formatting then uses the 'ascii' codec, which raises UnicodeDecodeError because it cannot handle those bytes.
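The failure described above can be imitated in Python 3 with a small sketch. The helper implicit_ascii_format is hypothetical: it makes explicit the ASCII decoding that Python 2 performs implicitly when a byte string is interpolated into a unicode format string. RAW is an assumed byte string modeled on a module repr with a non-ASCII path.

```python
# Hypothetical byte string, modeled on a Python 2 module repr with a cp1251 path.
RAW = b"<module 'm' from 'C:\\\xd2\xe5\xf1\xf2\\m.pyc'>"


def implicit_ascii_format(fmt, raw):
    """Emulate Python 2's u'%s' % bytestring: the argument is decoded as ASCII."""
    return fmt % raw.decode("ascii")


def try_format(fmt, raw):
    """Return (result, None) on success or (None, offending_byte) on failure."""
    try:
        return implicit_ascii_format(fmt, raw), None
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec could not decode.
        return None, raw[exc.start]
```

Calling try_format("%s", RAW) returns no result and reports the first byte outside the ASCII range, which mirrors the "can't decode byte ..." message in the traceback.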
follow-up: 6 comment:5 by , 12 years ago
What exactly is the value of self.urlconf_name at the start of the __repr__ method? If it is a proper unicode string, then it should not be a problem to include it in the format string (unicode on both sides). Currently, repr(self.urlconf_name) is producing some encoded characters, which then cause the UnicodeDecodeError. Sorry if I am missing the point; I am trying to understand the issue, as I cannot reproduce it locally.
comment:6 by , 12 years ago
Replying to claudep:
What exactly is the value of self.urlconf_name at the start of the __repr__ method? If it is a proper unicode string, then it should not be a problem to include it in the format string (unicode on both sides). Currently, repr(self.urlconf_name) is producing some encoded characters, which then cause the UnicodeDecodeError. Sorry if I am missing the point; I am trying to understand the issue, as I cannot reproduce it locally.
No, __repr__ does NOT return Unicode!
print self.urlconf_name
# <module 'testproject.included_urls' from 'C:\Тест\testproject\included_urls.pyc'>

# The repr() function returns a binary string that is not ASCII-safe:
print repr(self.urlconf_name)
# <module 'testproject.included_urls' from 'C:\Тест\testproject\included_urls.pyc'>
print type(repr(self.urlconf_name))
# <type 'str'>
print self.urlconf_name.__repr__()
# <module 'testproject.included_urls' from 'C:\Тест\testproject\included_urls.pyc'>
print type(self.urlconf_name.__repr__())
# <type 'str'>
print self.urlconf_name.__file__
# C:\Тест\testproject\included_urls.pyc
print type(self.urlconf_name.__file__)
# <type 'str'>

# ASCII-safe:
print repr(repr(self.urlconf_name))
# "<module 'testproject.included_urls' from 'C:\\\xd2\xe5\xf1\xf2\\testproject\\included_urls.pyc'>"

# Formatting:
b'%s' % repr(self.urlconf_name)                  # OK
u'%s' % repr(self.urlconf_name).decode('mbcs')   # OK
u'%s' % repr(self.urlconf_name)                  # UnicodeDecodeError
You can reproduce this by renaming the project path to include non-Latin characters, so that self.urlconf_name.__file__ (a binary string) contains non-ASCII bytes.
comment:7 by , 12 years ago
Sorry, I was wrongly assuming that self.urlconf_name was unicode, which it is not.
Then, when I test with a non-ASCII character in the project path, Django breaks in several places. I am not saying that we should not try to fix it, but currently it is probably not safe to do so...
comment:8 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
We recently made progress in how we handle non-ASCII paths (#19357). I've just tested technical_404_response and it ran fine. Reopen if you can reproduce on recent code.