django.utils._os.safe_join should return a native string
Reported by: Aymeric Augustin | Owned by: nobody
Has patch: no | Needs documentation: no
Needs tests: no | Patch needs improvement: no
By default, filesystem paths are represented with native strings (i.e. str objects) in Python 2 and Python 3.
% python2
>>> import os
>>> type(os.listdir('.')[0])
<type 'str'>
% python3
>>> import os
>>> type(os.listdir('.')[0])
<class 'str'>
In other words, along with the native str type, filesystem paths switched from bytestrings to unicode in Python 3.
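A quick way to confirm this on Python 3 is to list a scratch directory and check the type of an entry. This is a minimal sketch. The temporary directory and the file name example.txt exist only for the demonstration.

```python
import os
import tempfile

# Create a scratch directory holding one file so the listing is predictable.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    names = os.listdir(tmp)

# In Python 3 the native string type, str, is a unicode text type.
print(type(names[0]))  # <class 'str'>
```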
A brief interlude for perfectionists and pedants :)
In Python 2, it's possible to use unicode for filesystem paths when os.path.supports_unicode_filenames is True, but that's not the default mode of operation.
In Python 3, it's possible to use bytestrings for filesystem paths, because not all supported platforms sport unicode-aware filesystems: see http://docs.python.org/3/library/os.path:
The path parameters can be passed as either strings, or bytes. Applications are encouraged to represent file names as (Unicode) character strings.
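To illustrate the quoted behavior, here is a short Python 3 sketch. It assumes a scratch directory containing a single file, example.txt, and paths such as static/base.css that are purely illustrative. The os and os.path functions return the same type they are given, and os.path.join refuses to mix the two types.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_names = os.listdir(tmp)               # str in, str out
    byte_names = os.listdir(os.fsencode(tmp))  # bytes in, bytes out

# os.path functions accept either type...
byte_path = os.path.join(b"static", b"base.css")

# ...but not a mix of the two: Python 3 raises TypeError.
try:
    os.path.join("static", b"base.css")
    mixed_ok = True
except TypeError:
    mixed_ok = False
```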
My initial statement still reflects the intent of Python's developers, from which Django shouldn't deviate.
The conversion to unicode was introduced 4 years ago in 8fb1459b5294fb9327b241fffec8576c5aa3fc7e. That commit fixed an issue with the reporting of template loading errors.
In hindsight, it would have been better to keep safe_join similar to os.path.join, and preprocess the arguments or introduce a …
safe_join is used in four places in Django. Auditing these for proper use of bytestrings vs. unicode strings seems doable.
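To make "similar to os.path.join" concrete, here is a minimal Python 3 sketch of a hypothetical safe_join_native. It returns whatever string type it receives (str in, str out; bytes in, bytes out) and never coerces to unicode. The function name, the use of ValueError, and the exact containment check are illustrative assumptions, not Django's code.

```python
import os
from os.path import abspath, dirname, join, normcase, sep


def safe_join_native(base, *paths):
    """Hypothetical safe_join that, like os.path.join, keeps the input type."""
    # os.path.join raises TypeError if str and bytes are mixed.
    final_path = abspath(join(base, *paths))
    base_path = abspath(base)
    # Use a separator of the matching type so the prefix check works for both.
    separator = os.fsencode(sep) if isinstance(base_path, bytes) else sep
    # Reject results outside base_path (e.g. via '..'); the last condition
    # allows any path when base_path is the filesystem root.
    if (not normcase(final_path).startswith(normcase(base_path) + separator)
            and normcase(final_path) != normcase(base_path)
            and dirname(normcase(base_path)) != normcase(base_path)):
        raise ValueError("the joined path is located outside of the base path")
    return final_path
```

Callers that genuinely need unicode would then decode the result themselves.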
safe_join isn't documented, and the name of its module, _os, is a strong hint that it's a private API.
Therefore, I propose:
- to remove the coercion to unicode, which is incorrect anyway because it doesn't honor sys.getfilesystemencoding() and thus fails on non-UTF-8 filesystems;
- to perform the coercion in callers that need it, or remove it altogether if possible.
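For callers that do need text, the coercion can honor the filesystem encoding. This is a minimal Python 3 sketch that assumes a POSIX system. os.fsdecode exists only in Python 3. The byte string below is an illustrative Latin-1 file name, and naive_to_text and caller_to_text are hypothetical helpers. os.fsdecode() decodes with sys.getfilesystemencoding() and the filesystem error handler, while a hard-coded UTF-8 decode fails on bytes that aren't valid UTF-8.

```python
import os
import sys

# Illustrative file name as a POSIX filesystem might return it: 0xE9 is "é"
# in Latin-1 but is not valid UTF-8 on its own.
raw_name = b"caf\xe9.html"


def naive_to_text(path):
    # Unconditional UTF-8 coercion that ignores the filesystem encoding.
    return path.decode("utf-8")


def caller_to_text(path):
    # Honors sys.getfilesystemencoding(); str input is returned unchanged.
    return os.fsdecode(path)


try:
    naive_to_text(raw_name)
    naive_failed = False
except UnicodeDecodeError:
    naive_failed = True

text_name = caller_to_text(raw_name)
# On POSIX, os.fsencode() restores the original bytes exactly.
round_trip = os.fsencode(text_name)
print(sys.getfilesystemencoding(), naive_failed, repr(text_name))
```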