Code

Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#9579 closed (fixed)

problem with safe_join function from django.utils._os

Reported by: gonzalodelgado Owned by: kmtracey
Component: Uncategorized Version: master
Severity: Keywords: bug utils _os safe_join
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: UI/UX:

Description

The directory I use for most of my programming work is '/home/gonzalo/Código' (código is spanish for "code").
Since my django projects live inside that directory, TEMPLATE_DIRS always contains it.
I got this problem when loading templates from an app within that directory (Django hinted: "The string that could not be encoded/decoded was: alo/C��digo")
This problem best explained with this example:

$ cd Código/
$ python
Python 2.5.2 (r252:60911, Oct  5 2008, 19:24:49) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from django.utils import _os
>>> _os.safe_join('spam', 'eggs')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/django/utils/_os.py", line 16, in safe_join
    final_path = normcase(abspath(join(base, *paths)))
  File "/usr/lib/python2.5/posixpath.py", line 403, in abspath
    path = join(os.getcwd(), path)
  File "/usr/lib/python2.5/posixpath.py", line 65, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

safe_join uses force_unicode to convert 'spam' and 'eggs' to unicode objects, but then uses os.path.abspath which obtains the working directory ('/home/gonzalo/Código') and tries to join it with u'spam' and u'eggs' and fails.

Attachments (1)

safe_join_base_path_correction.patch (985 bytes) - added by gonzalodelgado 6 years ago.
first patch for #9579

Download all attachments as: .zip

Change History (12)

Changed 6 years ago by gonzalodelgado

first patch for #9579

comment:1 Changed 6 years ago by gonzalodelgado

  • Has patch set
  • Needs documentation unset
  • Needs tests set
  • Patch needs improvement unset

Attached patch which uses force_unicode on the root path also, to ensure proper work of safe_join. Seems to work.

comment:2 Changed 6 years ago by gonzalodelgado

  • Version changed from 1.0 to SVN

comment:3 Changed 6 years ago by Daniel Pope <dan@…>

  • Patch needs improvement set

TEMPLATE_DIRS is intended to contain absolute paths only. There's still a bug in safe_join but it shouldn't come up if you have absolute paths in TEMPLATE_DIRS.

The output of abspath (actually you should use os.getcwd()) is not in UTF-8. It's the byte string as stored on disk, so to decode it you'd need to do something like

import locale
force_unicode(os.getcwd(), encoding=locale.getpreferredencoding())

This could still raise an exception if the filename wasn't readable in the current locale's character set. That can happen if the current locale's character set isn't the same as that of the application which created the file. It's actually more correct to do path manipulations on byte strings than interpreting them to unicode strings first.

comment:4 follow-up: Changed 6 years ago by kmtracey

  • Owner changed from nobody to kmtracey
  • Status changed from new to assigned
  • Triage Stage changed from Unreviewed to Accepted

gonzalodelgado doesn't have absolute paths in TEMPLATE_DIRS, the problem is if the current working directory contains non-ASCII characters. Patch does have a problem, since force_unicode on curdir is going to assume utf-8 encoding for the curdir string and that's not always correct -- running with the patch on my Windows box with a directory name containing 'ó', for example, crashes with:

Exception Type: DjangoUnicodeDecodeError at /admin/
Exception Value: 'utf8' codec can't decode bytes in position 33-36: invalid data. You passed in 'D:\\u\\kmt\\software\\web\\playgroundC\xf3digo' (<type 'str'>)

I think the right answer is to use os.getcwdu(), which will hopefully know the correct encoding being used by the file system. I'll figure it out.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 6 years ago by kmtracey

Replying to kmtracey:

gonzalodelgado doesn't have absolute paths in TEMPLATE_DIRS

I meant relative there and I think I'm wrong on it anyway since I can't recreate this now except by putting a non-abolute path in TEMPLATE_DIRS. gonzalodelgado, what is your TEMPLATE_DIRS setting? Also the full traceback from when you really hit the error would be useful to make sure I'm actually looking at the same problem as you are hitting.

comment:6 in reply to: ↑ 5 Changed 6 years ago by gonzalodelgado

Replying to kmtracey:

Replying to kmtracey:

gonzalodelgado doesn't have absolute paths in TEMPLATE_DIRS

I meant relative there and I think I'm wrong on it anyway since I can't recreate this now except by putting a non-abolute path in TEMPLATE_DIRS. gonzalodelgado, what is your TEMPLATE_DIRS setting? Also the full traceback from when you really hit the error would be useful to make sure I'm actually looking at the same problem as you are hitting.

I do use absolute paths for TEMPLATE_DIRS. Reviewing the original problem's Traceback I find that it doesn't have to do with TEMPLATE_DIRS, because it fails when trying to get an apps' template and that app is also within '/home/Código':

Environment:

Request Method: GET
Request URL: http://localhost:8000/admin/cms/page/
Django Version: 1.1 pre-alpha
Python Version: 2.5.2
Installed Applications:
['django.contrib.contenttypes',
 'django.contrib.sites',
 'django.contrib.auth',
 'django.contrib.sessions',
 'django.contrib.admin',
 'cms',
 'membrete']
Installed Middleware:
('django.middleware.common.CommonMiddleware',
 'django.contrib.sessions.middleware.SessionMiddleware',
 'django.contrib.auth.middleware.AuthenticationMiddleware',
 'django.middleware.locale.LocaleMiddleware',
 'django.middleware.doc.XViewMiddleware')


Traceback:
File "/usr/lib/python2.5/site-packages/django/core/handlers/base.py" in get_response
  86.                 response = callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/python2.5/site-packages/django/contrib/admin/sites.py" in root
  157.                 return self.model_page(request, *url.split('/', 2))
File "/usr/lib/python2.5/site-packages/django/views/decorators/cache.py" in _wrapped_view_func
  44.         response = view_func(request, *args, **kwargs)
File "/usr/lib/python2.5/site-packages/django/contrib/admin/sites.py" in model_page
  176.         return admin_obj(request, rest_of_url)
File "apps/cms/admin.py" in __call__
  47.             return self.navigation(request)
File "apps/cms/admin.py" in navigation
  209.             }, context_instance=RequestContext(request))
File "/usr/lib/python2.5/site-packages/django/shortcuts/__init__.py" in render_to_response
  18.     return HttpResponse(loader.render_to_string(*args, **kwargs), **httpresponse_kwargs)
File "/usr/lib/python2.5/site-packages/django/template/loader.py" in render_to_string
  102.         t = get_template(template_name)
File "/usr/lib/python2.5/site-packages/django/template/loader.py" in get_template
  80.     source, origin = find_template_source(template_name)
File "/usr/lib/python2.5/site-packages/django/template/loader.py" in find_template_source
  69.             source, display_name = loader(name, dirs)
File "/usr/lib/python2.5/site-packages/django/template/loaders/app_directories.py" in load_template_source
  54.     for filepath in get_template_sources(template_name, template_dirs):
File "/usr/lib/python2.5/site-packages/django/template/loaders/app_directories.py" in get_template_sources
  45.             yield safe_join(template_dir, template_name)
File "/usr/lib/python2.5/site-packages/django/utils/_os.py" in safe_join
  16.     final_path = normcase(abspath(join(base, *paths)))
File "/usr/lib/python2.5/posixpath.py" in abspath
  403.         path = join(os.getcwd(), path)
File "/usr/lib/python2.5/posixpath.py" in join
  65.             path += '/' + b

Exception Type: UnicodeDecodeError at /admin/cms/page/
Exception Value: 'ascii' codec can't decode byte 0xc3 in position 15: ordinal not in range(128)

comment:7 Changed 6 years ago by Daniel Pope <dan@…>

I was just going by what I recalled from #8965. This bug is really just a further issue with that patch.

abspath also calls normpath. It would be unsafe just to replace it with just join() and getcwdu(). We should just re-implement abspath in a unicode-safe way and leave safe_join as is. It would look like this

def abspath(path):
    if not isabs(path):
        path = join(getcwdu(), path)
    return normpath(path)

os.path.abspath calls the system method on Windows. Though this should work, I'd be slightly wary of missing out on any Windows-specific magic.

comment:8 follow-up: Changed 6 years ago by kmtracey

I wonder why I cannot recreate this? I see it is the app loader, not the file system loader, appearing in the traceback so it is not caused by anything in TEMPLATE_DIRS. The app loader computes a list of 'templates' dirs found underneath app directories when it is first loaded -- in my case, these directories are fully qualified then, so when the paths are joined at execution time it is not necessary for the abspath() implementation to call os.getcwd().

The directories are built up using mod.__file__ as a starting point (where mod is each imported installed app) -- apparently this is always starting out fully-qualified for me yet is relative for gonzalodelgado. Why would that be? I have tried every variant of 'manage.py runserver' and 'python manage.py runserver' etc. on both Windows and Linux (I'm assuming dev server given the 8000 port) and cannot get these paths computed by the app loader to be relative.

Which, in fact, leads to a related but different problem when I try to recreate this on Windows (Linux works fine). The returned mod.__file__ I get on Windows is cp1252 encoded, not utf-8 (even if the dir in question is actually on a remote box where the FS uses utf-8 encoding for names). So the (current trunk code) force_unicode of base in safe_join generates a UnicodeDecode error. That's fixable by converting each of the app template dirs to unicode upfront when the list is created, using the preferred system encoding. But I wish I understood why I see one variant on this problem and the original reporter sees another.

Daniel, yeah I see Windows abspath actually goes to the OS so I understand the wariness of overriding it. It needs some experimenting to make sure it will work.

comment:9 in reply to: ↑ 8 Changed 6 years ago by gonzalodelgado

Replying to kmtracey:

I wonder why I cannot recreate this? I see it is the app loader, not the file system loader, appearing in the traceback so it is not caused by anything in TEMPLATE_DIRS. The app loader computes a list of 'templates' dirs found underneath app directories when it is first loaded -- in my case, these directories are fully qualified then, so when the paths are joined at execution time it is not necessary for the abspath() implementation to call os.getcwd().

The directories are built up using mod.__file__ as a starting point (where mod is each imported installed app) -- apparently this is always starting out fully-qualified for me yet is relative for gonzalodelgado. Why would that be? I have tried every variant of 'manage.py runserver' and 'python manage.py runserver' etc. on both Windows and Linux (I'm assuming dev server given the 8000 port) and cannot get these paths computed by the app loader to be relative.

Which, in fact, leads to a related but different problem when I try to recreate this on Windows (Linux works fine). The returned mod.__file__ I get on Windows is cp1252 encoded, not utf-8 (even if the dir in question is actually on a remote box where the FS uses utf-8 encoding for names). So the (current trunk code) force_unicode of base in safe_join generates a UnicodeDecode error. That's fixable by converting each of the app template dirs to unicode upfront when the list is created, using the preferred system encoding. But I wish I understood why I see one variant on this problem and the original reporter sees another.

It's relative for me because (silly me) I added this to my manage.py:

import sys

sys.path.insert(0, 'apps')

which results in relative path for the apps within my project (which live under the 'apps' subdirectory). I changed that for this:

import os
import sys

sys.path.insert(0, os.path.abspath('apps'))

and the error does not occurr.

(Sorry I didn't tell you about it before, I used it like that all the time and didn't give me problems until now)

comment:10 Changed 6 years ago by kmtracey

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [9411]) Fixed #9579 -- Properly handle apps running with (and specifically, loading templates from) a current working directory path that contains non-ASCII characters. Thanks for the report to gonzalodelgado and for advice on how to fix it to Daniel Pope.

comment:11 Changed 6 years ago by kmtracey

(In [9414]) [1.0.X] Fixed #9579 -- Properly handle apps running with (and specifically, loading templates from) a current working directory path that contains non-ASCII characters. Thanks for the report to gonzalodelgado and for advice on how to fix it to Daniel Pope.

r9411 from trunk.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.