Opened 15 years ago

Closed 14 years ago

Last modified 13 years ago

#11030 closed Uncategorized (wontfix)

File uploads break on non english filesystem encoding

Reported by: Honza Král
Owned by: nobody
Component: File uploads/storage
Version: 1.2
Severity: Normal
Keywords: file path encoding
Cc: david.danier@…, lists@…
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The tests produce:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 43-44: character maps to <undefined>

The fix simply converts file paths to bytestrings.

Attachments (1)

11030-against-trunk-10686.diff (1.8 KB ) - added by Honza Král 15 years ago.


Change History (9)

by Honza Král, 15 years ago

comment:1 by Jacob, 15 years ago

Resolution: fixed
Status: new → closed

(In [10695]) [1.0.X] Fixed #11030: fixed file uploads on non-utf8 filesystem encoding. Thanks, Honza Kral. Backport of [10693] from trunk.

comment:2 by Karen Tracey, 15 years ago

(In [12661]) Fixed #11030: Reverted a change that assumed the file system encoding was utf8, and changed a test to demonstrate how that assumption corrupted uploaded non-ASCII file names on systems that don't use utf8 as their file system encoding (Windows for one, specifically). Thanks for the report to vrehak.
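
As a rough illustration of the corruption this commit message describes, the sketch below encodes a name as UTF-8 and then reads the bytes back the way a non-UTF-8 file system would. It is written for Python 3 and is not the test changed in [12661]; cp1252 is an assumption, chosen as a typical Windows encoding.

```python
# A non-ASCII file name as a text (unicode) string.
original = u"Andr\xe9.jpg"

# Bytes produced when the code assumes the file system uses UTF-8.
utf8_bytes = original.encode("utf-8")

# What a cp1252 file system would show for those bytes: the two UTF-8
# bytes for the accented letter are read as two separate characters.
misread = utf8_bytes.decode("cp1252")

# Bytes the cp1252 file system actually expects for the same name.
expected_bytes = original.encode("cp1252")

print(repr(utf8_bytes))
print(repr(misread))
print(repr(expected_bytes))
```

No exception is raised anywhere in this sequence, which is why this kind of corruption goes unnoticed until someone looks at the stored name.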

comment:3 by Karen Tracey, 15 years ago

Whoops, that was supposed to say "Fixed #12898" and "Refs" this one. r12661 likely re-breaks this one, but there is not enough detail here for me to recreate the problem, so I cannot say what would be a better fix for this one.

comment:4 by André Cruz, 14 years ago

Resolution: fixed → (none)
Status: closed → reopened
Version: SVN → 1.2

This change broke my code when upgrading from 1.1.1 to 1.2.1, and this was not listed in the documentation…

If I upload a file with the name "André.jpg", these are the different results for django.core.files.storage.FileSystemStorage.path():

input - '/servers/staging/sapoopenid/media' u'avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg'

1.1.1 - '/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xc3\xa9.jpg'
1.2.1 - u'/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg'

Then os.path.exists() is called on this:

1.1.1

os.path.exists('/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xc3\xa9.jpg')

True

1.2.1

os.path.exists(u'/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/servers/python/lib/python2.6/genericpath.py", line 18, in exists
    st = os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128)

This causes all sorts of trouble. The storage can't even delete files with names like these because exists() is called on them...
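
The two behaviours above can be reproduced without Django or a real file. The following is a minimal sketch for Python 3 (the ticket itself concerns Python 2.6), and the helper name `encode_path` is hypothetical. It shows that UTF-8 encoding yields the byte string returned by 1.1.1, while an ASCII-only encoding, which is what the traceback reports, cannot represent the name at all.

```python
name = u"Andr\xe9.jpg"


def encode_path(path, encoding):
    """Encode a text path to bytes; return None if the codec cannot represent it."""
    try:
        return path.encode(encoding)
    except UnicodeEncodeError:
        return None


# UTF-8 can represent the accented letter (as the two bytes \xc3\xa9).
utf8_bytes = encode_path(name, "utf-8")

# ASCII cannot, which corresponds to the failure in the traceback.
ascii_bytes = encode_path(name, "ascii")

print(repr(utf8_bytes))
print(repr(ascii_bytes))
```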

comment:5 by Karen Tracey, 14 years ago

Resolution: wontfix
Status: reopened → closed

The correct fix here is to set up the environment for your running code so that unicode can be passed to the file system functions. Django assuming utf-8 is just wrong; some file systems do not use that encoding and thus Django assuming that encoding for uploaded files quietly corrupts file names on those systems. That's worse than a loud error. There is some doc on how to set up the environment for Apache here: http://docs.djangoproject.com/en/dev/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror. That doc does belong in a more prominent place, not buried in a section on a deployment method that is no longer the recommended one, but fixing the doc should be the subject of a different ticket.
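
One way to check whether a running process has been set up as described is to ask Python which encoding it will use for file names. This is only a sketch for Python 3; the helper `can_encode_filename` is hypothetical, and recent Python 3 versions may report UTF-8 even under a C/POSIX locale, unlike the Python 2.6 setup discussed in this ticket.

```python
import locale
import sys


def can_encode_filename(name, encoding=None):
    """Return True if `name` can be encoded with `encoding`.

    When no encoding is given, the interpreter's file system encoding is used.
    """
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print("file system encoding:", sys.getfilesystemencoding())
print("preferred locale encoding:", locale.getpreferredencoding())
print("can encode non-ASCII name:", can_encode_filename(u"Andr\xe9.jpg"))
```

Running this inside the deployed application, rather than in an interactive shell, matters because the web server's environment (for example its locale variables) can differ from the one a login shell provides.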

comment:6 by André Cruz, 14 years ago

Thanks for the info, I've already done the encoding setup now. This was more of a heads-up for people with similar problems, since no mention of this is made in the release notes and upgrading to 1.2 broke running code.

As you said, this info should be somewhere where it gets more attention since it's not even specific to mod_python (I use mod_wsgi).

comment:7 by David Danier <david.danier@…>, 13 years ago

Cc: david.danier@… added
Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

Perhaps the low-level storage API could use smart_str(filename, encoding=sys.getfilesystemencoding()) to solve this issue without having to modify the environment? Anyway, I'd love to see the docs somewhere more prominent.
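
The idea in this comment can be sketched in plain Python 3 as follows. The function `to_filesystem_bytes` is a hypothetical stand-in for the suggested smart_str call, not Django's implementation, and it simplifies by passing byte strings through unchanged.

```python
import sys


def to_filesystem_bytes(filename, encoding=None):
    """Encode a text file name using the file system encoding.

    Hypothetical stand-in for smart_str(filename, encoding=...).
    Byte strings are assumed to be already encoded and are returned as-is.
    """
    if isinstance(filename, bytes):
        return filename
    encoding = encoding or sys.getfilesystemencoding()
    return filename.encode(encoding)


# The encoding is passed explicitly here so the result does not depend
# on the machine running the example.
as_utf8 = to_filesystem_bytes(u"Andr\xe9.jpg", "utf-8")
as_cp1252 = to_filesystem_bytes(u"Andr\xe9.jpg", "cp1252")

print(repr(as_utf8))
print(repr(as_cp1252))
```

If the reported file system encoding is ASCII, as in the traceback from comment 4, the encode call still raises UnicodeEncodeError, so this approach would not remove the need to configure the environment in that case.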

comment:8 by Mailing List SVR <lists@…>, 13 years ago

Cc: lists@… added