Opened 6 years ago

Closed 5 years ago

Last modified 4 years ago

#11030 closed Uncategorized (wontfix)

File uploads break on non-English filesystem encoding

Reported by: Honza_Kral
Owned by: nobody
Component: File uploads/storage
Version: 1.2
Severity: Normal
Keywords: file path encoding
Cc: david.danier@…, lists@…
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

The tests produce:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 43-44: character maps to <undefined>

The fix just converts file paths to bytestrings.

Attachments (1)

11030-against-trunk-10686.diff (1.8 KB) - added by Honza_Kral 6 years ago.


Change History (9)

Changed 6 years ago by Honza_Kral

comment:1 Changed 6 years ago by jacob

  • Resolution set to fixed
  • Status changed from new to closed

(In [10695]) [1.0.X] Fixed #11030: fixed file uploads on non-utf8 filesystem encoding. Thanks, Honza Kral. Backport of [10693] from trunk.

comment:2 Changed 5 years ago by kmtracey

(In [12661]) Fixed #11030: Reverted a change that assumed the file system encoding was utf8, and changed a test to demonstrate how that assumption corrupted uploaded non-ASCII file names on systems that don't use utf8 as their file system encoding (Windows for one, specifically). Thanks for the report to vrehak.
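The corruption described in this commit message can be illustrated with a minimal sketch. It is written for Python 3 (the ticket itself concerns Python 2), and it assumes cp1252 as the example non-UTF-8 file system encoding; the ticket only says "Windows for one", so the specific code page is an assumption.

```python
# Illustration only: cp1252 is an assumed example of a non-UTF-8
# file system encoding; the ticket does not name a specific code page.
name = "Andr\xe9.jpg"  # uploaded file name as text ("André.jpg")

# Encoding the name as UTF-8 regardless of the real file system encoding...
utf8_bytes = name.encode("utf-8")

# ...means a cp1252 file system interprets those two bytes as two characters.
seen_on_cp1252 = utf8_bytes.decode("cp1252")

print(repr(seen_on_cp1252))
```

The name is stored without any error being raised, but it no longer matches what the user uploaded, which is the quiet corruption the revert was meant to avoid.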

comment:3 Changed 5 years ago by kmtracey

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Whoops, that commit message was supposed to say "Fixed #12898" and only reference this ticket. r12661 likely re-breaks this one, but there is not enough detail here for me to recreate the problem, so I cannot say what would be a better fix for this one.

comment:4 Changed 5 years ago by edevil

  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Version changed from SVN to 1.2

This change broke my code when upgrading from 1.1.1 to 1.2.1, and this was not listed in the documentation…

If I upload a file with the name "André.jpg", these are the different results for django.core.files.storage.FileSystemStorage.path():

input - '/servers/staging/sapoopenid/media' u'avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg'

1.1.1 - '/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xc3\xa9.jpg'
1.2.1 - u'/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg'

Then os.path.exists() is called on this:

1.1.1

os.path.exists('/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xc3\xa9.jpg')

True

1.2.1

os.path.exists(u'/servers/staging/sapoopenid/media/avtr/4791526d60e0ce89ddc5e668c4aa2bb2de08fbc4/Andr\xe9.jpg')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/servers/python/lib/python2.6/genericpath.py", line 18, in exists
    st = os.stat(path)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 84: ordinal not in range(128)

This causes all sorts of trouble. The storage can't even delete files with names like these because exists() is called on them...
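The difference between the two versions can be reproduced without Django. The following is a rough Python 3 sketch, not the storage code itself: in the Python 2 of this report, `str` below corresponds to `unicode` and `bytes` to `str`, and the explicit `encode("ascii")` call stands in for what `os.stat()` does implicitly when the process runs with an ASCII file system encoding.

```python
# Python 3 sketch of the two behaviours reported above.
name = "Andr\xe9.jpg"  # the uploaded name as text

# 1.1.1 effectively handed a UTF-8 bytestring to the file system functions.
utf8_path = name.encode("utf-8")

# 1.2.1 hands over text; with an ASCII file system encoding the implicit
# conversion fails. The explicit encode() here mimics that conversion.
error = None
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    error = exc

print(repr(utf8_path))
print(error)
```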

comment:5 Changed 5 years ago by kmtracey

  • Resolution set to wontfix
  • Status changed from reopened to closed

The correct fix here is to set up the environment for your running code so that unicode can be passed to the file system functions. Django assuming utf-8 is just wrong; some file systems do not use that encoding and thus Django assuming that encoding for uploaded files quietly corrupts file names on those systems. That's worse than a loud error. There is some doc on how to set up the environment for Apache here: http://docs.djangoproject.com/en/dev/howto/deployment/modpython/#if-you-get-a-unicodeencodeerror. That doc does belong in a more prominent place, not buried in a section on a deployment method that is no longer the recommended one, but fixing the doc should be the subject of a different ticket.
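Whether the environment is set up as described can be checked from inside the running process. The sketch below is a Python 3 illustration; `describe_fs_encoding` and `can_encode` are hypothetical helper names, not Django APIs. On Unix-like systems the reported encoding usually follows the locale variables (such as `LANG` or `LC_ALL`) seen by the server process, which is an assumption about the deployment rather than something stated in this ticket.

```python
import locale
import sys


def describe_fs_encoding():
    """Report the encodings the interpreter will use for file names."""
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def can_encode(name, encoding):
    """Return True if the text file name is representable in `encoding`."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    info = describe_fs_encoding()
    print(info)
    # Check a non-ASCII name against the encoding actually in effect.
    print(can_encode("Andr\xe9.jpg", info["filesystem_encoding"]))
```

Running this under the same user and environment as the web server (not an interactive shell, which often has a different locale) shows whether non-ASCII upload names will be accepted.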

comment:6 Changed 5 years ago by edevil

Thanks for the info, I've already done the encoding setup now. This was more of a heads-up for people with similar problems, since no mention of this is made in the release notes and upgrading to 1.2 broke running code.

As you said, this info should be somewhere where it gets more attention since it's not even specific to mod_python (I use mod_wsgi).

comment:7 Changed 4 years ago by David Danier <david.danier@…>

  • Cc david.danier@… added
  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

Perhaps the low-level-storage API could use smart_str(filename, encoding=sys.getfilesystemencoding()) to solve this issue without having to modify the environment? Anyways I'd love to see the docs somewhere more prominent.
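A standard-library sketch of that suggestion might look like the following. `to_fs_bytes` is a hypothetical helper written for Python 3, not part of Django, and it only approximates what `smart_str` with an explicit `encoding` would do for text input.

```python
import sys


def to_fs_bytes(filename, encoding=None):
    """Hypothetical helper: encode a text file name for the file system.

    Bytes are passed through unchanged; text is encoded with `encoding`,
    defaulting to whatever sys.getfilesystemencoding() reports.
    """
    if isinstance(filename, bytes):
        return filename
    return filename.encode(encoding or sys.getfilesystemencoding())


if __name__ == "__main__":
    print(repr(to_fs_bytes("Andr\xe9.jpg", "utf-8")))
    print(repr(to_fs_bytes("Andr\xe9.jpg", "cp1252")))
```

Note that this does not remove the need for a correctly configured environment: if the reported file system encoding is ASCII, the `encode()` call raises the same `UnicodeEncodeError` for non-ASCII names.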

comment:8 Changed 4 years ago by Mailing List SVR <lists@…>

  • Cc lists@… added