Opened 12 years ago

Closed 8 years ago

Last modified 8 years ago

#17686 closed Cleanup/optimization (fixed)

file.save crashes on unicode filename

Reported by: sylvain.lebon@…
Owned by: Florian Demmer
Component: Documentation
Version: dev
Severity: Normal
Keywords:
Cc: Simon Charette, lrekucki@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I'm trying to add a file to a FileField on one of my objects.
The problem is that the file has a unicode name containing special characters.

I have
name = (u'l\u2019\xe9cran.png').encode("utf-8")

now if I do
str1 = "%s%s" % (MEDIA_ROOT, name)

I have
'/home/oi/OIFS/Capture d\xe2\x80\x99\xc3\xa9cran 2012-01-24 \xc3\xa0 14.58.48.png'

and os.stat(str1) passes.

but with
str2 = safe_join(MEDIA_ROOT, name)

str2 is then
u'/home/oi/OIFS/Capture d\u2019\xe9cran 2012-01-24 \xe0 14.58.48.png'

and os.stat fails.

The thing is file.save(name, content, False) uses safe_join and then os.stat.
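
The two paths above name the same file; they differ only in type (byte string versus unicode string). A minimal sketch of that relationship, written for Python 3 as an assumption (the report uses Python 2 syntax), with same_file_name as a hypothetical helper:

```python
# Paths copied from the report. On Python 3, str is text and bytes is raw bytes.
text_path = "/home/oi/OIFS/Capture d\u2019\xe9cran 2012-01-24 \xe0 14.58.48.png"
byte_path = b"/home/oi/OIFS/Capture d\xe2\x80\x99\xc3\xa9cran 2012-01-24 \xc3\xa0 14.58.48.png"


def same_file_name(text, raw, encoding="utf-8"):
    """Return True if encoding the text path yields exactly the byte path."""
    return text.encode(encoding) == raw


print(same_file_name(text_path, byte_path))
```

So the unicode path only reaches the same file if it is encoded as UTF-8 when it is handed to the operating system.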

I'm not sure whether I should be passing the name in a different encoding, but I haven't managed to get it right.

Thanks

Change History (12)

comment:1 by Łukasz Rekucki, 12 years ago

Triage Stage: Unreviewed → Accepted

This is a common error, which actually isn't Django related:

Most file system related functions in Python (like os.stat) accept unicode strings, which are then encoded using the default encoding of the file system (see http://docs.python.org/library/sys.html#sys.getfilesystemencoding). This is actually the only sane thing to do: if you pass a manually encoded string, you have no guarantee it will match what was actually written on the FS. On Unix platforms, this depends on the user's locale. Thus, if the user running the server doesn't have LC_ALL, LANG, etc. properly set in their environment, the FS encoding is assumed to be ASCII and os.stat will crash.

I'm marking this as accepted, because I think it's worth putting a note about this in FileStorage docs.
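
To illustrate the point about the file system encoding, here is a minimal sketch, assuming Python 3 (the ticket itself predates it). encode_for_filesystem is a hypothetical helper, not part of Django; os.fsencode and sys.getfilesystemencoding are standard library calls.

```python
import os
import sys


def encode_for_filesystem(name, encoding=None):
    """Encode a text filename with the file system encoding, or an explicit codec."""
    if encoding is None:
        # os.fsencode applies the encoding reported by sys.getfilesystemencoding().
        return os.fsencode(name)
    return name.encode(encoding)


name = "l\u2019\xe9cran.png"
print(sys.getfilesystemencoding())
print(encode_for_filesystem(name, "utf-8"))

try:
    # Simulates a process whose locale left the FS encoding at ASCII.
    encode_for_filesystem(name, "ascii")
except UnicodeEncodeError as exc:
    print("ASCII cannot represent this name:", exc.reason)
```

Passing an explicit codec is only for demonstration; in real code the text path would normally be handed to os.stat unchanged.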

comment:2 by Simon Charette, 12 years ago

Cc: charette.s@… added

comment:3 by Łukasz Rekucki, 12 years ago

Cc: lrekucki@… added

comment:4 by sylvain.lebon@…, 12 years ago (in reply to: 1)

What you say makes sense, but I don't think that's my point.

What I'm saying is that my system is configured to use UTF-8: sys.getfilesystemencoding() gives me "UTF-8", but safe_join won't give me a UTF-8 encoded string. A .encode("utf-8") should be applied to the string before it is passed to os.stat, but File.save() doesn't handle that. I figured it might be a bug in safe_join that it doesn't return a string in the same encoding it was passed.

In any case, I can't get safe_join to return a string that my os.stat can handle. Is it as safe as it claims to be?

Anyhow, I agree with you that the FileStorage docs could be more precise on some points, including this one.

comment:5 by sylvain.lebon@…, 12 years ago (in reply to: 1)

You're actually right. For some reason, Django wasn't picking up my user's locale settings. In fact, the fcgi script wasn't either, even though my envvars script was properly configured.
I had to set DefaultInitEnv LANG "en_US.UTF-8" in my sites-available/default.
Now my view gets the right filesystemencoding and that solves it.
Thanks for your help!
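
A misconfiguration like this can be caught early by checking the encoding at startup. This is a minimal sketch, assuming Python 3; is_utf8 and check_filesystem_encoding are hypothetical helper names, not Django APIs.

```python
import codecs
import sys
import warnings


def is_utf8(encoding_name):
    """Return True if the codec name is UTF-8 under any of its aliases."""
    return codecs.lookup(encoding_name).name == "utf-8"


def check_filesystem_encoding():
    """Warn if the process was started without a UTF-8 file system encoding."""
    encoding = sys.getfilesystemencoding()
    if not is_utf8(encoding):
        warnings.warn(
            "File system encoding is %r; non-ASCII file names may fail. "
            "Check LANG/LC_ALL in the server environment." % encoding
        )
    return encoding


print(check_filesystem_encoding())
```

Calling this from the fcgi or WSGI entry point would show the encoding the server process actually sees, which may differ from the one in an interactive shell.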

comment:6 by Claude Paroz, 12 years ago

Component: File uploads/storage → Documentation
Type: Uncategorized → Cleanup/optimization

comment:7 by Vajrasky Kok, 10 years ago

I cannot reproduce this in master (1.7).

comment:8 by Florian Demmer, 8 years ago

Has patch: set
Owner: changed from nobody to Florian Demmer
Status: new → assigned

Please review my pull request, which includes documentation updates: https://github.com/django/django/pull/5587

comment:9 by Simon Charette, 8 years ago

Cc: Simon Charette added; charette.s@… removed
Version: 1.3 → master

comment:10 by Tim Graham <timograham@…>, 8 years ago

Resolution: fixed
Status: assigned → closed

In 25b912ab:

Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.

comment:11 by Tim Graham <timograham@…>, 8 years ago

In da20004a:

[1.8.x] Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.

Backport of 25b912abbe31fa440e702b5273c18cf74e2d6e0b from master

comment:12 by Tim Graham <timograham@…>, 8 years ago

In 84006fd:

[1.9.x] Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.

Backport of 25b912abbe31fa440e702b5273c18cf74e2d6e0b from master
