Opened 4 years ago

Closed 9 months ago

Last modified 9 months ago

#17686 closed Cleanup/optimization (fixed)

file.save crashes on unicode filename

Reported by: sylvain.lebon@…
Owned by: fdemmer
Component: Documentation
Version: master
Severity: Normal
Keywords:
Cc: charettes, lrekucki@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I am trying to add a file to a FileField on one of my objects.
The problem is that the file has a unicode name containing special characters.

I have
name = (u'l\u2019\xe9cran.png').encode("utf-8")

Now if I do
str1 = "%s%s" % (MEDIA_ROOT, name)

I have
'/home/oi/OIFS/Capture d\xe2\x80\x99\xc3\xa9cran 2012-01-24 \xc3\xa0 14.58.48.png'

and os.stat(str1) passes.

but with
str2 = safe_join(MEDIA_ROOT, name)

str2 is then
u'/home/oi/OIFS/Capture d\u2019\xe9cran 2012-01-24 \xe0 14.58.48.png'

and os.stat fails.

The thing is file.save(name, content, False) uses safe_join and then os.stat.
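
As a rough illustration of the two values above (a sketch only, not taken from Django's code): the byte string and the unicode string describe the same path when the unicode one is encoded as UTF-8, but the unicode one cannot be encoded as ASCII at all. The variable names are made up for this example.

```python
# The two paths quoted in the description, as a byte string and a unicode string.
byte_path = b'/home/oi/OIFS/Capture d\xe2\x80\x99\xc3\xa9cran 2012-01-24 \xc3\xa0 14.58.48.png'
text_path = u'/home/oi/OIFS/Capture d\u2019\xe9cran 2012-01-24 \xe0 14.58.48.png'

# Encoding the unicode path as UTF-8 yields exactly the byte path.
same_under_utf8 = text_path.encode('utf-8') == byte_path

# Encoding it as ASCII fails because of the non-ASCII characters.
try:
    text_path.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same_under_utf8, ascii_ok)
```

So whether a call such as os.stat succeeds on the unicode path plausibly comes down to which encoding Python uses when it converts that path for the file system.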

I'm not sure whether I should be getting the name in a different encoding, but I can't manage to get it right.

Thanks

Change History (12)

comment:1 Changed 4 years ago by lrekucki

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

This is a common error, which actually isn't Django related:

Most file-system-related functions in Python (like os.stat) accept unicode strings, which are then encoded using the default encoding of the file system (see http://docs.python.org/library/sys.html#sys.getfilesystemencoding). This is actually the only sane thing to do: if you pass a manually encoded string, you have no guarantee that it will match what was actually written on the FS. On Unix platforms, this encoding depends on the user's *locale*. Thus, if the user running the server doesn't have LC_ALL, LANG, etc. set properly in their environment, the FS encoding is assumed to be ASCII and os.stat will crash.

I'm marking this as accepted, because I think it's worth putting a note about this in the FileStorage docs.
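
A minimal sketch of how one might check this from inside the running process, assuming nothing about Django itself. The helper name can_encode_for_filesystem is hypothetical; only sys.getfilesystemencoding() comes from the standard library.

```python
import sys


def can_encode_for_filesystem(filename, encoding=None):
    """Return True if the unicode `filename` can be encoded with `encoding`.

    When `encoding` is omitted, the interpreter's file system encoding is used.
    """
    encoding = encoding or sys.getfilesystemencoding()
    try:
        filename.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


name = u'l\u2019\xe9cran.png'

# What the current process believes the file system encoding to be.
print(sys.getfilesystemencoding())

# The name is not representable in ASCII, but it is in UTF-8.
print(can_encode_for_filesystem(name, 'ascii'))
print(can_encode_for_filesystem(name, 'utf-8'))
```

Printing the file system encoding from within a view, rather than from an interactive shell, shows what the server process actually sees, which may differ from the login user's settings.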


comment:2 Changed 4 years ago by charettes

  • Cc charette.s@… added

comment:3 Changed 4 years ago by lrekucki

  • Cc lrekucki@… added

comment:4 in reply to: ↑ 1 Changed 4 years ago by sylvain.lebon@…

What you say makes sense, but I don't think that's my point.

What I'm saying is that my system is configured to use UTF-8: sys.getfilesystemencoding() gives me "UTF-8", but safe_join won't give me a UTF-8 encoded string. A .encode("utf-8") should be applied to the string before it is passed to os.stat, but File.save() doesn't handle that. I figured it might be a bug in safe_join that it doesn't return a string in the same encoding it was passed.

In any case, I can't get safe_join to return a string that my os.stat can handle. Is it as safe as it claims to be?

Anyhow, I agree with you that the FileStorage docs could be more precise on some points, including this one.


comment:5 in reply to: ↑ 1 Changed 4 years ago by sylvain.lebon@…

You're actually right. For some reason, Django wasn't picking up my user's locale settings. In fact, the fcgi script wasn't, even though my envvars script was properly configured.
I had to set DefaultInitEnv LANG "en_US.UTF-8" in my sites-available/default.
Now my view gets the right file system encoding, and that solves it.
Thanks for your help!
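
For anyone debugging a similar setup, here is a small sketch that reports which locale variable the process would take its character encoding from. It assumes the usual POSIX precedence (LC_ALL, then LC_CTYPE, then LANG); the function name effective_ctype_setting is hypothetical.

```python
import os


def effective_ctype_setting(environ=None):
    """Return the first non-empty value among LC_ALL, LC_CTYPE and LANG.

    Returns None when none of them is set, which typically means the
    process falls back to the "C" locale.
    """
    environ = os.environ if environ is None else environ
    for key in ('LC_ALL', 'LC_CTYPE', 'LANG'):
        value = environ.get(key)
        if value:
            return value
    return None


# Inspect the environment of the current process.
print(effective_ctype_setting())

# Examples with explicit environments.
print(effective_ctype_setting({'LANG': 'en_US.UTF-8'}))
print(effective_ctype_setting({}))
```

Calling this from the web process (for example, in a temporary debug view) rather than from a terminal shows the environment the server actually runs under.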


comment:6 Changed 4 years ago by claudep

  • Component changed from File uploads/storage to Documentation
  • Type changed from Uncategorized to Cleanup/optimization

comment:7 Changed 3 years ago by vajrasky

I cannot reproduce this in master (1.7).

comment:8 Changed 9 months ago by fdemmer

  • Has patch set
  • Owner changed from nobody to fdemmer
  • Status changed from new to assigned

Please review my pull request, which includes documentation updates: https://github.com/django/django/pull/5587

comment:9 Changed 9 months ago by charettes

  • Cc charettes added; charette.s@… removed
  • Version changed from 1.3 to master

comment:10 Changed 9 months ago by Tim Graham <timograham@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 25b912ab:

Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.

comment:11 Changed 9 months ago by Tim Graham <timograham@…>

In da20004a:

[1.8.x] Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.

Backport of 25b912abbe31fa440e702b5273c18cf74e2d6e0b from master

comment:12 Changed 9 months ago by Tim Graham <timograham@…>

In 84006fd:

[1.9.x] Fixed #17686, refs #17816 -- Added "Files" section to Unicode topic.

Thanks Fako Berkers for help with the patch.

Backport of 25b912abbe31fa440e702b5273c18cf74e2d6e0b from master
