Opened 2 years ago

Last modified 8 months ago

#17686 new Cleanup/optimization

file.save crashes on unicode filename

Reported by: sylvain.lebon@… Owned by: nobody
Component: Documentation Version: 1.3
Severity: Normal Keywords:
Cc: charette.s@…, lrekucki@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I am trying to add a file to a FileField on one of my objects.
The problem is that the file has a unicode name containing special characters.

I have
name = (u'l\u2019\xe9cran.png').encode("utf-8")

now if I do
str1 = "%s%s" % (MEDIA_ROOT, name)

I have
'/home/oi/OIFS/Capture d\xe2\x80\x99\xc3\xa9cran 2012-01-24 \xc3\xa0 14.58.48.png'

and os.stat(str1) succeeds.

but with
str2 = safe_join(MEDIA_ROOT, name)

str2 is then
u'/home/oi/OIFS/Capture d\u2019\xe9cran 2012-01-24 \xe0 14.58.48.png'

and os.stat fails.

The thing is file.save(name, content, False) uses safe_join and then os.stat.

I'm not sure whether I should get the name in a different encoding, but I can't seem to get it right.

Thanks

Attachments (0)

Change History (7)

comment:1 follow-ups: Changed 2 years ago by lrekucki

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

This is a common error, which actually isn't Django related:

Most file-system-related functions in Python (like os.stat) accept unicode strings, which are then encoded using the default encoding of the file system (see http://docs.python.org/library/sys.html#sys.getfilesystemencoding). This is actually the only sane thing to do: if you pass a manually encoded string, you have no guarantee that it will match what was actually written on the FS. On Unix platforms, the encoding depends on the user's *locale*. Thus, if the user the server runs as doesn't have LC_ALL, LANG, etc. properly set in their environment, the FS encoding will be assumed to be ASCII and os.stat will crash on non-ASCII names.

I'm marking this as accepted, because I think it's worth putting a note about this in FileStorage docs.
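A minimal sketch of the encoding step described above, written in Python 3 syntax rather than the Python 2 of the report. The helper name encode_for_fs is hypothetical and only mimics what happens when a unicode path is handed to a function like os.stat: the name is encoded with the file system encoding, so the same name that works under UTF-8 fails under ASCII.

```python
import sys


def encode_for_fs(name, encoding=None):
    """Encode a text file name with the given (or the current) FS encoding.

    Hypothetical helper for illustration only; it is not part of Django
    or of the standard library.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return name.encode(encoding)


# The file name from the report: RIGHT SINGLE QUOTATION MARK and e-acute.
name = u"l\u2019\xe9cran.png"

# With a UTF-8 file system encoding, the name encodes cleanly.
utf8_bytes = encode_for_fs(name, "utf-8")

# With an ASCII file system encoding (no usable locale), encoding fails.
try:
    encode_for_fs(name, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

The UTF-8 bytes match the escapes that appear in the manually encoded path in the description, which is why that path happened to work on the reporter's disk.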

Last edited 2 years ago by lrekucki

comment:2 Changed 2 years ago by charettes

  • Cc charette.s@… added

comment:3 Changed 2 years ago by lrekucki

  • Cc lrekucki@… added

comment:4 in reply to: ↑ 1 Changed 2 years ago by sylvain.lebon@…

What you say makes sense, but I don't think that's my point.

What I'm saying is that my system is configured to use UTF-8: sys.getfilesystemencoding() gives me "UTF-8", but safe_join won't give me a UTF-8 encoded string. A .encode("utf-8") should be applied to the string before it is passed to os.stat, but File.save() doesn't handle that. I figured it might be a bug in safe_join that it doesn't return a string in the same encoding it was passed.

In any case, I can't get safe_join to return a string that my os.stat can handle. Is it as safe as it claims to be?

Anyhow, I agree with you that the FileStorage docs could be more precise on some points, including this one.

comment:5 in reply to: ↑ 1 Changed 2 years ago by sylvain.lebon@…

You're actually right. For some reason, Django wasn't picking up my user's locale settings. In fact, the fcgi script wasn't either, even though my envvars script was properly configured.
I had to set DefaultInitEnv LANG "en_US.UTF-8" in my sites-available/default.
Now my view gets the right file system encoding, and that solves it.
Thanks for your help!
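For anyone hitting the same problem, a small diagnostic along these lines could be called from a view to see what the server process actually inherited. This is only a sketch in Python 3 syntax; the function name describe_fs_encoding and the shape of the returned dictionary are assumptions made for illustration.

```python
import locale
import os
import sys


def describe_fs_encoding(environ=None):
    """Collect the values that determine how file names get encoded.

    Hypothetical helper: `environ` defaults to the environment of the
    running process, which under FastCGI may differ from a login shell.
    """
    if environ is None:
        environ = os.environ
    return {
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "LC_ALL": environ.get("LC_ALL"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        "LANG": environ.get("LANG"),
    }


# Example with an explicit environment, as set by the DefaultInitEnv line.
report = describe_fs_encoding({"LANG": "en_US.UTF-8"})
```

If the locale variables come back empty inside the server while they are set in an interactive shell, the web server configuration is the place to look, as it was here.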

comment:6 Changed 2 years ago by claudep

  • Component changed from File uploads/storage to Documentation
  • Type changed from Uncategorized to Cleanup/optimization

comment:7 Changed 8 months ago by vajrasky

I cannot reproduce this in master (1.7).
