#17686 closed Cleanup/optimization (fixed)
file.save crashes on unicode filename
Reported by: | | Owned by: | Florian Demmer |
---|---|---|---|
Component: | Documentation | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, lrekucki@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I'm trying to add a file to a FileField on one of my objects.
The problem is that the file has a unicode name containing special characters.
I have
name = (u'l\u2019\xe9cran.png').encode("utf-8")
now if I do
str1 = "%s%s" % (MEDIA_ROOT, name)
I have
'/home/oi/OIFS/Capture d\xe2\x80\x99\xc3\xa9cran 2012-01-24 \xc3\xa0 14.58.48.png'
and os.stat(str1) succeeds.
but with
str2 = safe_join(MEDIA_ROOT, name)
str2 is then
u'/home/oi/OIFS/Capture d\u2019\xe9cran 2012-01-24 \xe0 14.58.48.png'
and os.stat fails.
The thing is, file.save(name, content, False) uses safe_join and then os.stat.
I'm not sure whether I should pass the name in a different encoding, but I can't seem to get it right.
Thanks
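The difference between str1 and str2 comes down to who does the encoding: str1 is already bytes, while str2 is a unicode string that os.stat has to encode itself. A minimal sketch in Python 3 syntax, assuming a hypothetical helper encode_path() that roughly mimics what Python 2's os.stat() did with a unicode path (encode it with the filesystem encoding, strict error handling). The encoding is passed explicitly so the UTF-8 and ASCII cases can be compared side by side.

```python
def encode_path(path, fs_encoding):
    # Hypothetical helper: encode a text path with the given filesystem
    # encoding using strict error handling, roughly what Python 2's
    # os.stat() did internally with a unicode argument.
    return path.encode(fs_encoding)


name = u"l\u2019\xe9cran.png"

# UTF-8 filesystem encoding: same bytes as name.encode("utf-8").
print(encode_path(name, "utf-8"))  # b'l\xe2\x80\x99\xc3\xa9cran.png'

# ASCII filesystem encoding (e.g. no LANG/LC_ALL set under Python 2):
# the right single quote and the accented e cannot be represented.
try:
    encode_path(name, "ascii")
except UnicodeEncodeError as exc:
    print("encoding failed:", exc.reason)
```

With UTF-8 the bytes match name.encode("utf-8") from the description; with ASCII the same call raises UnicodeEncodeError, which is the kind of crash reported here.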
Change History (12)
follow-ups: 4 5 comment:1 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 13 years ago
Cc: | added |
---|
comment:3 by , 13 years ago
Cc: | added |
---|
comment:4 by , 13 years ago
What you say makes sense, but I don't think that's my point.
What I'm saying is that my system is configured to use UTF-8: sys.getfilesystemencoding() gives me "UTF-8", but safe_join won't give me a UTF-8 encoded string. A .encode("utf-8") should be applied to the string before it is passed to os.stat, but File.save() doesn't do that. I figured it might be a bug in safe_join not to return a string in the same encoding it was passed.
In any case, I can't get safe_join to return a string that my os.stat can handle. Is it as safe as it claims to be?
Anyhow, I agree with you that the FileStorage docs could be more precise on some points, including this one.
Replying to lrekucki:
This is a common error, which actually isn't Django related:
Most file system related functions in Python (like os.stat) accept unicode strings, which are then encoded using the default encoding of the file system (see http://docs.python.org/library/sys.html#sys.getfilesystemencoding). This is actually the only sane thing to do: if you pass a manually encoded string, you have no guarantee it will match what was actually written on the FS. On Unix platforms, this depends on the user's *locale*. Thus if the user you're running the server as doesn't have LC_ALL, LANG, etc. properly set in his environment, the FS encoding will be assumed to be ASCII and os.stat will crash.
I'm marking this as accepted, because I think it's worth putting a note about this in FileStorage docs.
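Following the explanation quoted above, one way to get a clearer error than a crash deep inside os.stat is to check, before calling file.save(), whether the name can be encoded with the current filesystem encoding. This is only a sketch: filename_is_encodable() is a hypothetical helper, not part of Django, and its strict encoding check only approximates what the os functions do on a given platform.

```python
import sys


def filename_is_encodable(name, encoding=None):
    # Hypothetical helper: return True if `name` can be encoded with
    # `encoding`, defaulting to the interpreter's filesystem encoding.
    encoding = encoding or sys.getfilesystemencoding()
    try:
        name.encode(encoding)  # strict error handling
    except UnicodeEncodeError:
        return False
    return True


name = u"l\u2019\xe9cran.png"
print(filename_is_encodable(name, "utf-8"))
print(filename_is_encodable(name, "ascii"))
```

A view could call it before saving and reject or rename the upload when it returns False, instead of letting the storage backend fail.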
comment:5 by , 13 years ago
You're actually right. For some reason, Django wasn't picking up my user's locale settings. In fact, the fcgi script wasn't either, even though my envvars script was properly configured.
I had to set DefaultInitEnv LANG "en_US.UTF-8" in my sites-available/default.
Now my view gets the right filesystem encoding, and that solves it.
Thanks for your help!
Replying to lrekucki:
This is a common error, which actually isn't Django related:
Most file system related functions in Python (like os.stat) accept unicode strings, which are then encoded using the default encoding of the file system (see http://docs.python.org/library/sys.html#sys.getfilesystemencoding). This is actually the only sane thing to do: if you pass a manually encoded string, you have no guarantee it will match what was actually written on the FS. On Unix platforms, this depends on the user's *locale*. Thus if the user you're running the server as doesn't have LC_ALL, LANG, etc. properly set in his environment, the FS encoding will be assumed to be ASCII and os.stat will crash.
I'm marking this as accepted, because I think it's worth putting a note about this in FileStorage docs.
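The fix in comment 5 works because, on Python 2, the filesystem encoding is derived from the process locale when the interpreter starts, so LANG has to be present in the environment the web server launches the fcgi script with; changing os.environ from inside the script is too late. A startup guard in the fcgi/WSGI script would have surfaced the misconfiguration immediately. A minimal sketch, assuming UTF-8 is the encoding you want; require_utf8_filesystem() is a hypothetical name.

```python
import codecs
import sys


def require_utf8_filesystem(encoding=None):
    # Hypothetical startup guard: fail fast if file names will not be
    # encoded as UTF-8. codecs.lookup() normalizes aliases such as
    # "UTF-8", "utf8" and "ANSI_X3.4-1968" to a canonical codec name.
    encoding = encoding or sys.getfilesystemencoding()
    if codecs.lookup(encoding).name != "utf-8":
        raise RuntimeError(
            "filesystem encoding is %r, not UTF-8; check LANG/LC_ALL in "
            "the environment the server process is started with" % encoding
        )
    return encoding
```

On Python 3.7 and later, PEP 538 and PEP 540 usually make a bare C/POSIX locale fall back to UTF-8, so this particular failure is less likely there.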
comment:6 by , 13 years ago
Component: | File uploads/storage → Documentation |
---|---|
Type: | Uncategorized → Cleanup/optimization |
comment:8 by , 9 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Please review my pull request with the documentation updates: https://github.com/django/django/pull/5587
comment:9 by , 9 years ago
Cc: | added; removed |
---|---|
Version: | 1.3 → master |