UnicodeDecodeError when uploading file with non-english filename.
|Reported by:||bear330||Owned by:||Leah Culver|
|Severity:||Keywords:||files, unicode, FileBackend fs-rf|
|Cc:||Triage Stage:||Ready for checkin|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
When I upload a file using newforms, I get an UploadedFile object that contains the filename and content of the uploaded file.
If I upload a file with an English filename ('abcd.jpg'), everything works.
But if the filename is not English (for example '中文.jpg'), then when I assign the UploadedFile object's filename to an ImageField or FileField in a model and save it, I get a UnicodeDecodeError.
This is because django.http.parse_file_upload treats the filename as a 'str' object, not a 'unicode' object.
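The failure mode can be reproduced outside Django. The following is only a minimal sketch, not Django code: it uses a Python 3 bytes object as a stand-in for the Python 2 'str' filename, and the helper name decode_as is hypothetical. In Python 2, mixing a 'str' with 'unicode' triggers an implicit ASCII decode, which is where the non-ASCII bytes fail.

```python
# UTF-8 bytes for the filename '中文.jpg', standing in for the byte string
# that the upload parser hands over (assumption: the browser sent UTF-8).
raw_filename = b'\xe4\xb8\xad\xe6\x96\x87.jpg'


def decode_as(raw, encoding):
    """Return the decoded text, or the exception class name if decoding fails."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# An ASCII decode (what Python 2 does implicitly) cannot handle these bytes.
ascii_result = decode_as(raw_filename, 'ascii')
# An explicit UTF-8 decode recovers the original filename.
utf8_result = decode_as(raw_filename, 'utf-8')

print(ascii_result)
print(utf8_result)
```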
To avoid this bug, I have to decode the filename manually:
filename = uploadedFileObj.filename.decode('utf8')
After that, the UnicodeDecodeError no longer occurs, but the FileField's value in the database is '.jpg'.
Oh, terrible! That is because django.utils.text.get_valid_filename does this:
re.sub(r'[^-A-Za-z0-9_.]', '', s)
This works for English filenames, but not for other languages.
After the re.sub, '中文.jpg' (u'\u4e2d\u6587.jpg') becomes u'.jpg'.
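To illustrate the stripping behaviour, here is a small sketch in Python 3. The function names ascii_only_filename and unicode_aware_filename are hypothetical, and the second pattern is just one possible Unicode-aware alternative, not necessarily the fix Django will adopt.

```python
import re


def ascii_only_filename(s):
    # Removes every character that is not an ASCII letter, digit,
    # hyphen, underscore, or dot.
    return re.sub(r'[^-A-Za-z0-9_.]', '', s)


def unicode_aware_filename(s):
    # Hypothetical alternative: \w with the Unicode flag also matches
    # letters and digits from other scripts, so they are kept.
    return re.sub(r'(?u)[^-\w.]', '', s)


name = '\u4e2d\u6587.jpg'  # '中文.jpg'

print(repr(ascii_only_filename(name)))
print(repr(unicode_aware_filename(name)))
print(repr(unicode_aware_filename('a/b?.jpg')))
```

The Unicode-aware variant keeps the Chinese characters while still dropping characters such as '/' and '?'.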
For me, this is a very serious problem.
For now, I can work around it by calling decode('utf8') and overriding get_valid_filename manually.
But I hope this bug will be fixed in Django officially.
Thanks for your effort. :)
Change History (21)
comment:1 Changed 9 years ago by
|Patch needs improvement:||unset|
|Summary:||Error while upload file with non-english filename. → UnicodeDecodeError when uploading file with non-english filename.|
|Triage Stage:||Unreviewed → Accepted|