Opened 12 years ago
Closed 10 years ago
#19367 closed Bug (fixed)
ContentFile fails to save with filesystem storage when initialized with unicode string
Reported by: | Alexey Boriskin | Owned by: | nobody |
---|---|---|---|
Component: | File uploads/storage | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Attached test fails for me on python 2.7 with the following traceback:
Traceback (most recent call last): File "/Users/voidus/Documents/workspace/django-trunk/tests/regressiontests/file_storage/tests.py", line 590, in test_content_saving self.storage.save('unicode.txt', ContentFile("español")) File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 48, in save name = self._save(name, content) File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 206, in _save _file.write(chunk) UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)
Attachments (3)
Change History (13)
by , 12 years ago
Attachment: | ticket19367.txt added |
---|
comment:1 by , 12 years ago
IMHO it's an error to initialize a file-like object with unicode content; files are bytes.
That said, if Django currently supports it, we shouldn't remove this possibility outright (ie. without deprecation).
Is this a regression from Django 1.4?
comment:2 by , 12 years ago
I was the bad guy who added support for unicode argument to ContentFile (in #11739).
After re-reading the ticket, I tend to think now that even if we should still be liberal in what we accept as parameter, we should force conversion to bytes. The downside is that the write/read cycle is not symetrical (write unicode (or bytes), read bytes), but as Aymeric said, files are bytes. I just quickly tested a patch with this idea, and except of the specific "symetrical" test committed in #11739, the test suite passes.
comment:3 by , 12 years ago
Just a note: files are bytes, unless there're opened in python3 with 't' option
by , 12 years ago
Attachment: | 19367-2a.diff added |
---|
Approach 1: Different behaviour depending on python 2 or 3
comment:4 by , 12 years ago
Has patch: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
I'm still undecided about the best way to fix this. On one side, it would be easier and more consistent to always convert the content to bytes (Approach 2), but on the other side, opening text files on Python 3 returns unicode content, so we might also reflect that difference in our implementation (Approach 1).
comment:5 by , 12 years ago
I'd prefer approach 1, because it is more consistent in my eyes with python2 and python3 files behaviour - and this behaviour is, in fact, different for python2 and python3.
comment:7 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
comment:9 by , 10 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
I just ran the tests on the development code and had the test fail.
This is a Windows machine with default locale 'cp1255'.
I think the solution chosen to fix this bug relies on the default charset being UTF-8,
which is a problem, especially on Windows (where it never is).
ERROR: test_content_saving (file_storage.tests.ContentFileStorageTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\data\dev\django\django\tests\file_storage\tests.py", line 755, in test_content_saving self.storage.save('unicode.txt', ContentFile("espa?ol")) File "c:\data\dev\django\django\django\core\files\storage.py", line 64, in save name = self._save(name, content) File "c:\data\dev\django\django\django\core\files\storage.py", line 257, in _save _file.write(chunk) File "c:\Python34\lib\encodings\cp1255.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\xf1' in position 4: character maps to <undefined>
As an extra suggestion: this test wouldn't fail if ñ is mapped in the default encoding of the machine.
I suggest adding a test for a file with Unicode characters that don't map in cp1251, e.g. Russian
comment:10 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Please open a new ticket rather than reopening one that has been already released.
Failing test