Opened 11 years ago

Closed 9 years ago

#19367 closed Bug (fixed)

ContentFile fails to save with filesystem storage when initialized with unicode string

Reported by: Alexey Boriskin Owned by: nobody
Component: File uploads/storage Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Attached test fails for me on python 2.7 with the following traceback:

Traceback (most recent call last):
  File "/Users/voidus/Documents/workspace/django-trunk/tests/regressiontests/file_storage/tests.py", line 590, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("español"))
  File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 48, in save
    name = self._save(name, content)
  File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 206, in _save
    _file.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)

Attachments (3)

ticket19367.txt (1.3 KB ) - added by Alexey Boriskin 11 years ago.
Failing test
19367-2a.diff (2.7 KB ) - added by Claude Paroz 11 years ago.
Approach 1: Different behaviour depending on python 2 or 3
19367-2b.diff (2.7 KB ) - added by Claude Paroz 11 years ago.
Approach 2: Always force content to bytes

Download all attachments as: .zip

Change History (13)

by Alexey Boriskin, 11 years ago

Attachment: ticket19367.txt added

Failing test

comment:1 by Aymeric Augustin, 11 years ago

IMHO it's an error to initialize a file-like object with unicode content; files are bytes.

That said, if Django currently supports it, we shouldn't remove this possibility outright (ie. without deprecation).

Is this a regression from Django 1.4?

comment:2 by Claude Paroz, 11 years ago

I was the bad guy who added support for unicode argument to ContentFile (in #11739).

After re-reading the ticket, I tend to think now that even if we should still be liberal in what we accept as parameter, we should force conversion to bytes. The downside is that the write/read cycle is not symetrical (write unicode (or bytes), read bytes), but as Aymeric said, files are bytes. I just quickly tested a patch with this idea, and except of the specific "symetrical" test committed in #11739, the test suite passes.

comment:3 by Alexey Boriskin, 11 years ago

Just a note: files are bytes, unless there're opened in python3 with 't' option

by Claude Paroz, 11 years ago

Attachment: 19367-2a.diff added

Approach 1: Different behaviour depending on python 2 or 3

by Claude Paroz, 11 years ago

Attachment: 19367-2b.diff added

Approach 2: Always force content to bytes

comment:4 by Claude Paroz, 11 years ago

Has patch: set
Triage Stage: UnreviewedAccepted

I'm still undecided about the best way to fix this. On one side, it would be easier and more consistent to always convert the content to bytes (Approach 2), but on the other side, opening text files on Python 3 returns unicode content, so we might also reflect that difference in our implementation (Approach 1).

comment:5 by Alexey Boriskin, 11 years ago

I'd prefer approach 1, because it is more consistent in my eyes with python2 and python3 files behaviour - and this behaviour is, in fact, different for python2 and python3.

comment:6 by Aymeric Augustin, 11 years ago

Triage Stage: AcceptedReady for checkin

Patch 2a looks fine.

comment:7 by Claude Paroz <claude@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 34dcf51e06fc57fc5462bd40f6a4c8ee949521da:

Fixed #19367 -- Fixed saving ContentFile in filesystem storage

This was not working properly when ContentFile was initialized with
an unicode string.
Thanks Alexey Boriskin for the report and the test.

comment:8 by Claude Paroz <claude@…>, 11 years ago

In e9301ae451f84b06687d27af0a43bcb986054f63:

[1.5.x] Fixed #19367 -- Fixed saving ContentFile in filesystem storage

This was not working properly when ContentFile was initialized with
an unicode string.
Thanks Alexey Boriskin for the report and the test.

Backport of 34dcf51e06 from master.

comment:9 by daphshez, 9 years ago

Resolution: fixed
Status: closednew

I just ran the tests on the development code and had the test fail.

This is a Windows machine with default locale 'cp1255'.

I think the solution chosen to fix this bug relies on the default charset being UTF-8,
which is a problem, especially on Windows (where it never is).

ERROR: test_content_saving (file_storage.tests.ContentFileStorageTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\data\dev\django\django\tests\file_storage\tests.py", line 755, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("espa?ol"))
  File "c:\data\dev\django\django\django\core\files\storage.py", line 64, in save
    name = self._save(name, content)
  File "c:\data\dev\django\django\django\core\files\storage.py", line 257, in _save
    _file.write(chunk)
  File "c:\Python34\lib\encodings\cp1255.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf1' in position 4:
 character maps to <undefined>

As an extra suggestion: this test wouldn't fail if ñ is mapped in the default encoding of the machine.
I suggest adding a test for a file with Unicode characters that don't map in cp1251, e.g. Russian

Last edited 9 years ago by daphshez (previous) (diff)

comment:10 by Tim Graham, 9 years ago

Resolution: fixed
Status: newclosed

Please open a new ticket rather than reopening one that has been already released.

Note: See TracTickets for help on using tickets.
Back to Top