Opened 4 years ago

Closed 18 months ago

#19367 closed Bug (fixed)

ContentFile fails to save with filesystem storage when initialized with unicode string

Reported by: Alexey Boriskin Owned by: nobody
Component: File uploads/storage Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Attached test fails for me on python 2.7 with the following traceback:

Traceback (most recent call last):
  File "/Users/voidus/Documents/workspace/django-trunk/tests/regressiontests/file_storage/tests.py", line 590, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("español"))
  File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 48, in save
    name = self._save(name, content)
  File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 206, in _save
    _file.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)

Attachments (3)

ticket19367.txt (1.3 KB) - added by Alexey Boriskin 4 years ago.
Failing test
19367-2a.diff (2.7 KB) - added by Claude Paroz 4 years ago.
Approach 1: Different behaviour depending on python 2 or 3
19367-2b.diff (2.7 KB) - added by Claude Paroz 4 years ago.
Approach 2: Always force content to bytes

Download all attachments as: .zip

Change History (13)

Changed 4 years ago by Alexey Boriskin

Attachment: ticket19367.txt added

Failing test

comment:1 Changed 4 years ago by Aymeric Augustin

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

IMHO it's an error to initialize a file-like object with unicode content; files are bytes.

That said, if Django currently supports it, we shouldn't remove this possibility outright (ie. without deprecation).

Is this a regression from Django 1.4?

comment:2 Changed 4 years ago by Claude Paroz

I was the bad guy who added support for unicode argument to ContentFile (in #11739).

After re-reading the ticket, I tend to think now that even if we should still be liberal in what we accept as parameter, we should force conversion to bytes. The downside is that the write/read cycle is not symetrical (write unicode (or bytes), read bytes), but as Aymeric said, files are bytes. I just quickly tested a patch with this idea, and except of the specific "symetrical" test committed in #11739, the test suite passes.

comment:3 Changed 4 years ago by Alexey Boriskin

Just a note: files are bytes, unless there're opened in python3 with 't' option

Changed 4 years ago by Claude Paroz

Attachment: 19367-2a.diff added

Approach 1: Different behaviour depending on python 2 or 3

Changed 4 years ago by Claude Paroz

Attachment: 19367-2b.diff added

Approach 2: Always force content to bytes

comment:4 Changed 4 years ago by Claude Paroz

Has patch: set
Triage Stage: UnreviewedAccepted

I'm still undecided about the best way to fix this. On one side, it would be easier and more consistent to always convert the content to bytes (Approach 2), but on the other side, opening text files on Python 3 returns unicode content, so we might also reflect that difference in our implementation (Approach 1).

comment:5 Changed 4 years ago by Alexey Boriskin

I'd prefer approach 1, because it is more consistent in my eyes with python2 and python3 files behaviour - and this behaviour is, in fact, different for python2 and python3.

comment:6 Changed 4 years ago by Aymeric Augustin

Triage Stage: AcceptedReady for checkin

Patch 2a looks fine.

comment:7 Changed 4 years ago by Claude Paroz <claude@…>

Resolution: fixed
Status: newclosed

In 34dcf51e06fc57fc5462bd40f6a4c8ee949521da:

Fixed #19367 -- Fixed saving ContentFile in filesystem storage

This was not working properly when ContentFile was initialized with
an unicode string.
Thanks Alexey Boriskin for the report and the test.

comment:8 Changed 4 years ago by Claude Paroz <claude@…>

In e9301ae451f84b06687d27af0a43bcb986054f63:

[1.5.x] Fixed #19367 -- Fixed saving ContentFile in filesystem storage

This was not working properly when ContentFile was initialized with
an unicode string.
Thanks Alexey Boriskin for the report and the test.

Backport of 34dcf51e06 from master.

comment:9 Changed 18 months ago by daphshez

Resolution: fixed
Status: closednew

I just ran the tests on the development code and had the test fail.

This is a Windows machine with default locale 'cp1255'.

I think the solution chosen to fix this bug relies on the default charset being UTF-8,
which is a problem, especially on Windows (where it never is).

ERROR: test_content_saving (file_storage.tests.ContentFileStorageTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\data\dev\django\django\tests\file_storage\tests.py", line 755, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("espa?ol"))
  File "c:\data\dev\django\django\django\core\files\storage.py", line 64, in save
    name = self._save(name, content)
  File "c:\data\dev\django\django\django\core\files\storage.py", line 257, in _save
    _file.write(chunk)
  File "c:\Python34\lib\encodings\cp1255.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf1' in position 4:
 character maps to <undefined>

As an extra suggestion: this test wouldn't fail if ñ is mapped in the default encoding of the machine.
I suggest adding a test for a file with Unicode characters that don't map in cp1251, e.g. Russian

Last edited 18 months ago by daphshez (previous) (diff)

comment:10 Changed 18 months ago by Tim Graham

Resolution: fixed
Status: newclosed

Please open a new ticket rather than reopening one that has been already released.

Note: See TracTickets for help on using tickets.
Back to Top