Opened 3 years ago

Closed 3 months ago

#19367 closed Bug (fixed)

ContentFile fails to save with filesystem storage when initialized with unicode string

Reported by: void Owned by: nobody
Component: File uploads/storage Version: master
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Attached test fails for me on python 2.7 with the following traceback:

Traceback (most recent call last):
  File "/Users/voidus/Documents/workspace/django-trunk/tests/regressiontests/file_storage/tests.py", line 590, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("español"))
  File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 48, in save
    name = self._save(name, content)
  File "/Users/voidus/Documents/workspace/django-trunk/django/core/files/storage.py", line 206, in _save
    _file.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 4: ordinal not in range(128)

Attachments (3)

ticket19367.txt (1.3 KB) - added by void 3 years ago.
Failing test
19367-2a.diff (2.7 KB) - added by claudep 3 years ago.
Approach 1: Different behaviour depending on python 2 or 3
19367-2b.diff (2.7 KB) - added by claudep 3 years ago.
Approach 2: Always force content to bytes

Download all attachments as: .zip

Change History (13)

Changed 3 years ago by void

Failing test

comment:1 Changed 3 years ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

IMHO it's an error to initialize a file-like object with unicode content; files are bytes.

That said, if Django currently supports it, we shouldn't remove this possibility outright (ie. without deprecation).

Is this a regression from Django 1.4?

comment:2 Changed 3 years ago by claudep

I was the bad guy who added support for unicode argument to ContentFile (in #11739).

After re-reading the ticket, I tend to think now that even if we should still be liberal in what we accept as parameter, we should force conversion to bytes. The downside is that the write/read cycle is not symetrical (write unicode (or bytes), read bytes), but as Aymeric said, files are bytes. I just quickly tested a patch with this idea, and except of the specific "symetrical" test committed in #11739, the test suite passes.

comment:3 Changed 3 years ago by void

Just a note: files are bytes, unless there're opened in python3 with 't' option

Changed 3 years ago by claudep

Approach 1: Different behaviour depending on python 2 or 3

Changed 3 years ago by claudep

Approach 2: Always force content to bytes

comment:4 Changed 3 years ago by claudep

  • Has patch set
  • Triage Stage changed from Unreviewed to Accepted

I'm still undecided about the best way to fix this. On one side, it would be easier and more consistent to always convert the content to bytes (Approach 2), but on the other side, opening text files on Python 3 returns unicode content, so we might also reflect that difference in our implementation (Approach 1).

comment:5 Changed 3 years ago by void

I'd prefer approach 1, because it is more consistent in my eyes with python2 and python3 files behaviour - and this behaviour is, in fact, different for python2 and python3.

comment:6 Changed 3 years ago by aaugustin

  • Triage Stage changed from Accepted to Ready for checkin

Patch 2a looks fine.

comment:7 Changed 3 years ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 34dcf51e06fc57fc5462bd40f6a4c8ee949521da:

Fixed #19367 -- Fixed saving ContentFile in filesystem storage

This was not working properly when ContentFile was initialized with
an unicode string.
Thanks Alexey Boriskin for the report and the test.

comment:8 Changed 3 years ago by Claude Paroz <claude@…>

In e9301ae451f84b06687d27af0a43bcb986054f63:

[1.5.x] Fixed #19367 -- Fixed saving ContentFile in filesystem storage

This was not working properly when ContentFile was initialized with
an unicode string.
Thanks Alexey Boriskin for the report and the test.

Backport of 34dcf51e06 from master.

comment:9 Changed 3 months ago by daphshez

  • Resolution fixed deleted
  • Status changed from closed to new

I just ran the tests on the development code and had the test fail.

This is a Windows machine with default locale 'cp1255'.

I think the solution chosen to fix this bug relies on the default charset being UTF-8,
which is a problem, especially on Windows (where it never is).

ERROR: test_content_saving (file_storage.tests.ContentFileStorageTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\data\dev\django\django\tests\file_storage\tests.py", line 755, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("espa?ol"))
  File "c:\data\dev\django\django\django\core\files\storage.py", line 64, in save
    name = self._save(name, content)
  File "c:\data\dev\django\django\django\core\files\storage.py", line 257, in _save
    _file.write(chunk)
  File "c:\Python34\lib\encodings\cp1255.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf1' in position 4:
 character maps to <undefined>

As an extra suggestion: this test wouldn't fail if ñ is mapped in the default encoding of the machine.
I suggest adding a test for a file with Unicode characters that don't map in cp1251, e.g. Russian

Last edited 3 months ago by daphshez (previous) (diff)

comment:10 Changed 3 months ago by timgraham

  • Resolution set to fixed
  • Status changed from new to closed

Please open a new ticket rather than reopening one that has been already released.

Note: See TracTickets for help on using tickets.
Back to Top