Opened 5 years ago

Closed 4 years ago

#13758 closed Bug (fixed)

MySQLdb utf8_bin and django causes UnicodeDecodeError

Reported by: sam.vevang@… Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: utf8_bin MySQLdb collation unicode bytestring
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX:

Description (last modified by russellm)

Issue:
I have a Model with a FileField. When I delete instances of that model that have Unicode characters in their filenames, I get a UnicodeDecodeError:

'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)

I finally traced the problem back to my database collation, utf8_bin. I chose utf8_bin so I could order strings case-sensitively. FYI, with a utf8_bin collation MySQLdb does not return Python unicode strings; it returns UTF-8 bytestrings. For a brief description of that issue, see:
http://code.djangoproject.com/ticket/8340#comment:4
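To see the failure in isolation, here is a small Python 3 sketch (the ticket itself predates Python 3). It assumes that Python 2's `unicode(bytestring)` behaves like `bytestring.decode('ascii')`, which holds when the default encoding is ASCII; the file name `café.png` is made up for illustration.

```python
# Hypothetical file name; "é" encodes to the two bytes 0xc3 0xa9 in UTF-8.
raw = "café.png".encode("utf-8")  # b'caf\xc3\xa9.png', like a utf8_bin column value

try:
    # Python 2's unicode(raw) decoded with the default ASCII codec, like this:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = str(exc)

# Decoding with the codec the bytes were actually written in succeeds.
decoded = raw.decode("utf-8")

print(ascii_error)  # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
print(decoded)      # café.png
```

The `0xc3` byte matches the one in the reported error; it is the UTF-8 lead byte for characters such as `é`.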

The traceback from my exception shows it being raised in get_prep_value in "django/db/models/fields/files.py" (line 248). FileField is a subclass of Field, but it maps to the same MySQL column type (varchar) as CharField. However, FileField and CharField have completely different implementations of get_prep_value.

Here is CharField's implementation:

    def to_python(self, value):
        # Strings (unicode or bytestrings) and None are returned unchanged.
        if isinstance(value, basestring) or value is None:
            return value
        return smart_unicode(value)

    def get_prep_value(self, value):
        return self.to_python(value)

Here is FileField's:

    def get_prep_value(self, value):
        "Returns field's value prepared for saving into a database."
        # Need to convert File objects provided via a form to unicode for database insertion
        if value is None:
            return None
        return unicode(value)

My experiments revealed that if I replace FileField's implementation of get_prep_value with CharField's, the exception goes away. The issue is that Python 2's default encoding is ASCII, so calling unicode() on a UTF-8 bytestring raises UnicodeDecodeError. The CharField implementation simply checks whether the value is an instance of basestring and passes it through unchanged.
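The difference between the two code paths can be sketched in Python 3 as below. This is an emulation under stated assumptions, not Django code: `filefield_prep` and `charfield_prep` are hypothetical names, `bytes`/`str` stand in for Python 2's `str`/`unicode`, the ASCII decode stands in for Python 2's `unicode(bytestring)`, and `str(value)` stands in for `smart_unicode` on non-string values.

```python
def filefield_prep(value):
    """Mimics FileField.get_prep_value from the description: coerce with unicode()."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # Python 2's unicode(bytestring) decodes using the default 'ascii' codec.
        return value.decode("ascii")
    return str(value)


def charfield_prep(value):
    """Mimics CharField.to_python from the description: strings pass through."""
    if isinstance(value, (str, bytes)) or value is None:
        return value
    # Stand-in for smart_unicode(), used only for non-string values.
    return str(value)


# Hypothetical value as MySQLdb would return it for a utf8_bin column.
raw = "café.png".encode("utf-8")

passed_through = charfield_prep(raw)  # the same bytes object, untouched

try:
    filefield_prep(raw)
    filefield_error = None
except UnicodeDecodeError as exc:
    filefield_error = exc  # the crash described in this ticket
```

The CharField path never decodes a bytestring, so it cannot hit the ASCII codec; the FileField path decodes every bytestring it sees.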

Change History (5)

comment:1 Changed 5 years ago by sam.vevang@…

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by russellm

  • Description modified (diff)
  • Has patch set
  • Needs tests set
  • Triage Stage changed from Unreviewed to Accepted

comment:3 Changed 4 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:4 Changed 4 years ago by graham_king

  • Easy pickings unset
  • Version changed from 1.2 to SVN

I can't reproduce this.

Using this model (in app 'backends'):

import tempfile

from django.core.files.storage import FileSystemStorage
from django.db import models

# Store uploaded files in a throwaway temporary directory.
temp_storage_location = tempfile.mkdtemp()
temp_storage = FileSystemStorage(location=temp_storage_location)

class Person(models.Model):
    name = models.CharField(max_length=20)
    avatar = models.FileField(storage=temp_storage, upload_to='tests', max_length=15)

and running this test:

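from django.db import connections
from backends.models import Person

# Switch the avatar column to the case-sensitive utf8_bin collation.
cur = connections['default'].cursor()
cur.execute('ALTER TABLE backends_person MODIFY avatar VARCHAR(15) CHARACTER SET utf8 COLLATE utf8_bin;')
cur.close()

# Create a row whose file name contains non-ASCII characters, then delete it.
Person.objects.create(name='Django', avatar='汉语/漢語.png')

p = Person.objects.all()[0]
p.delete()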

It works. I also tried uploading a file with a UTF-8 name via the admin and deleting it, and again it worked.

Could you post a failing test, or more specific instructions for replicating?

comment:5 Changed 4 years ago by sam.vevang@…

  • Resolution set to fixed
  • Status changed from new to closed

I cannot reproduce this in 1.3. Marking this as resolved.
