Opened 15 years ago
Closed 15 years ago
#13758 closed Bug (fixed)
MySQLdb utf8_bin and django causes UnicodeDecodeError
| Reported by: | Owned by: | nobody | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | utf8_bin MySQLdb collation unicode bytestring |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | yes | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Issue:
I have a model with a FileField. When I delete instances of that model that have unicode characters in their filenames, I get a UnicodeDecodeError:
'ascii' codec can't decode byte 0xc3 in position 18: ordinal not in range(128)
I finally traced the problem back to my database collation: utf8_bin. I chose utf8_bin so I could order the strings in a case-sensitive manner. FYI, with a utf8_bin collation MySQLdb does not return Python unicode strings; it returns UTF-8 bytestrings. For a brief description of that issue, see:
http://code.djangoproject.com/ticket/8340#comment:4
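To illustrate the bytestring-versus-unicode distinction, here is a minimal sketch written for Python 3 (the ticket itself concerns Python 2). The name `row_value` is a hypothetical stand-in for a value fetched from a utf8_bin column; it is built by hand here and is not actual MySQLdb output.

```python
# Hypothetical stand-in for a VARCHAR value read from a utf8_bin column:
# UTF-8 encoded bytes rather than a text (unicode) string.
row_value = "漢語.png".encode("utf-8")

# The bytes only become text once they are decoded with the matching codec.
as_text = row_value.decode("utf-8")

# The two objects have different types and different lengths:
# each CJK character occupies three bytes in UTF-8.
print(type(row_value).__name__, len(row_value))
print(type(as_text).__name__, len(as_text))
```

Any code that receives such a value has to decode it explicitly as UTF-8 (or pass it through untouched) rather than rely on the interpreter's default codec.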
The traceback from my exception reveals the exception being thrown in
"django/db/models/fields/files.py" in get_prep_value (line 248).
FileField is a subclass of Field, but uses the same backend
MySQL type (varchar) as a CharField. However, it seems that FileField
and CharField have completely different implementations of
get_prep_value.
Here is CharField's implementation:
    def to_python(self, value):
        if isinstance(value, basestring) or value is None:
            return value
        return smart_unicode(value)

    def get_prep_value(self, value):
        return self.to_python(value)
Here is FileField's:
    def get_prep_value(self, value):
        "Returns field's value prepared for saving into a database."
        # Need to convert File objects provided via a form to unicode for database insertion
        if value is None:
            return None
        return unicode(value)
My experiments showed that if I replace FileField's
implementation of get_prep_value with CharField's, the exception
goes away. The issue is that the default encoding is ascii, so
unicode() called on a UTF-8 byte str blows up. The CharField
implementation simply checks whether the value is an instance of basestring
and quietly passes it through.
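The difference between the two behaviours can be sketched as follows. This is a Python 3 emulation, not Django's code: `filefield_style` and `charfield_style` are hypothetical helper names, and the explicit `decode("ascii")` stands in for the implicit ASCII decode that Python 2's `unicode()` performs on a byte str.

```python
# Hypothetical stand-in for the UTF-8 bytestring read back from a utf8_bin column.
raw = "tests/漢語.png".encode("utf-8")


def filefield_style(value):
    """Emulates the FileField approach: always convert to text.

    On Python 2, unicode(value) decodes byte strings with the default
    ASCII codec; decode("ascii") reproduces that step explicitly here.
    """
    if value is None:
        return None
    if isinstance(value, bytes):
        return value.decode("ascii")
    return str(value)


def charfield_style(value):
    """Emulates the CharField approach: strings of either kind pass through."""
    if isinstance(value, (bytes, str)) or value is None:
        return value
    return str(value)


# The pass-through variant returns the bytes untouched.
print(charfield_style(raw) is raw)

# The unconditional conversion fails on the first non-ASCII byte.
try:
    filefield_style(raw)
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)
```

The pass-through variant avoids the error only because it never decodes the bytes; it does not turn them into a unicode string.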
Change History (5)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
| Description: | modified (diff) |
|---|---|
| Has patch: | set |
| Needs tests: | set |
| Triage Stage: | Unreviewed → Accepted |
comment:3 by , 15 years ago
| Severity: | → Normal |
|---|---|
| Type: | → Bug |
comment:4 by , 15 years ago
| Easy pickings: | unset |
|---|---|
| Version: | 1.2 → SVN |
I can't reproduce this.
Using this model (in app 'backends'):
    import tempfile

    from django.core.files.storage import FileSystemStorage
    from django.db import models

    # Store uploaded files in a throwaway temporary directory.
    temp_storage_location = tempfile.mkdtemp()
    temp_storage = FileSystemStorage(location=temp_storage_location)

    class Person(models.Model):
        name = models.CharField(max_length=20)
        avatar = models.FileField(storage=temp_storage, upload_to='tests', max_length=15)
and running this test:
    from django.db import connections

    # Switch the avatar column to the utf8_bin collation.
    cur = connections['default'].cursor()
    cur.execute('ALTER TABLE backends_person MODIFY avatar VARCHAR(15) CHARACTER SET utf8 COLLATE utf8_bin;')
    cur.close()

    Person.objects.create(name='Django', avatar='汉语/漢語.png')
    p = Person.objects.all()[0]
    p.delete()
it works. I also tried uploading a UTF8 named file via the admin and deleting it, and again it worked.
Could you post a failing test, or more specific instructions for replicating?
comment:5 by , 15 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
I cannot reproduce this in 1.3. Marking this as resolved.
For clarity, I'll repost that code in code blocks:
Here is CharField's implementation:

    def to_python(self, value):
        if isinstance(value, basestring) or value is None:
            return value
        return smart_unicode(value)

    def get_prep_value(self, value):
        return self.to_python(value)

Here is FileField's:

    def get_prep_value(self, value):
        "Returns field's value prepared for saving into a database."
        # Need to convert File objects provided via a form to unicode for database insertion
        if value is None:
            return None
        return unicode(value)