Opened 12 years ago

Closed 11 years ago

#20112 closed Bug (wontfix)

UnicodeDecodeError with a non-UTF-8 database connection charset

Reported by: err Owned by: nobody
Component: Core (Other) Version: 1.5
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Here is my database connection configuration:

'default': {
    'NAME': 'mydb',
    'ENGINE': 'django.db.backends.mysql',
    'OPTIONS': {"charset": "cp1251"},
}

>>> B2.objects.filter(name=u'hello')
[]

>>> B2.objects.filter(name=u'йц')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 77, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 92, in __len__
    self._result_cache.extend(self._iter)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 344, in _safe_iterator
    for item in iterator:
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 301, in iterator
    for row in compiler.results_iter():
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 775, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/util.py", line 45, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 243, in last_executed_query
    return cursor._last_executed.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 831: invalid continuation byte

The following line was added in ticket https://code.djangoproject.com/ticket/18461:
return cursor._last_executed.decode('utf-8')

But I guess we should do something like (in case of mysql):

         encoding = cursor.connection.character_set_name()
         return cursor._last_executed.decode(encoding)
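
The failure and the proposed change can be reproduced outside Django. The following is only a minimal sketch, not the backend code: `decode_last_executed` is a hypothetical stand-in for `last_executed_query`, the table name `b2` is made up, and it assumes the MySQL charset name is also a valid Python codec name (true for cp1251, not guaranteed for every MySQL charset).

```python
def decode_last_executed(raw_query, charset="utf-8"):
    """Decode the raw bytes of an executed query with the given charset.

    Hypothetical helper; assumes the charset name is known to Python's
    codec registry.
    """
    return raw_query.decode(charset)


# u'йц' written with escapes. On a cp1251 connection the driver holds the
# query as cp1251 bytes, in which the first letter is the single byte 0xe9.
sql = u"SELECT * FROM b2 WHERE name = '\u0439\u0446'"
raw = sql.encode("cp1251")

try:
    decode_last_executed(raw)  # hard-coded utf-8, as in the traceback
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the charset the connection actually uses succeeds.
decoded = decode_last_executed(raw, "cp1251")
```

In the utf-8 codec, 0xe9 starts a three-byte sequence and the byte that follows is not a continuation byte, which matches the "invalid continuation byte" message in the traceback.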

Change History (12)

comment:1 by err, 12 years ago

Or just like in the MySQLdb code:

encoding = cursor.connection.unicode_literal.charset

comment:2 by err, 12 years ago

https://github.com/django/django/pull/939
I've made a pull request. I can't test this on the Oracle backend, so the patch covers MySQL and PostgreSQL only.

comment:3 by Claude Paroz, 12 years ago

Triage Stage: Unreviewed → Accepted

comment:4 by Claude Paroz, 12 years ago

For PostgreSQL, I don't think this is an issue, as we are hard-coding UTF8 in init_connection_state.

For MySQL, I'd rather use the same charset we pass when initializing the connection (self.connection.get_connection_params()['charset']). This might be less dependent on the MySQLdb implementation.
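
As a rough illustration of that idea, the charset could be looked up from the same settings that configure the connection. This is a sketch under assumptions: `connection_charset` is a hypothetical helper, not part of Django, and the `utf8` default reflects the thread's statement that the connection charset defaults to utf-8.

```python
def connection_charset(settings_dict, default="utf8"):
    """Return the charset set under OPTIONS, or the default.

    Hypothetical helper; the 'utf8' default is an assumption.
    """
    options = settings_dict.get("OPTIONS") or {}
    return options.get("charset", default)


# Settings as given in the ticket description.
reporter_db = {
    "NAME": "mydb",
    "ENGINE": "django.db.backends.mysql",
    "OPTIONS": {"charset": "cp1251"},
}
# The same database without a charset override.
default_db = {"NAME": "mydb", "ENGINE": "django.db.backends.mysql"}

reporter_charset = connection_charset(reporter_db)
fallback_charset = connection_charset(default_db)
```

Reading the value from the settings rather than from the driver's cursor or connection object avoids relying on MySQLdb internals.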

comment:5 by Karen Tracey, 12 years ago

I'm not sure we want to officially support setting the connection charset to anything other than utf-8 (or on MySQL the new fancy "real" utf-8 that actually supports more than 3-byte encodings, once we figure out how to do that). Ever since the unicode branch landed years ago Django has been by default setting the connection charset to utf-8...what's the use case for setting the connection charset to something more restrictive than utf-8?

comment:6 by Aymeric Augustin, 12 years ago

Isn't this necessary for databases not created by Django (unmanaged models)?

comment:7 by Karen Tracey, 12 years ago

No...doesn't matter what the database/table/column charset is, it's fine for data on the connection to flow as utf-8, since utf-8 can encode any values for any other supported charset.
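
The coverage claim can be checked on the Python side with a small sketch. It only shows that every character Python's cp1251 codec can produce survives a trip through utf-8. It does not exercise the server-side conversion MySQL performs between the connection and column charsets.

```python
round_trips = True
for value in range(256):
    byte = bytes(bytearray([value]))
    try:
        char = byte.decode("cp1251")
    except UnicodeDecodeError:
        continue  # byte value not assigned in Python's cp1251 codec
    # Encode the character as utf-8 and back; it must be unchanged.
    if char.encode("utf-8").decode("utf-8") != char:
        round_trips = False
```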

comment:8 by Aymeric Augustin, 12 years ago

If the database handles the conversion, indeed, this isn't necessary (I wasn't sure).

comment:9 by Claude Paroz, 12 years ago

If we allow specifying the charset with the OPTIONS key (as get_connection_params currently does for MySQL), we should probably also use that charset to decode _last_executed.

Now I wouldn't be opposed to forcing utf-8 with MySQL like we do with PostgreSQL, but I cannot measure all possible consequences. Maybe the original poster could explain why they chose to customize the charset instead of keeping the default?

comment:10 by Karen Tracey, 12 years ago

It would be good if we could get an explanation of why original poster wants cp1251 encoding on the connection. It was a surprise to me that this would actually work -- I thought we forced the connection charset to utf-8 ever since the merge of unicode branch. Rather it seems we just defaulted it to utf-8 and have been allowing override...I guess if this is the only place where that causes a problem, we could simply fix it. But I would not be surprised if there were other places where we assume utf-8 encoding on the connection. (Or I could be wrong about that, I was wrong about us forcing the connection charset to utf-8.)

comment:11 by err, 12 years ago

Actually I have a legacy Django project which I'm trying to run under Django 1.5.
I think the original author assumed that since all tables are in cp1251 encoding, the connection should use the same encoding.

I've tested utf8 connection instead of cp1251 and everything works fine. All encodings are handled correctly.
So I think we can close this ticket.

But I think this behavior is a little bit confusing, so maybe you should force utf8 encoding on MySQL too.

comment:12 by Tim Graham, 11 years ago

Resolution: wontfix
Status: new → closed

Closing given OP's comment and no clear consensus on what to do.
