Opened 5 years ago

Closed 5 years ago

#20112 closed Bug (wontfix)

UnicodeDecodeError with a non-UTF-8 charset on the database connection

Reported by: err
Owned by: nobody
Component: Core (Other)
Version: 1.5
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no


Here is my database connection:

'default': {
    'NAME': 'mydb',
    'ENGINE': 'django.db.backends.mysql',
    'OPTIONS': {"charset": "cp1251"},
}

>>> B2.objects.filter(name=u'йц')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/", line 77, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/", line 92, in __len__
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/", line 344, in _safe_iterator
    for item in iterator:
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/", line 301, in iterator
    for row in compiler.results_iter():
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/", line 775, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/", line 45, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/", line 243, in last_executed_query
    return cursor._last_executed.decode('utf-8')
  File "/usr/lib/python2.7/encodings/", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 831: invalid continuation byte
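The failure at the bottom of the traceback can be reproduced without a database. The sketch below assumes only that the query bytes were encoded as cp1251 (the connection charset from the settings above) and are then decoded as UTF-8; the variable names are illustrative.

```python
# Standalone illustration: no Django or MySQL involved.
# The filter value from the report, encoded the way a cp1251 connection would send it.
raw = u"йц".encode("cp1251")  # two bytes: 0xE9 0xF6

error = None
try:
    # This mirrors cursor._last_executed.decode('utf-8') in the traceback.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the charset that was actually used succeeds.
recovered = raw.decode("cp1251")
```

In UTF-8, 0xE9 announces a three-byte sequence, and the byte that follows is not a valid continuation byte, which is consistent with the "invalid continuation byte" message above.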

This line was added in an earlier ticket:

         return cursor._last_executed.decode('utf-8')

But I guess we should do something like this (in the MySQL case):

         encoding = cursor.connection.character_set_name()
         return cursor._last_executed.decode(encoding)
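A minimal sketch of that idea as a standalone function, so it can be tried outside Django. `decode_last_executed` and its parameters are hypothetical names, not part of Django or MySQLdb. It also assumes that a charset name reported by the connection may not always be a valid Python codec name, so it falls back to UTF-8 and replaces undecodable bytes instead of raising.

```python
import codecs


def decode_last_executed(raw, charset="utf8"):
    """Decode the raw bytes of the last executed query (hypothetical helper).

    `charset` stands in for whatever the connection reports; it is an
    assumption that this name is usable as a Python codec name.
    """
    try:
        codecs.lookup(charset)
    except LookupError:
        # Unknown codec name: fall back to UTF-8 rather than fail.
        charset = "utf-8"
    # "replace" keeps a bad byte from raising UnicodeDecodeError.
    return raw.decode(charset, "replace")


query = u"SELECT * FROM b2 WHERE name = 'йц'".encode("cp1251")
decoded = decode_last_executed(query, "cp1251")
```

Replacing undecodable bytes trades exactness for robustness; whether that is acceptable depends on how the decoded query string is used.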

Change History (12)

comment:1 Changed 5 years ago by err

Or just do what the MySQLdb code does:

encoding = cursor.connection.unicode_literal.charset

comment:2 Changed 5 years ago by err
I've made a pull request, but I can't test it on the Oracle backend, so the patch covers MySQL and PostgreSQL only.

comment:3 Changed 5 years ago by Claude Paroz

Triage Stage: Unreviewed → Accepted

comment:4 Changed 5 years ago by Claude Paroz

For PostgreSQL, I don't think this is an issue, as we are hard-coding UTF8 in init_connection_state.

For MySQL, I'd rather use the same charset we pass when initializing the connection (self.connection.get_connection_params()['charset']). This might be less dependent on the MySQLdb implementation.

comment:5 Changed 5 years ago by Karen Tracey

I'm not sure we want to officially support setting the connection charset to anything other than utf-8 (or on MySQL the new fancy "real" utf-8 that actually supports more than 3-byte encodings, once we figure out how to do that). Ever since the unicode branch landed years ago Django has been by default setting the connection charset to utf-8...what's the use case for setting the connection charset to something more restrictive than utf-8?

comment:6 Changed 5 years ago by Aymeric Augustin

Isn't this necessary for databases not created by Django (unmanaged models)?

comment:7 Changed 5 years ago by Karen Tracey

No...doesn't matter what the database/table/column charset is, it's fine for data on the connection to flow as utf-8, since utf-8 can encode any values for any other supported charset.
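The encoding half of this claim can be checked with Python's own codecs: every character that cp1251 defines survives a round trip through UTF-8, while the reverse does not hold. This is only an illustration of the encodings themselves; it says nothing about how MySQL converts between column and connection charsets.

```python
# Collect every character that the cp1251 code page defines.
cp1251_chars = []
for value in range(256):
    try:
        cp1251_chars.append(bytes([value]).decode("cp1251"))
    except UnicodeDecodeError:
        # A byte value that cp1251 leaves unassigned.
        continue

# Each of those characters can be encoded as UTF-8 and read back unchanged.
all_round_trip = all(
    ch.encode("utf-8").decode("utf-8") == ch for ch in cp1251_chars
)

# The reverse fails: this character has no cp1251 representation.
try:
    u"日".encode("cp1251")
    cp1251_is_restrictive = False
except UnicodeEncodeError:
    cp1251_is_restrictive = True
```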

comment:8 Changed 5 years ago by Aymeric Augustin

If the database handles the conversion, indeed, this isn't necessary (I wasn't sure).

comment:9 Changed 5 years ago by Claude Paroz

If we allow specifying the charset with the OPTIONS key (as get_connection_params currently does for MySQL), we should probably also use that charset to decode _last_executed.

Now I wouldn't be opposed to forcing utf-8 with MySQL like we do with PostgreSQL, but I cannot measure all possible consequences. Maybe the original poster could explain why they chose to customize the charset instead of keeping the default?

comment:10 Changed 5 years ago by Karen Tracey

It would be good if we could get an explanation of why the original poster wants cp1251 encoding on the connection. It was a surprise to me that this would actually work -- I thought we forced the connection charset to utf-8 ever since the merge of the unicode branch. Rather it seems we just defaulted it to utf-8 and have been allowing an override...I guess if this is the only place where that causes a problem, we could simply fix it. But I would not be surprised if there were other places where we assume utf-8 encoding on the connection. (Or I could be wrong about that, I was wrong about us forcing the connection charset to utf-8.)

comment:11 Changed 5 years ago by err

Actually, I have a legacy Django project which I'm trying to run under Django 1.5.
I think the original author assumed that since all tables are in cp1251 encoding, the connection should use the same encoding.

I've tested utf8 connection instead of cp1251 and everything works fine. All encodings are handled correctly.
So I think we can close this ticket.

But I think this behavior is a little bit confusing, so maybe you should force utf8 encoding on MySQL too.
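For reference, a minimal sketch of the connection settings with the charset override removed, so that the connection uses the utf-8 default described in comment:5 and comment:10. The database name and engine are taken from the report; credentials and host are omitted, and the tables themselves are assumed to stay in cp1251.

```python
# settings.py fragment (sketch): no "charset" entry in OPTIONS, so the
# MySQL backend uses its default connection charset.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "mydb",
        # USER, PASSWORD, HOST and PORT omitted for brevity.
    }
}
```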

comment:12 Changed 5 years ago by Tim Graham

Resolution: wontfix
Status: new → closed

Closing given OP's comment and no clear consensus on what to do.
