Opened 12 years ago
Closed 11 years ago
#20112 closed Bug (wontfix)
UnicodeDecodeError with not UTF-8 charset database connection
Reported by: | err | Owned by: | nobody |
---|---|---|---|
Component: | Core (Other) | Version: | 1.5 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Here is my database connection
'default': {
    'NAME': 'mydb',
    'ENGINE': 'django.db.backends.mysql',
    'OPTIONS': {"charset": "cp1251"},
}
>>> B2.objects.filter(name=u'hello')
[]
>>> B2.objects.filter(name=u'йц')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 77, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 92, in __len__
    self._result_cache.extend(self._iter)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 344, in _safe_iterator
    for item in iterator:
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 301, in iterator
    for row in compiler.results_iter():
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 775, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/util.py", line 45, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "/usr/local/lib/python2.7/dist-packages/django/db/backends/mysql/base.py", line 243, in last_executed_query
    return cursor._last_executed.decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 831: invalid continuation byte
In this ticket https://code.djangoproject.com/ticket/18461
return cursor._last_executed.decode('utf-8')
was added
But I guess we should do something like (in case of mysql):
encoding = cursor.connection.character_set_name()
return cursor._last_executed.decode(encoding)
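The failure and the suggested fix can be reproduced without a database. The following is a minimal sketch, assuming Python 3; the query string is made up, and `raw` merely stands in for what `cursor._last_executed` would hold on a cp1251 connection.

```python
# Stand-in for cursor._last_executed on a cp1251 connection (hypothetical query).
query = "SELECT * FROM b2 WHERE name = 'йц'"
raw = query.encode("cp1251")

# Hard-coded UTF-8 decoding fails: 0xe9 ('й' in cp1251) looks like the lead byte
# of a three-byte UTF-8 sequence, but the next byte is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_failed = False
    reason = ""
except UnicodeDecodeError as exc:
    utf8_failed = True
    reason = exc.reason

# Decoding with the charset the connection actually uses recovers the text.
decoded = raw.decode("cp1251")
```

Note that the value returned by `character_set_name()` is a MySQL charset name, which is not guaranteed to be a valid Python codec name in every case.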
Change History (12)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
https://github.com/django/django/pull/939
I've made a pull request. But I can't test this on oracle backend so this patch is for mysql/postgresql only
comment:3 by , 12 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:4 by , 12 years ago
For PostgreSQL, I don't think this is an issue, as we are hard-coding UTF8 in init_connection_state.
For MySQL, I'd rather use the same charset we pass when initializing the connection (self.connection.get_connection_params()['charset']). This might be less dependent on the MySQLdb implementation.
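A rough sketch of that idea, assuming Python 3. The helper `decode_last_executed`, the `MYSQL_TO_PYTHON` mapping, and the plain dict `conn_params` (standing in for the result of `get_connection_params()`) are hypothetical names for illustration, not Django's actual code.

```python
import codecs

# Hypothetical mapping for MySQL charset names that Python does not know as codecs.
MYSQL_TO_PYTHON = {"utf8mb4": "utf-8", "utf8mb3": "utf-8"}


def decode_last_executed(raw, conn_params, default="utf8"):
    """Decode raw query bytes with the charset the connection was opened with."""
    charset = conn_params.get("charset", default)
    codec_name = MYSQL_TO_PYTHON.get(charset, charset)
    try:
        codecs.lookup(codec_name)
    except LookupError:
        # Unknown charset name: fall back to UTF-8 rather than raising.
        codec_name = "utf-8"
    # errors="replace" keeps debug output from failing on stray bytes.
    return raw.decode(codec_name, errors="replace")
```

For example, `decode_last_executed(raw, {"charset": "cp1251"})` decodes cp1251 bytes, while an empty `conn_params` falls back to the utf8 default.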
comment:5 by , 12 years ago
I'm not sure we want to officially support setting the connection charset to anything other than utf-8 (or on MySQL the new fancy "real" utf-8 that actually supports more than 3-byte encodings, once we figure out how to do that). Ever since the unicode branch landed years ago Django has been by default setting the connection charset to utf-8...what's the use case for setting the connection charset to something more restrictive than utf-8?
comment:6 by , 12 years ago
Isn't this necessary for databases not created by Django (unmanaged models)?
comment:7 by , 12 years ago
No...doesn't matter what the database/table/column charset is, it's fine for data on the connection to flow as utf-8, since utf-8 can encode any values for any other supported charset.
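The encoding claim can be checked for cp1251 specifically. This is only an illustration in Python 3 of the character-set relationship; the actual conversion between the connection and column charsets is done by the database server.

```python
# Collect every character that a single cp1251 byte can represent.
cp1251_chars = []
for byte in range(256):
    try:
        cp1251_chars.append(bytes([byte]).decode("cp1251"))
    except UnicodeDecodeError:
        pass  # byte value not assigned in cp1251

# Each of those characters survives a round trip through UTF-8.
round_trip_ok = all(
    ch.encode("utf-8").decode("utf-8") == ch for ch in cp1251_chars
)
```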
comment:8 by , 12 years ago
If the database handles the conversion, indeed, this isn't necessary (I wasn't sure).
comment:9 by , 12 years ago
If we allow specifying the charset with the OPTIONS key (as get_connection_params currently does for MySQL), we should probably also use that charset to decode _last_executed.
Now I wouldn't be opposed to forcing utf-8 with MySQL like we do with PostgreSQL, but I cannot measure all possible consequences. Maybe the original poster could explain why he chose to customize the charset instead of keeping the default?
comment:10 by , 12 years ago
It would be good if we could get an explanation of why original poster wants cp1251 encoding on the connection. It was a surprise to me that this would actually work -- I thought we forced the connection charset to utf-8 ever since the merge of unicode branch. Rather it seems we just defaulted it to utf-8 and have been allowing override...I guess if this is the only place where that causes a problem, we could simply fix it. But I would not be surprised if there were other places where we assume utf-8 encoding on the connection. (Or I could be wrong about that, I was wrong about us forcing the connection charset to utf-8.)
comment:11 by , 12 years ago
Actually, I have a legacy Django project that I'm trying to run under Django 1.5.
I think the original author assumed that because all tables use the cp1251 encoding, the connection should use the same encoding.
I've tested a utf8 connection instead of cp1251 and everything works fine. All encodings are handled correctly.
So I think we can close this ticket.
But I think this behavior is a little confusing, so maybe you should force the utf8 encoding on MySQL too.
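For reference, a sketch of what the working configuration might look like. It reuses the settings from the description and changes only the charset; the exact settings of the project in question are not shown in this ticket.

```python
# Same connection settings as in the description, but with a utf8 connection
# charset; the tables themselves can stay in cp1251.
DATABASES = {
    "default": {
        "NAME": "mydb",
        "ENGINE": "django.db.backends.mysql",
        "OPTIONS": {"charset": "utf8"},
    },
}
```

Since the charset defaults to utf-8 per the comments above, dropping the `charset` option entirely should have the same effect.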
comment:12 by , 11 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Closing given OP's comment and no clear consensus on what to do.
Or just like in mysqldb code