#22377 closed Bug (needsinfo)
SQL Logging throws an exception when fields have utf-8 characters — at Version 15
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | sql logging for utf-8 |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
In django/db/backends/util.py, in CursorDebugWrapper.execute, the line:
logger.debug('(%.3f) %s; args=%s' % (duration, sql, params), extra={'duration': duration, 'sql': sql, 'params': params} )
throws a "UnicodeDecodeError:ascii codec .... " when calling model.save() when some fields have utf-8/non-ascii values. My only workaround is to modify it to:
logger.debug('(%.3f) %s; args=%s' % (duration, sql.decode('utf-8'), params), extra={'duration': duration, 'sql': sql, 'params': params} )
I'm hoping there's a better and more elegant way that I could use.
Thanks!
=)
Change History (15)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Type: | Uncategorized → Bug |
comment:3 by , 10 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Here is a traceback:
Traceback (most recent call last): File "./manage.py", line 10, in <module> execute_from_command_line(sys.argv) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line utility.execute() File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv self.execute(*args, **options.__dict__) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute output = self.handle(*args, **options) File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 35, in handle self.parse_file(fname) File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 40, in parse_file self.parse_entry(cols) File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 61, in parse_entry row_header, created = OIRowHeader.objects.get_or_create(**header_vals) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create return self.get_queryset().get_or_create(**kwargs) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create return self.get(**lookup), False File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get num = len(clone) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__ self._fetch_all() File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all self._result_cache = list(self.iterator()) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator for row in compiler.results_iter(): File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter for rows in self.execute_sql(MULTI): File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql cursor.execute(sql, params) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/backends/util.py", line 78, in execute logger.debug('(%.3f) %s; args=%s' % (duration, sql, params), UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 450: ordinal not in range(128)
I can't provide the data values, most likely some russian or chinese string, but I think the bug is obvious if you look into django.db.backends.util.CursorDebugWrapper
:
- Python 2
- percent-formatting into bytestrings (
str
-Type)
-> UnicodeDecodeError for any non-ascii char
The solution should be as simple as: Use six
string types
This is realy a hassle, since you have to work around debugging Code at the moment.
comment:4 by , 10 years ago
It's not about percent-formatting into bytestrings, as the file imports from __future__ import unicode_literals
. I suspect that content in header_vals
dictionary has some non-ASCII non-Unicode content, which you should check and fix.
comment:5 by , 10 years ago
It's absolutely sure that all content in header_vals
is correctly encoded, since it's read from utf-8 encoded CSV files. All content columns are safe-decodeded to unicode after reading. If I turn the DebugCursor off, all content gets into the DB and is correctly displayed when read from the DB.
But you're right, I could not create a test case in which
from __future__ import unicode_literals logging.debug('(%.3f) %s; args=%s' % (duration, sql, params), extra={'duration': duration, 'sql': sql, 'params': params} )
throws an error. Even if I put wrong encoded strings into the params
dict.
May it be possible that the actual DecodingError is produced by some logging handler or formatter?
Does the used db connector make a difference?
comment:6 by , 10 years ago
If you can reproduce the traceback in comment:3, I'd suggest to catch the UnicodeDecodeError in a try/except, then pdb.set_trace()
in the except code to look at various variables involved. This should give a hint about the problem.
comment:7 by , 10 years ago
I can verify this occurs with Django 1.7, and only with Python2. Python3 is fine.
import xyz.models as m m.Class.objects.filter(attribute=u'’')
I put the following lines in above the line where it fails (django/db/backends/util.py logger.debug line):
print type(sql) test = u'(%.3f) %s; args=%s' % (duration, sql, params)
(Yes, I know the u is redundant, wanted to make sure)
The first line prints <type 'str'>, the second line causes the exception.
So the problem appears to be that we are basically doing the following:
In [5]: u"%s" % '’' --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) <ipython-input-5-87d36823dac9> in <module>() ----> 1 u"%s" % '’' UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
Where we should be doing this:
In [4]: u"%s" % u'’' Out[4]: u'\u2019'
According to ticket #18461, the result from last_executed_query
should always be unicode, however, according to my tests, that doesn't appear to be happening.
comment:8 by , 10 years ago
Oh crap.
last_executed_query
is defined in the db layer, and I am using python-mysql-connector, which defines it as:
def last_executed_query(self, cursor, sql, params): return cursor.statement
So I guess this needs to be reported to Oracle?
comment:10 by , 10 years ago
Unfortunately nobody else here has reported the db driver being used. If you encounter the problem and do not use python-mysql-connector, please report the driver you use here.
I think it is rather unlikely that this this problem is reproducible with any driver that comes supplied with Django.
comment:11 by , 10 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:12 by , 9 years ago
I do see the same problem in django 1.9.1, and I use django.db.backends.mysql (which is a Django-supplied driver to my very limited understanding). I can provide additional information but would need some guidance.
comment:13 by , 9 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Version: | 1.6 → 1.9 |
comment:14 by , 9 years ago
Can you provide a sample project or a test for Django's test suite that reproduces the issue?
comment:15 by , 9 years ago
Description: | modified (diff) |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
Could you please provide the full traceback? It would also be useful to know the values of duration, sql and params variables just before the crash.