#22377 closed Bug (needsinfo)
SQL Logging throws an exception when fields have utf-8 characters — at Version 15
| Reported by: | Owned by: | nobody | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.9 |
| Severity: | Normal | Keywords: | sql logging for utf-8 |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
In django/db/backends/util.py, in CursorDebugWrapper.execute, the line:
logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
extra={'duration': duration, 'sql': sql, 'params': params}
)
throws a "UnicodeDecodeError:ascii codec .... " when calling model.save() when some fields have utf-8/non-ascii values. My only workaround is to modify it to:
logger.debug('(%.3f) %s; args=%s' % (duration, sql.decode('utf-8'), params),
extra={'duration': duration, 'sql': sql, 'params': params}
)
I'm hoping there's a better and more elegant way that I could use.
Thanks!
=)
Change History (15)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
| Resolution: | → needsinfo |
|---|---|
| Status: | new → closed |
| Type: | Uncategorized → Bug |
comment:3 by , 11 years ago
| Resolution: | needsinfo |
|---|---|
| Status: | closed → new |
Here is a traceback:
Traceback (most recent call last):
File "./manage.py", line 10, in <module>
execute_from_command_line(sys.argv)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
utility.execute()
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
self.execute(*args, **options.__dict__)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
output = self.handle(*args, **options)
File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 35, in handle
self.parse_file(fname)
File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 40, in parse_file
self.parse_entry(cols)
File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 61, in parse_entry
row_header, created = OIRowHeader.objects.get_or_create(**header_vals)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create
return self.get_queryset().get_or_create(**kwargs)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
return self.get(**lookup), False
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
num = len(clone)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
self._fetch_all()
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
self._result_cache = list(self.iterator())
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
for row in compiler.results_iter():
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
for rows in self.execute_sql(MULTI):
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql
cursor.execute(sql, params)
File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/backends/util.py", line 78, in execute
logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 450: ordinal not in range(128)
I can't provide the data values, most likely some russian or chinese string, but I think the bug is obvious if you look into django.db.backends.util.CursorDebugWrapper:
- Python 2
- percent-formatting into bytestrings (
str-Type)
-> UnicodeDecodeError for any non-ascii char
The solution should be as simple as: Use six string types
This is realy a hassle, since you have to work around debugging Code at the moment.
comment:4 by , 11 years ago
It's not about percent-formatting into bytestrings, as the file imports from __future__ import unicode_literals. I suspect that content in header_vals dictionary has some non-ASCII non-Unicode content, which you should check and fix.
comment:5 by , 11 years ago
It's absolutely sure that all content in header_vals is correctly encoded, since it's read from utf-8 encoded CSV files. All content columns are safe-decodeded to unicode after reading. If I turn the DebugCursor off, all content gets into the DB and is correctly displayed when read from the DB.
But you're right, I could not create a test case in which
from __future__ import unicode_literals
logging.debug('(%.3f) %s; args=%s' % (duration, sql, params),
extra={'duration': duration, 'sql': sql, 'params': params}
)
throws an error. Even if I put wrong encoded strings into the params dict.
May it be possible that the actual DecodingError is produced by some logging handler or formatter?
Does the used db connector make a difference?
comment:6 by , 11 years ago
If you can reproduce the traceback in comment:3, I'd suggest to catch the UnicodeDecodeError in a try/except, then pdb.set_trace() in the except code to look at various variables involved. This should give a hint about the problem.
comment:7 by , 11 years ago
I can verify this occurs with Django 1.7, and only with Python2. Python3 is fine.
import xyz.models as m m.Class.objects.filter(attribute=u'’')
I put the following lines in above the line where it fails (django/db/backends/util.py logger.debug line):
print type(sql) test = u'(%.3f) %s; args=%s' % (duration, sql, params)
(Yes, I know the u is redundant, wanted to make sure)
The first line prints <type 'str'>, the second line causes the exception.
So the problem appears to be that we are basically doing the following:
In [5]: u"%s" % '’' --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) <ipython-input-5-87d36823dac9> in <module>() ----> 1 u"%s" % '’' UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
Where we should be doing this:
In [4]: u"%s" % u'’' Out[4]: u'\u2019'
According to ticket #18461, the result from last_executed_query should always be unicode, however, according to my tests, that doesn't appear to be happening.
comment:8 by , 11 years ago
Oh crap.
last_executed_query is defined in the db layer, and I am using python-mysql-connector, which defines it as:
def last_executed_query(self, cursor, sql, params): return cursor.statement
So I guess this needs to be reported to Oracle?
comment:10 by , 11 years ago
Unfortunately nobody else here has reported the db driver being used. If you encounter the problem and do not use python-mysql-connector, please report the driver you use here.
I think it is rather unlikely that this this problem is reproducible with any driver that comes supplied with Django.
comment:11 by , 11 years ago
| Resolution: | → needsinfo |
|---|---|
| Status: | new → closed |
comment:12 by , 10 years ago
I do see the same problem in django 1.9.1, and I use django.db.backends.mysql (which is a Django-supplied driver to my very limited understanding). I can provide additional information but would need some guidance.
comment:13 by , 10 years ago
| Resolution: | needsinfo |
|---|---|
| Status: | closed → new |
| Version: | 1.6 → 1.9 |
comment:14 by , 10 years ago
Can you provide a sample project or a test for Django's test suite that reproduces the issue?
comment:15 by , 10 years ago
| Description: | modified (diff) |
|---|---|
| Resolution: | → needsinfo |
| Status: | new → closed |
Could you please provide the full traceback? It would also be useful to know the values of duration, sql and params variables just before the crash.