#22377 closed Bug (needsinfo)
SQL Logging throws an exception when fields have utf-8 characters
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | sql logging for utf-8 |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
In django/db/backends/util.py, in CursorDebugWrapper.execute, the line:
logger.debug('(%.3f) %s; args=%s' % (duration, sql, params), extra={'duration': duration, 'sql': sql, 'params': params} )
throws a "UnicodeDecodeError:ascii codec .... " when calling model.save() when some fields have utf-8/non-ascii values. My only workaround is to modify it to:
logger.debug('(%.3f) %s; args=%s' % (duration, sql.decode('utf-8'), params), extra={'duration': duration, 'sql': sql, 'params': params} )
I'm hoping there's a better and more elegant way that I could use.
Thanks!
=)
Change History (18)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
Type: | Uncategorized → Bug |
comment:3 by , 10 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Here is a traceback:
Traceback (most recent call last): File "./manage.py", line 10, in <module> execute_from_command_line(sys.argv) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line utility.execute() File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute self.fetch_command(subcommand).run_from_argv(self.argv) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv self.execute(*args, **options.__dict__) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute output = self.handle(*args, **options) File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 35, in handle self.parse_file(fname) File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 40, in parse_file self.parse_entry(cols) File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 61, in parse_entry row_header, created = OIRowHeader.objects.get_or_create(**header_vals) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create return self.get_queryset().get_or_create(**kwargs) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create return self.get(**lookup), False File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get num = len(clone) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__ self._fetch_all() File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all self._result_cache = list(self.iterator()) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator for row in compiler.results_iter(): File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter for rows in self.execute_sql(MULTI): File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql cursor.execute(sql, params) File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/backends/util.py", line 78, in execute logger.debug('(%.3f) %s; args=%s' % (duration, sql, params), UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 450: ordinal not in range(128)
I can't provide the data values, most likely some russian or chinese string, but I think the bug is obvious if you look into django.db.backends.util.CursorDebugWrapper
:
- Python 2
- percent-formatting into bytestrings (
str
-Type)
-> UnicodeDecodeError for any non-ascii char
The solution should be as simple as: Use six
string types
This is realy a hassle, since you have to work around debugging Code at the moment.
comment:4 by , 10 years ago
It's not about percent-formatting into bytestrings, as the file imports from __future__ import unicode_literals
. I suspect that content in header_vals
dictionary has some non-ASCII non-Unicode content, which you should check and fix.
comment:5 by , 10 years ago
It's absolutely sure that all content in header_vals
is correctly encoded, since it's read from utf-8 encoded CSV files. All content columns are safe-decodeded to unicode after reading. If I turn the DebugCursor off, all content gets into the DB and is correctly displayed when read from the DB.
But you're right, I could not create a test case in which
from __future__ import unicode_literals logging.debug('(%.3f) %s; args=%s' % (duration, sql, params), extra={'duration': duration, 'sql': sql, 'params': params} )
throws an error. Even if I put wrong encoded strings into the params
dict.
May it be possible that the actual DecodingError is produced by some logging handler or formatter?
Does the used db connector make a difference?
comment:6 by , 10 years ago
If you can reproduce the traceback in comment:3, I'd suggest to catch the UnicodeDecodeError in a try/except, then pdb.set_trace()
in the except code to look at various variables involved. This should give a hint about the problem.
comment:7 by , 10 years ago
I can verify this occurs with Django 1.7, and only with Python2. Python3 is fine.
import xyz.models as m m.Class.objects.filter(attribute=u'’')
I put the following lines in above the line where it fails (django/db/backends/util.py logger.debug line):
print type(sql) test = u'(%.3f) %s; args=%s' % (duration, sql, params)
(Yes, I know the u is redundant, wanted to make sure)
The first line prints <type 'str'>, the second line causes the exception.
So the problem appears to be that we are basically doing the following:
In [5]: u"%s" % '’' --------------------------------------------------------------------------- UnicodeDecodeError Traceback (most recent call last) <ipython-input-5-87d36823dac9> in <module>() ----> 1 u"%s" % '’' UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)
Where we should be doing this:
In [4]: u"%s" % u'’' Out[4]: u'\u2019'
According to ticket #18461, the result from last_executed_query should always be unicode, however, according to my tests, that doesn't appear to be happening.
comment:8 by , 10 years ago
Oh crap.
last_executed_query
is defined in the db layer, and I am using python-mysql-connector, which defines it as:
def last_executed_query(self, cursor, sql, params): return cursor.statement
So I guess this needs to be reported to Oracle?
comment:10 by , 10 years ago
Unfortunately nobody else here has reported the db driver being used. If you encounter the problem and do not use python-mysql-connector, please report the driver you use here.
I think it is rather unlikely that this this problem is reproducible with any driver that comes supplied with Django.
comment:11 by , 10 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:12 by , 9 years ago
I do see the same problem in django 1.9.1, and I use django.db.backends.mysql (which is a Django-supplied driver to my very limited understanding). I can provide additional information but would need some guidance.
comment:13 by , 9 years ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Version: | 1.6 → 1.9 |
comment:14 by , 9 years ago
Can you provide a sample project or a test for Django's test suite that reproduces the issue?
comment:15 by , 9 years ago
Description: | modified (diff) |
---|---|
Resolution: | → needsinfo |
Status: | new → closed |
comment:16 by , 9 years ago
I can confirm similar issue when using raw queries directly via connection.cursor().execute()
, when some parameters passed as dict
contains unicode data. It happens for Django 1.8.6
, Python 2.7.11
, psycopg2
driver and only when DEBUG=True
is set. Currently I've applied workaround by using native cursor (connection.cursor().cursor.execute()
.
Exception is raised when DebugCursor is logging a sql (logger.debug('(%.3f) [...]'
, at line 88 in django.db.backends.utils
)
comment:17 by , 9 years ago
Also there is another issue related to this logger - when raw query has incorrect syntax, Django still tries to log the query (in the finally
block). Then original exception (a sql error) is hidden, because of raising encoding/decoding error of the log message. Probably original error message lands somewhere in errors wrapper. I've spent hours on debugging unicode/deocde error instead of fixing query syntax. But when syntax was fixed, the error with logging still exists.
comment:18 by , 9 years ago
Would it be possible to give us more details about the error, the query/parameters and the traceback? We have to be able to reproduce it in order to fix it.
Could you please provide the full traceback? It would also be useful to know the values of duration, sql and params variables just before the crash.