Opened 2 years ago

Closed 4 months ago

Last modified 6 weeks ago

#22377 closed Bug (needsinfo)

SQL Logging throws an exception when fields have utf-8 characters

Reported by: rolanvc@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: sql logging for utf-8
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by timgraham)

In django/db/backends/util.py, in CursorDebugWrapper.execute, the line:

 logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

throws a "UnicodeDecodeError:ascii codec .... " when calling model.save() when some fields have utf-8/non-ascii values. My only workaround is to modify it to:

 logger.debug('(%.3f) %s; args=%s' % (duration, sql.decode('utf-8'), params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

I'm hoping there's a better and more elegant way that I could use.

Thanks!
=)

Change History (18)

comment:1 Changed 2 years ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Could you please provide the full traceback? It would also be useful to know the values of duration, sql and params variables just before the crash.

comment:2 Changed 2 years ago by claudep

  • Resolution set to needsinfo
  • Status changed from new to closed
  • Type changed from Uncategorized to Bug

comment:3 Changed 21 months ago by kahnert

  • Resolution needsinfo deleted
  • Status changed from closed to new

Here is a traceback:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 35, in handle
    self.parse_file(fname)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 40, in parse_file
    self.parse_entry(cols)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 61, in parse_entry
    row_header, created = OIRowHeader.objects.get_or_create(**header_vals)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create
    return self.get_queryset().get_or_create(**kwargs)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
    return self.get(**lookup), False
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
    num = len(clone)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
    self._fetch_all()
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
    for row in compiler.results_iter():
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql
    cursor.execute(sql, params)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/backends/util.py", line 78, in execute
    logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 450: ordinal not in range(128)

I can't provide the data values, most likely some russian or chinese string, but I think the bug is obvious if you look into django.db.backends.util.CursorDebugWrapper:

  • Python 2
  • percent-formatting into bytestrings (str-Type)

-> UnicodeDecodeError for any non-ascii char

The solution should be as simple as: Use six string types

This is realy a hassle, since you have to work around debugging Code at the moment.

comment:4 Changed 21 months ago by claudep

It's not about percent-formatting into bytestrings, as the file imports from __future__ import unicode_literals. I suspect that content in header_vals dictionary has some non-ASCII non-Unicode content, which you should check and fix.

comment:5 Changed 21 months ago by kahnert

It's absolutely sure that all content in header_vals is correctly encoded, since it's read from utf-8 encoded CSV files. All content columns are safe-decodeded to unicode after reading. If I turn the DebugCursor off, all content gets into the DB and is correctly displayed when read from the DB.

But you're right, I could not create a test case in which

from __future__ import unicode_literals

logging.debug('(%.3f) %s; args=%s' % (duration, sql, params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

throws an error. Even if I put wrong encoded strings into the params dict.

May it be possible that the actual DecodingError is produced by some logging handler or formatter?
Does the used db connector make a difference?

comment:6 Changed 21 months ago by claudep

If you can reproduce the traceback in comment:3, I'd suggest to catch the UnicodeDecodeError in a try/except, then pdb.set_trace() in the except code to look at various variables involved. This should give a hint about the problem.

comment:7 Changed 21 months ago by brian

I can verify this occurs with Django 1.7, and only with Python2. Python3 is fine.

import xyz.models as m
m.Class.objects.filter(attribute=u'’')

I put the following lines in above the line where it fails (django/db/backends/util.py logger.debug line):

print type(sql)
test = u'(%.3f) %s; args=%s' % (duration, sql, params)

(Yes, I know the u is redundant, wanted to make sure)

The first line prints <type 'str'>, the second line causes the exception.

So the problem appears to be that we are basically doing the following:

In [5]: u"%s" % '’'
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-87d36823dac9> in <module>()
----> 1 u"%s" % '’'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Where we should be doing this:

In [4]: u"%s" % u'’'
Out[4]: u'\u2019'

According to ticket #18461, the result from last_executed_query should always be unicode, however, according to my tests, that doesn't appear to be happening.

Last edited 21 months ago by brian (previous) (diff)

comment:8 Changed 21 months ago by brian

Oh crap.

last_executed_query is defined in the db layer, and I am using python-mysql-connector, which defines it as:

def last_executed_query(self, cursor, sql, params):
   return cursor.statement

So I guess this needs to be reported to Oracle?

comment:10 Changed 21 months ago by brian

Unfortunately nobody else here has reported the db driver being used. If you encounter the problem and do not use python-mysql-connector, please report the driver you use here.

I think it is rather unlikely that this this problem is reproducible with any driver that comes supplied with Django.

comment:11 Changed 20 months ago by timgraham

  • Resolution set to needsinfo
  • Status changed from new to closed

comment:12 Changed 4 months ago by olivergraeser

I do see the same problem in django 1.9.1, and I use django.db.backends.mysql (which is a Django-supplied driver to my very limited understanding). I can provide additional information but would need some guidance.

Last edited 4 months ago by olivergraeser (previous) (diff)

comment:13 Changed 4 months ago by olivergraeser

  • Resolution needsinfo deleted
  • Status changed from closed to new
  • Version changed from 1.6 to 1.9

comment:14 Changed 4 months ago by timgraham

Can you provide a sample project or a test for Django's test suite that reproduces the issue?

comment:15 Changed 4 months ago by timgraham

  • Description modified (diff)
  • Resolution set to needsinfo
  • Status changed from new to closed

comment:16 Changed 6 weeks ago by marcinn

I can confirm similar issue when using raw queries directly via connection.cursor().execute(), when some parameters passed as dict contains unicode data. It happens for Django 1.8.6, Python 2.7.11, psycopg2 driver and only when DEBUG=True is set. Currently I've applied workaround by using native cursor (connection.cursor().cursor.execute().

Exception is raised when DebugCursor is logging a sql (logger.debug('(%.3f) [...]', at line 88 in django.db.backends.utils)

comment:17 Changed 6 weeks ago by marcinn

Also there is another issue related to this logger - when raw query has incorrect syntax, Django still tries to log the query (in the finally block). Then original exception (a sql error) is hidden, because of raising encoding/decoding error of the log message. Probably original error message lands somewhere in errors wrapper. I've spent hours on debugging unicode/deocde error instead of fixing query syntax. But when syntax was fixed, the error with logging still exists.

comment:18 Changed 6 weeks ago by claudep

Would it be possible to give us more details about the error, the query/parameters and the traceback? We have to be able to reproduce it in order to fix it.

Note: See TracTickets for help on using tickets.
Back to Top