Opened 3 years ago

Closed 11 months ago

Last modified 8 months ago

#22377 closed Bug (needsinfo)

SQL Logging throws an exception when fields have utf-8 characters

Reported by: rolanvc@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: sql logging for utf-8
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Tim Graham)

In django/db/backends/util.py, in CursorDebugWrapper.execute, the line:

 logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

throws a "UnicodeDecodeError:ascii codec .... " when calling model.save() when some fields have utf-8/non-ascii values. My only workaround is to modify it to:

 logger.debug('(%.3f) %s; args=%s' % (duration, sql.decode('utf-8'), params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

I'm hoping there's a better and more elegant way that I could use.

Thanks!
=)

Change History (18)

comment:1 Changed 3 years ago by Claude Paroz

Could you please provide the full traceback? It would also be useful to know the values of duration, sql and params variables just before the crash.

comment:2 Changed 3 years ago by Claude Paroz

Resolution: needsinfo
Status: newclosed
Type: UncategorizedBug

comment:3 Changed 2 years ago by kahnert

Resolution: needsinfo
Status: closednew

Here is a traceback:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 35, in handle
    self.parse_file(fname)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 40, in parse_file
    self.parse_entry(cols)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 61, in parse_entry
    row_header, created = OIRowHeader.objects.get_or_create(**header_vals)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create
    return self.get_queryset().get_or_create(**kwargs)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
    return self.get(**lookup), False
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
    num = len(clone)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
    self._fetch_all()
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
    for row in compiler.results_iter():
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql
    cursor.execute(sql, params)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/backends/util.py", line 78, in execute
    logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 450: ordinal not in range(128)

I can't provide the data values, most likely some russian or chinese string, but I think the bug is obvious if you look into django.db.backends.util.CursorDebugWrapper:

  • Python 2
  • percent-formatting into bytestrings (str-Type)

-> UnicodeDecodeError for any non-ascii char

The solution should be as simple as: Use six string types

This is realy a hassle, since you have to work around debugging Code at the moment.

comment:4 Changed 2 years ago by Claude Paroz

It's not about percent-formatting into bytestrings, as the file imports from __future__ import unicode_literals. I suspect that content in header_vals dictionary has some non-ASCII non-Unicode content, which you should check and fix.

comment:5 Changed 2 years ago by kahnert

It's absolutely sure that all content in header_vals is correctly encoded, since it's read from utf-8 encoded CSV files. All content columns are safe-decodeded to unicode after reading. If I turn the DebugCursor off, all content gets into the DB and is correctly displayed when read from the DB.

But you're right, I could not create a test case in which

from __future__ import unicode_literals

logging.debug('(%.3f) %s; args=%s' % (duration, sql, params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

throws an error. Even if I put wrong encoded strings into the params dict.

May it be possible that the actual DecodingError is produced by some logging handler or formatter?
Does the used db connector make a difference?

comment:6 Changed 2 years ago by Claude Paroz

If you can reproduce the traceback in comment:3, I'd suggest to catch the UnicodeDecodeError in a try/except, then pdb.set_trace() in the except code to look at various variables involved. This should give a hint about the problem.

comment:7 Changed 2 years ago by Brian May

I can verify this occurs with Django 1.7, and only with Python2. Python3 is fine.

import xyz.models as m
m.Class.objects.filter(attribute=u'’')

I put the following lines in above the line where it fails (django/db/backends/util.py logger.debug line):

print type(sql)
test = u'(%.3f) %s; args=%s' % (duration, sql, params)

(Yes, I know the u is redundant, wanted to make sure)

The first line prints <type 'str'>, the second line causes the exception.

So the problem appears to be that we are basically doing the following:

In [5]: u"%s" % '’'
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-87d36823dac9> in <module>()
----> 1 u"%s" % '’'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Where we should be doing this:

In [4]: u"%s" % u'’'
Out[4]: u'\u2019'

According to ticket #18461, the result from last_executed_query should always be unicode, however, according to my tests, that doesn't appear to be happening.

Last edited 2 years ago by Brian May (previous) (diff)

comment:8 Changed 2 years ago by Brian May

Oh crap.

last_executed_query is defined in the db layer, and I am using python-mysql-connector, which defines it as:

def last_executed_query(self, cursor, sql, params):
   return cursor.statement

So I guess this needs to be reported to Oracle?

comment:9 Changed 2 years ago by Brian May

comment:10 Changed 2 years ago by Brian May

Unfortunately nobody else here has reported the db driver being used. If you encounter the problem and do not use python-mysql-connector, please report the driver you use here.

I think it is rather unlikely that this this problem is reproducible with any driver that comes supplied with Django.

comment:11 Changed 2 years ago by Tim Graham

Resolution: needsinfo
Status: newclosed

comment:12 Changed 11 months ago by olivergraeser

I do see the same problem in django 1.9.1, and I use django.db.backends.mysql (which is a Django-supplied driver to my very limited understanding). I can provide additional information but would need some guidance.

Last edited 11 months ago by olivergraeser (previous) (diff)

comment:13 Changed 11 months ago by olivergraeser

Resolution: needsinfo
Status: closednew
Version: 1.61.9

comment:14 Changed 11 months ago by Tim Graham

Can you provide a sample project or a test for Django's test suite that reproduces the issue?

comment:15 Changed 11 months ago by Tim Graham

Description: modified (diff)
Resolution: needsinfo
Status: newclosed

comment:16 Changed 8 months ago by Marcin Nowak

I can confirm similar issue when using raw queries directly via connection.cursor().execute(), when some parameters passed as dict contains unicode data. It happens for Django 1.8.6, Python 2.7.11, psycopg2 driver and only when DEBUG=True is set. Currently I've applied workaround by using native cursor (connection.cursor().cursor.execute().

Exception is raised when DebugCursor is logging a sql (logger.debug('(%.3f) [...]', at line 88 in django.db.backends.utils)

comment:17 Changed 8 months ago by Marcin Nowak

Also there is another issue related to this logger - when raw query has incorrect syntax, Django still tries to log the query (in the finally block). Then original exception (a sql error) is hidden, because of raising encoding/decoding error of the log message. Probably original error message lands somewhere in errors wrapper. I've spent hours on debugging unicode/deocde error instead of fixing query syntax. But when syntax was fixed, the error with logging still exists.

comment:18 Changed 8 months ago by Claude Paroz

Would it be possible to give us more details about the error, the query/parameters and the traceback? We have to be able to reproduce it in order to fix it.

Note: See TracTickets for help on using tickets.
Back to Top