Opened 15 months ago

Closed 9 months ago

#22377 closed Bug (needsinfo)

SQL Logging throws an exception when fields have utf-8 characters

Reported by: rolanvc@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords: sql logging for utf-8
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In django/db/backends/util.py, in CursorDebugWrapper.execute, the line:

logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),

extra={'duration': duration, 'sql': sql, 'params': params}

)

throws a "UnicodeDecodeError:ascii codec .... " when calling model.save() when some fields have utf-8/non-ascii values. My only workaround is to modify it to:

logger.debug('(%.3f) %s; args=%s' % (duration, sql.decode('utf-8'), params),

extra={'duration': duration, 'sql': sql, 'params': params}

)

I'm hoping there's a better and more elegant way that I could use.

Thanks!
=)

Change History (11)

comment:1 Changed 15 months ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Could you please provide the full traceback? It would also be useful to know the values of duration, sql and params variables just before the crash.

comment:2 Changed 15 months ago by claudep

  • Resolution set to needsinfo
  • Status changed from new to closed
  • Type changed from Uncategorized to Bug

comment:3 Changed 10 months ago by kahnert

  • Resolution needsinfo deleted
  • Status changed from closed to new

Here is a traceback:

Traceback (most recent call last):
  File "./manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/core/management/base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 35, in handle
    self.parse_file(fname)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 40, in parse_file
    self.parse_entry(cols)
  File "/home/kahnert/pycharm/schedules/schedules/data/management/commands/import_data.py", line 61, in parse_entry
    row_header, created = OIRowHeader.objects.get_or_create(**header_vals)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/manager.py", line 154, in get_or_create
    return self.get_queryset().get_or_create(**kwargs)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 376, in get_or_create
    return self.get(**lookup), False
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 304, in get
    num = len(clone)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 77, in __len__
    self._fetch_all()
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 857, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/query.py", line 220, in iterator
    for row in compiler.results_iter():
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 713, in results_iter
    for rows in self.execute_sql(MULTI):
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 786, in execute_sql
    cursor.execute(sql, params)
  File "/mnt/schedenv/local/lib/python2.7/site-packages/django/db/backends/util.py", line 78, in execute
    logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 450: ordinal not in range(128)

I can't provide the data values, most likely some russian or chinese string, but I think the bug is obvious if you look into django.db.backends.util.CursorDebugWrapper:

  • Python 2
  • percent-formatting into bytestrings (str-Type)

-> UnicodeDecodeError for any non-ascii char

The solution should be as simple as: Use six string types

This is realy a hassle, since you have to work around debugging Code at the moment.

comment:4 Changed 10 months ago by claudep

It's not about percent-formatting into bytestrings, as the file imports from __future__ import unicode_literals. I suspect that content in header_vals dictionary has some non-ASCII non-Unicode content, which you should check and fix.

comment:5 Changed 10 months ago by kahnert

It's absolutely sure that all content in header_vals is correctly encoded, since it's read from utf-8 encoded CSV files. All content columns are safe-decodeded to unicode after reading. If I turn the DebugCursor off, all content gets into the DB and is correctly displayed when read from the DB.

But you're right, I could not create a test case in which

from __future__ import unicode_literals

logging.debug('(%.3f) %s; args=%s' % (duration, sql, params),
                extra={'duration': duration, 'sql': sql, 'params': params}
            )

throws an error. Even if I put wrong encoded strings into the params dict.

May it be possible that the actual DecodingError is produced by some logging handler or formatter?
Does the used db connector make a difference?

comment:6 Changed 10 months ago by claudep

If you can reproduce the traceback in comment:3, I'd suggest to catch the UnicodeDecodeError in a try/except, then pdb.set_trace() in the except code to look at various variables involved. This should give a hint about the problem.

comment:7 Changed 10 months ago by brian

I can verify this occurs with Django 1.7, and only with Python2. Python3 is fine.

import xyz.models as m
m.Class.objects.filter(attribute=u'’')

I put the following lines in above the line where it fails (django/db/backends/util.py logger.debug line):

print type(sql)
test = u'(%.3f) %s; args=%s' % (duration, sql, params)

(Yes, I know the u is redundant, wanted to make sure)

The first line prints <type 'str'>, the second line causes the exception.

So the problem appears to be that we are basically doing the following:

In [5]: u"%s" % '’'
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-5-87d36823dac9> in <module>()
----> 1 u"%s" % '’'

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Where we should be doing this:

In [4]: u"%s" % u'’'
Out[4]: u'\u2019'

According to ticket #18461, the result from last_executed_query should always be unicode, however, according to my tests, that doesn't appear to be happening.

Last edited 10 months ago by brian (previous) (diff)

comment:8 Changed 10 months ago by brian

Oh crap.

last_executed_query is defined in the db layer, and I am using python-mysql-connector, which defines it as:

def last_executed_query(self, cursor, sql, params):
   return cursor.statement

So I guess this needs to be reported to Oracle?

comment:10 Changed 10 months ago by brian

Unfortunately nobody else here has reported the db driver being used. If you encounter the problem and do not use python-mysql-connector, please report the driver you use here.

I think it is rather unlikely that this this problem is reproducible with any driver that comes supplied with Django.

comment:11 Changed 9 months ago by timgraham

  • Resolution set to needsinfo
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.
Back to Top