Opened 11 years ago

Closed 11 years ago

Last modified 10 years ago

#19954 closed Bug (fixed)

Storing of Binary fields leads to Exceptions

Reported by: marcel.ryser.ch@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.5
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I am using a BinaryField on a table to store a binary value

Class for this BinaryField:

class BinaryField(models.TextField):
    def get_prep_value(self, value):
        return value;

This worked with 1.4 release pretty good. With 1.5 i got this exception

  File "C:\Program Files\Python27\lib\site-packages\django\core\handlers\base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "C:\dev\workspace\myproject\Project\src\package\adminviews.py", line 16, in changevalue
    movie.save()
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\base.py", line 546, in save
    force_update=force_update, update_fields=update_fields)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\base.py", line 626, in save_base
    rows = manager.using(using).filter(pk=pk_val)._update(values)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\query.py", line 603, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 1014, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "C:\Program Files\Python27\lib\site-packages\django\db\backends\util.py", line 45, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "C:\Program Files\Python27\lib\site-packages\django\db\backends\mysql\base.py", line 243, in last_executed_query
    return cursor._last_executed.decode('utf-8')
  File "C:\Program Files\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 708: invalid start byte

This is right, one cannot decode a binary value to utf8 characters. So I saw Ticket #6416 which describes pretty good my problem. I patched the mysql adapter with the code from #6416 and now it works. I don't know exactly on which level this should be fixed, but it is also a problem when it comes to logging of this sql query with a binary field in it, the encoding to bas64 (from code #6416) solves this problem here.

Attachments (1)

19954-1.diff (2.7 KB ) - added by Claude Paroz 11 years ago.

Download all attachments as: .zip

Change History (11)

comment:1 by Claude Paroz, 11 years ago

Triage Stage: UnreviewedAccepted

I'm about to commit the BinaryField patch from #2417. I've tested and reproduced your issue. I found another way to workaround it:

    import binascii
    try:
        return cursor._last_executed.decode('utf-8')
    except UnicodeDecodeError:
        return binascii.b2a_qp(cursor._last_executed)

This gives something like INSERT INTO `model_fields_datamodel` (`short_data`, `data`) VALUES ('=08', =\n'\\0F=FE')

The quoted-printable representation of the binary parameters may not be the best representation, but at least it doesn't crash. Opinions?

comment:2 by marcel.ryser.ch@…, 11 years ago

I can definitely live with that, one can relatively easy convert the quoted-printable back to it's binary form. This would be more than I need in my case.

comment:3 by Claude Paroz, 11 years ago

Has patch: set

The downside of my initial proposal is that other valid utf-8 chars in the query will also be qp encoded, which is not acceptable.

IMHO, the valid alternatives are either:

  1. Split the query and try to decode the query chunk by chunk, falling back to a byte-oriented decoding on errors.
  2. Default to parent last_executed_query when decoding fails

I will attach a patch where I chose to implement 1. with 'unicode-escape' fallback encoding. The issue with 2. is that the log would contain two sorts of queries: those coming from the backend and those being reconstructed with the "QUERY = %r - PARAMS = %r" pattern.

by Claude Paroz, 11 years ago

Attachment: 19954-1.diff added

comment:4 by marcel.ryser.ch@…, 11 years ago

What about binary values who could be represented as UTF-8 characters but aren't characters? Then you have a wrong representation of a binary value, which are not characters, this could lead to confusion.

Django generates the queries (prepared-statements?) by using the field datatype defined in the model, I guess. Why doesn't the logging starts there and not when it is all over? There you could easily decide which mode should be used to represent the value of a field in the log. Binary fields should really be represented as they are: binary and not converted to characters, because we don't know (on django side) what kind of binary data we are dealing with.

in reply to:  4 comment:5 by Claude Paroz, 11 years ago

Patch needs improvement: set

Replying to marcel.ryser.ch@…:

What about binary values who could be represented as UTF-8 characters but aren't characters? Then you have a wrong representation of a binary value, which are not characters, this could lead to confusion.

You are right, but I didn't find any better solution for now.

Django generates the queries (prepared-statements?) by using the field datatype defined in the model, I guess. Why doesn't the logging starts there and not when it is all over? There you could easily decide which mode should be used to represent the value of a field in the log. Binary fields should really be represented as they are: binary and not converted to characters, because we don't know (on django side) what kind of binary data we are dealing with.

Django is separately passing the query with placeholders and parameters to the database backend, it does not generate the full query itself. That's why Django uses the cursor.last_executed_query trick to retrieve the real query. But feel free to experiment and propose a better patch. I'm out of ideas currently :-(

comment:6 by Claude Paroz, 11 years ago

If we don't find any better solution, I think we'll fallback to force_text(query, errors='replace'). It's still better than crashing.

comment:7 by marcel.ryser.ch@…, 11 years ago

Fit's for me, because I don't need to log the binary value, I just expect django to not crash when I have some binary fields in my model.

comment:8 by Claude Paroz <claude@…>, 11 years ago

Resolution: fixed
Status: newclosed

In 86b1c316899f154b1800ab936330ac8924251e40:

Fixed #19954 -- Fixed MySQL _last_executed decoding

Queries can contain binary data undecodable with utf-8. In this
case, using the 'replace' errors mode when decoding seems like
an acceptable representation of the query.
Thanks Marcel Ryser for the report.

comment:9 by webadmin@…, 10 years ago

Since this was introduced in django 1.5 can the fix also be included in the 1.5 branch?

in reply to:  9 comment:10 by loutrea@…, 10 years ago

Replying to webadmin@…:

Since this was introduced in django 1.5 can the fix also be included in the 1.5 branch?

It should.

I met this bug in the 1.5 branch so I fixed it as described here : http://stackoverflow.com/a/18923501

Note: See TracTickets for help on using tickets.
Back to Top