Code

Opened 17 months ago

Closed 16 months ago

Last modified 7 months ago

#19954 closed Bug (fixed)

Storing of Binary fields leads to Exceptions

Reported by: marcel.ryser.ch@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.5
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I am using a BinaryField on a table to store a binary value

Class for this BinaryField:

class BinaryField(models.TextField):
    def get_prep_value(self, value):
        return value;

This worked with 1.4 release pretty good. With 1.5 i got this exception

  File "C:\Program Files\Python27\lib\site-packages\django\core\handlers\base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "C:\dev\workspace\myproject\Project\src\package\adminviews.py", line 16, in changevalue
    movie.save()
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\base.py", line 546, in save
    force_update=force_update, update_fields=update_fields)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\base.py", line 626, in save_base
    rows = manager.using(using).filter(pk=pk_val)._update(values)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\query.py", line 603, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 1014, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "C:\Program Files\Python27\lib\site-packages\django\db\backends\util.py", line 45, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "C:\Program Files\Python27\lib\site-packages\django\db\backends\mysql\base.py", line 243, in last_executed_query
    return cursor._last_executed.decode('utf-8')
  File "C:\Program Files\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 708: invalid start byte

This is right, one cannot decode a binary value to utf8 characters. So I saw Ticket #6416 which describes pretty good my problem. I patched the mysql adapter with the code from #6416 and now it works. I don't know exactly on which level this should be fixed, but it is also a problem when it comes to logging of this sql query with a binary field in it, the encoding to bas64 (from code #6416) solves this problem here.

Attachments (1)

19954-1.diff (2.7 KB) - added by claudep 17 months ago.

Download all attachments as: .zip

Change History (11)

comment:1 Changed 17 months ago by claudep

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

I'm about to commit the BinaryField patch from #2417. I've tested and reproduced your issue. I found another way to workaround it:

    import binascii
    try:
        return cursor._last_executed.decode('utf-8')
    except UnicodeDecodeError:
        return binascii.b2a_qp(cursor._last_executed)

This gives something like INSERT INTO `model_fields_datamodel` (`short_data`, `data`) VALUES ('=08', =\n'\\0F=FE')

The quoted-printable representation of the binary parameters may not be the best representation, but at least it doesn't crash. Opinions?

comment:2 Changed 17 months ago by marcel.ryser.ch@…

I can definitely live with that, one can relatively easy convert the quoted-printable back to it's binary form. This would be more than I need in my case.

comment:3 Changed 17 months ago by claudep

  • Has patch set

The downside of my initial proposal is that other valid utf-8 chars in the query will also be qp encoded, which is not acceptable.

IMHO, the valid alternatives are either:

  1. Split the query and try to decode the query chunk by chunk, falling back to a byte-oriented decoding on errors.
  2. Default to parent last_executed_query when decoding fails

I will attach a patch where I chose to implement 1. with 'unicode-escape' fallback encoding. The issue with 2. is that the log would contain two sorts of queries: those coming from the backend and those being reconstructed with the "QUERY = %r - PARAMS = %r" pattern.

Changed 17 months ago by claudep

comment:4 follow-up: Changed 17 months ago by marcel.ryser.ch@…

What about binary values who could be represented as UTF-8 characters but aren't characters? Then you have a wrong representation of a binary value, which are not characters, this could lead to confusion.

Django generates the queries (prepared-statements?) by using the field datatype defined in the model, I guess. Why doesn't the logging starts there and not when it is all over? There you could easily decide which mode should be used to represent the value of a field in the log. Binary fields should really be represented as they are: binary and not converted to characters, because we don't know (on django side) what kind of binary data we are dealing with.

comment:5 in reply to: ↑ 4 Changed 16 months ago by claudep

  • Patch needs improvement set

Replying to marcel.ryser.ch@…:

What about binary values who could be represented as UTF-8 characters but aren't characters? Then you have a wrong representation of a binary value, which are not characters, this could lead to confusion.

You are right, but I didn't find any better solution for now.

Django generates the queries (prepared-statements?) by using the field datatype defined in the model, I guess. Why doesn't the logging starts there and not when it is all over? There you could easily decide which mode should be used to represent the value of a field in the log. Binary fields should really be represented as they are: binary and not converted to characters, because we don't know (on django side) what kind of binary data we are dealing with.

Django is separately passing the query with placeholders and parameters to the database backend, it does not generate the full query itself. That's why Django uses the cursor.last_executed_query trick to retrieve the real query. But feel free to experiment and propose a better patch. I'm out of ideas currently :-(

comment:6 Changed 16 months ago by claudep

If we don't find any better solution, I think we'll fallback to force_text(query, errors='replace'). It's still better than crashing.

comment:7 Changed 16 months ago by marcel.ryser.ch@…

Fit's for me, because I don't need to log the binary value, I just expect django to not crash when I have some binary fields in my model.

comment:8 Changed 16 months ago by Claude Paroz <claude@…>

  • Resolution set to fixed
  • Status changed from new to closed

In 86b1c316899f154b1800ab936330ac8924251e40:

Fixed #19954 -- Fixed MySQL _last_executed decoding

Queries can contain binary data undecodable with utf-8. In this
case, using the 'replace' errors mode when decoding seems like
an acceptable representation of the query.
Thanks Marcel Ryser for the report.

comment:9 follow-up: Changed 10 months ago by webadmin@…

Since this was introduced in django 1.5 can the fix also be included in the 1.5 branch?

comment:10 in reply to: ↑ 9 Changed 7 months ago by loutrea@…

Replying to webadmin@…:

Since this was introduced in django 1.5 can the fix also be included in the 1.5 branch?

It should.

I met this bug in the 1.5 branch so I fixed it as described here : http://stackoverflow.com/a/18923501

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.