#19954 closed Bug (fixed)
Storing of Binary fields leads to Exceptions
| Reported by: | Owned by: | nobody | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.5 | 
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | yes | 
| Easy pickings: | no | UI/UX: | no | 
Description
I am using a BinaryField on a table to store a binary value
Class for this BinaryField:
class BinaryField(models.TextField):
    def get_prep_value(self, value):
        return value;
This worked with 1.4 release pretty good. With 1.5 i got this exception
  File "C:\Program Files\Python27\lib\site-packages\django\core\handlers\base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "C:\dev\workspace\myproject\Project\src\package\adminviews.py", line 16, in changevalue
    movie.save()
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\base.py", line 546, in save
    force_update=force_update, update_fields=update_fields)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\base.py", line 626, in save_base
    rows = manager.using(using).filter(pk=pk_val)._update(values)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\query.py", line 603, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 1014, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "C:\Program Files\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 840, in execute_sql
    cursor.execute(sql, params)
  File "C:\Program Files\Python27\lib\site-packages\django\db\backends\util.py", line 45, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "C:\Program Files\Python27\lib\site-packages\django\db\backends\mysql\base.py", line 243, in last_executed_query
    return cursor._last_executed.decode('utf-8')
  File "C:\Program Files\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 708: invalid start byte
This is right, one cannot decode a binary value to utf8 characters. So I saw Ticket #6416 which describes pretty good my problem. I patched the mysql adapter with the code from #6416 and now it works. I don't know exactly on which level this should be fixed, but it is also a problem when it comes to logging of this sql query with a binary field in it, the encoding to bas64 (from code #6416) solves this problem here.
Attachments (1)
Change History (11)
comment:1 by , 13 years ago
| Triage Stage: | Unreviewed → Accepted | 
|---|
comment:2 by , 13 years ago
I can definitely live with that, one can relatively easy convert the quoted-printable back to it's binary form. This would be more than I need in my case.
comment:3 by , 13 years ago
| Has patch: | set | 
|---|
The downside of my initial proposal is that other valid utf-8 chars in the query will also be qp encoded, which is not acceptable.
IMHO, the valid alternatives are either:
- Split the query and try to decode the query chunk by chunk, falling back to a byte-oriented decoding on errors.
- Default to parent last_executed_querywhen decoding fails
I will attach a patch where I chose to implement 1. with 'unicode-escape' fallback encoding. The issue with 2. is that the log would contain two sorts of queries: those coming from the backend and those being reconstructed with the "QUERY = %r - PARAMS = %r" pattern.
by , 13 years ago
| Attachment: | 19954-1.diff added | 
|---|
follow-up: 5 comment:4 by , 13 years ago
What about binary values who could be represented as UTF-8 characters but aren't characters? Then you have a wrong representation of a binary value, which are not characters, this could lead to confusion.
Django generates the queries (prepared-statements?) by using the field datatype defined in the model, I guess. Why doesn't the logging starts there and not when it is all over? There you could easily decide which mode should be used to represent the value of a field in the log. Binary fields should really be represented as they are: binary and not converted to characters, because we don't know (on django side) what kind of binary data we are dealing with.
comment:5 by , 13 years ago
| Patch needs improvement: | set | 
|---|
Replying to marcel.ryser.ch@…:
What about binary values who could be represented as UTF-8 characters but aren't characters? Then you have a wrong representation of a binary value, which are not characters, this could lead to confusion.
You are right, but I didn't find any better solution for now. 
Django generates the queries (prepared-statements?) by using the field datatype defined in the model, I guess. Why doesn't the logging starts there and not when it is all over? There you could easily decide which mode should be used to represent the value of a field in the log. Binary fields should really be represented as they are: binary and not converted to characters, because we don't know (on django side) what kind of binary data we are dealing with.
Django is separately passing the query with placeholders and parameters to the database backend, it does not generate the full query itself. That's why Django uses the cursor.last_executed_query trick to retrieve the real query. But feel free to experiment and propose a better patch. I'm out of ideas currently :-(
comment:6 by , 13 years ago
If we don't find any better solution, I think we'll fallback to force_text(query, errors='replace'). It's still better than crashing.
comment:7 by , 13 years ago
Fit's for me, because I don't need to log the binary value, I just expect django to not crash when I have some binary fields in my model.
comment:8 by , 13 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | new → closed | 
follow-up: 10 comment:9 by , 12 years ago
Since this was introduced in django 1.5 can the fix also be included in the 1.5 branch?
comment:10 by , 12 years ago
Replying to webadmin@…:
Since this was introduced in django 1.5 can the fix also be included in the 1.5 branch?
It should.
I met this bug in the 1.5 branch so I fixed it as described here : http://stackoverflow.com/a/18923501
I'm about to commit the
BinaryFieldpatch from #2417. I've tested and reproduced your issue. I found another way to workaround it:import binascii try: return cursor._last_executed.decode('utf-8') except UnicodeDecodeError: return binascii.b2a_qp(cursor._last_executed)This gives something like
INSERT INTO `model_fields_datamodel` (`short_data`, `data`) VALUES ('=08', =\n'\\0F=FE')The quoted-printable representation of the binary parameters may not be the best representation, but at least it doesn't crash. Opinions?