Opened 8 years ago

Closed 8 years ago

Last modified 7 years ago

#26139 closed Bug (invalid)

UUIDField value must be declared "binary" to avoid MySQL warning

Reported by: Guilhem Bichot Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: MySQL
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I'm a developer of the MySQL DBMS at Oracle.
A user of Django (Giovanni Go) reported this:
http://bugs.mysql.com/bug.php?id=79317
He writes
"We use UUID stored as binary(16) PK to identify users.
We're seeing an issue where MySQL is trying to validate binary as UTF8
starting with MySQL 5.6.27 when trying to insert (and presumingly
select) users. It will insert the row, but since Django treats MySQL
warnings as exception, Django will bail out.
Setup: Python 2.7, Django 1.6.6, MySQL-python 1.2.5, Ubuntu 14.04/OSX
Django will treat any MySQL warnings as exception if you have DEBUG
turned on."

Then he gives a simple Python, Django-less program to show the issue.

I don't have a Django installation, but the problem is apparently that:

  • Django generates a UUID object per the user's request
  • it inserts it into a MySQL table column, with a simple INSERT:

INSERT INTO table (column) VALUES('cryptic chars here');
those can be really cryptic as a UUID string is made of any bytes from
0x00 to 0xFF.

  • MySQL expects the string to be valid utf8; it has expected this for

years, but only recently did it start warning when the string is not
valid utf8 (this is the relevant change:
https://github.com/mysql/mysql-server/commit/33a2e5abd ).

  • In fact, the string is not valid utf8: certain sequences of bytes do

not match utf8 characters. The string is binary. Then it would make
sense for the client to declare it; which is done this way:
INSERT INTO table (column) VALUES(_binary 'cryptic chars here');
The _binary prefix says "what I'm giving you next is binary":
http://dev.mysql.com/doc/refman/5.7/en/charset-literal.html .
Another option would be to use the x syntax in the INSERT:
https://dev.mysql.com/doc/refman/5.7/en/hexadecimal-literals.html .
INSERT INTO table (column) VALUES(x'hex codes of cryptic chars here');

Is there any chance to modify Django's code to address this issue?

In the report, please ignore any comments starting from "[25 Jan 2:14]

Change History (4)

comment:1 by Tim Graham, 8 years ago

Component: UncategorizedDatabase layer (models, ORM)
Keywords: MySQL added
Summary: Django's UUID inserted into MySQL => WarningUUIDField value must be declared "binary" to avoid MySQL warning
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

While the report mentions Django 1.6, I've seen these warnings when running the tests for the Django master branch (1.10.dev at this time).

comment:2 by Claude Paroz, 8 years ago

I thought that on non-PostgreSQL databases, the value of UUIDs were always saved as strings (.hex representation). Is it not the case?

comment:3 by Tim Graham, 8 years ago

Resolution: invalid
Status: newclosed

Yes, Claude is correct. Django's UUIDField uses char(32) -- the binary(16) representation is a third-party implementation. The warnings I'm seeing for Django's test suite are for BinaryField (created #26140).

in reply to:  3 comment:4 by Chris Foresman, 7 years ago

Replying to Tim Graham:

Yes, Claude is correct. Django's UUIDField uses char(32) -- the binary(16) representation is a third-party implementation. The warnings I'm seeing for Django's test suite are for BinaryField (created #26140).

I've been researching this; is there a reason we _don't_ use BINARY(16) for MySQL? It should be massively more efficient in storage and index space. I think if I subclass the existing UUIDField, this should be pretty close to working (I think I need to override __init__ to set the max_length to 16:

class MySQLUUIDField(UUIDField):

    def get_internal_type(self):
        return "BinaryField"
    
    def get_db_prep_value(self, value, connection, prepared=False):
        if value is None:
            return None
        if not isinstance(value, uuid.UUID):
            value = self.to_python(value)

        if connection.features.has_native_uuid_field:
            return value
        return value.bytes
    
    def to_python(self, value):
        if value is not None and not isinstance(value, uuid.UUID):
            try:
                return uuid.UUID(bytes=value)
            except (AttributeError, ValueError):
                raise exceptions.ValidationError(
                    self.error_messages['invalid'],
                    code='invalid',
                    params={'value': value},
                )
        return value

I'm taking some liberties and not accounting for every edge case, but I think that's enough to work with the existing MySQL backend as written, though ideally I'd like the backend to presume BINARY(16) and the field definition could act accordingly. Looking at Oracle and sqlite, those DBs use a BLOB type for binary data, so I'm not sure if there is any benefit or not using that type for a UUID.

Note: See TracTickets for help on using tickets.
Back to Top