Opened 4 weeks ago

Last modified 3 days ago

#36571 assigned Cleanup/optimization

Deprecated usage of BINARY expr in MySQL lookups

Reported by: Simon Charette Owned by: JaeHyuckSa
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: mysql binary like
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

When enabling warnings against MySQL usage of __contains, __startswith, __endswith on MySQL result in the following warning

Warning 1287 "'BINARY expr' is deprecated and will be removed in a future release

It seems like a potential solution is simply to use CAST instead

  • django/db/backends/mysql/base.py

    diff --git a/django/db/backends/mysql/base.py b/django/db/backends/mysql/base.py
    index e83dc106f7..236e6c599e 100644
    a b def data_types(self):  
    164164    operators = {
    165165        "exact": "= %s",
    166166        "iexact": "LIKE %s",
    167         "contains": "LIKE BINARY %s",
     167        "contains": "LIKE CAST(%s AS BINARY)",
    168168        "icontains": "LIKE %s",
    169169        "gt": "> %s",
    170170        "gte": ">= %s",
    171171        "lt": "< %s",
    172172        "lte": "<= %s",
    173         "startswith": "LIKE BINARY %s",
    174         "endswith": "LIKE BINARY %s",
     173        "startswith": "LIKE CAST(%s AS BINARY)",
     174        "endswith": "LIKE CAST(%s AS BINARY)",
    175175        "istartswith": "LIKE %s",
    176176        "iendswith": "LIKE %s",
    177177    }
    def data_types(self):  
    186186    # wildcard for the LIKE operator.
    187187    pattern_esc = r"REPLACE(REPLACE(REPLACE({}, '\\', '\\\\'), '%%', '\%%'), '_', '\_')"
    188188    pattern_ops = {
    189         "contains": "LIKE BINARY CONCAT('%%', {}, '%%')",
     189        "contains": "LIKE CAST(CONCAT('%%', {}, '%%') AS BINARY)",
    190190        "icontains": "LIKE CONCAT('%%', {}, '%%')",
    191         "startswith": "LIKE BINARY CONCAT({}, '%%')",
     191        "startswith": "LIKE CAST(CONCAT({}, '%%') AS BINARY)",
    192192        "istartswith": "LIKE CONCAT({}, '%%')",
    193         "endswith": "LIKE BINARY CONCAT('%%', {})",
     193        "endswith": "LIKE CAST(CONCAT('%%', {}) AS BINARY)",
    194194        "iendswith": "LIKE CONCAT('%%', {})",
    195195    }
    196196
  • django/db/backends/mysql/operations.py

    diff --git a/django/db/backends/mysql/operations.py b/django/db/backends/mysql/operations.py
    index 7dfcd57958..bdde5bbf47 100644
    a b def regex_lookup(self, lookup_type):  
    363363        # REGEXP_LIKE doesn't exist in MariaDB.
    364364        if self.connection.mysql_is_mariadb:
    365365            if lookup_type == "regex":
    366                 return "%s REGEXP BINARY %s"
     366                return "%s REGEXP CAST(%s AS BINARY)"
    367367            return "%s REGEXP %s"
    368368
    369369        match_option = "c" if lookup_type == "regex" else "i"

Change History (10)

comment:1 by Jason Hall, 4 weeks ago

Would CAST(... AS BINARY) be enough here? This only casts the pattern and not the column? That might lead to subtle differences depending on the column's collation or affect index usage. Would COLLATE ..._bin be closer to the old LIKE BINARY, since it applies a binary collation to the column and the pattern?

Last edited 4 weeks ago by Jason Hall (previous) (diff)

comment:2 by Simon Charette, 4 weeks ago

Would CAST(... AS BINARY) be enough here? This only casts the pattern and not the column?

I think you're right, we'd need to cast both the left and right hand side of the operator.

That might lead to subtle differences depending on the column's collation or affect index usage. Would COLLATE ..._bin be closer to the old LIKE BINARY, since it applies a binary collation to the column and the pattern?

The problem with that is that it requires knowing the right-hand-side's collation at query execution time and append the ..._bin suffix to it.

I guess we could use rhs.output_field.db_collation and default to utf8mb_bin if missing but that could also subtly break as there could be databases out there with collations that are not matching the Django model representation of it and were working fine previously when using LIKE BINARY.

This whole thing makes me wonder why they deprecated this option in the first place as it seems quite handy.

comment:3 by Jason Hall, 4 weeks ago

Owner: set to Jason Hall
Status: newassigned

comment:4 by Jason Hall, 4 weeks ago

I've opened a draft PR for this change. At the moment, most of the test suite passes using MySQL but I'm still seeing two failing tests:
expressions.tests.ExpressionsTests.test_patterns_escape
expressions.tests.ExpressionsTests.test_insensitive_patterns_escape

These both deal with % characters in expressions. Literal values ("Article%") behave correctly with escaping, but expressions like F("lastname") containing % are failing.

I've tried a few different approaches to unify the escaping rules so everything passes, but each time i fix one side it tends to blow up elsewhere in the suite. Instead of continuing to hack away at this, I thought i'd take a break and put up a PR in it's current state to get some feedback.

comment:5 by Jason Hall, 4 weeks ago

Has patch: set
Patch needs improvement: set

comment:6 by Jason Hall, 4 weeks ago

escaped_rhs = (
                "REPLACE("
                "  REPLACE("
                "    REPLACE({rhs}, CHAR(92), CONCAT(CHAR(92), CHAR(92))),"
                "    CHAR(37), CONCAT(CHAR(92), CHAR(37))"
                "  ),"
                "  CHAR(95), CONCAT(CHAR(92), CHAR(95))"
                ")"
            ).format(rhs=rhs)

I was seing failures in expressions.tests.ExpressionsTests.test_patterns_escape and test_insensitive_patterns_escape because %, _, and \ were not being escaped correctly when the RHS of a pattern lookup was an expression (e.g. F("lastname") ).

Using '\\\\' and '%%%%' produced incorrect escaping once Django interpolated the SQL string. I switched to the above code using CHAR() and all the relevant tests seem to be passing when using MySQL.

comment:7 by Natalia Bidart, 4 weeks ago

Triage Stage: UnreviewedAccepted
Version: 5.2dev

Accepting based on the MySQL 8.0.27 release notes:

Important Change: The BINARY operator is now deprecated, and subject to removal in a future release of MySQL. Use of BINARY now causes a warning. Use CAST(... AS BINARY) instead. (WL #13619)

comment:8 by Jason Hall, 4 weeks ago

i accidently modified the wrong file.. I had the mysql stuff working but the modifications I made were in django/db/models/lookups.py. So those changes blew up all the sqlite tests. Removed those changes.

comment:9 by Jason Hall, 4 days ago

Owner: Jason Hall removed
Status: assignednew

comment:10 by JaeHyuckSa, 3 days ago

Owner: set to JaeHyuckSa
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top