Opened 4 weeks ago
Last modified 3 days ago
#36571 assigned Cleanup/optimization
Deprecated usage of BINARY expr in MySQL lookups
Reported by: | Simon Charette | Owned by: | JaeHyuckSa |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | mysql binary like |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
When enabling warnings against MySQL usage of __contains
, __startswith
, __endswith
on MySQL result in the following warning
Warning 1287 "'BINARY expr' is deprecated and will be removed in a future release
It seems like a potential solution is simply to use CAST
instead
-
django/db/backends/mysql/base.py
diff --git a/django/db/backends/mysql/base.py b/django/db/backends/mysql/base.py index e83dc106f7..236e6c599e 100644
a b def data_types(self): 164 164 operators = { 165 165 "exact": "= %s", 166 166 "iexact": "LIKE %s", 167 "contains": "LIKE BINARY %s",167 "contains": "LIKE CAST(%s AS BINARY)", 168 168 "icontains": "LIKE %s", 169 169 "gt": "> %s", 170 170 "gte": ">= %s", 171 171 "lt": "< %s", 172 172 "lte": "<= %s", 173 "startswith": "LIKE BINARY %s",174 "endswith": "LIKE BINARY %s",173 "startswith": "LIKE CAST(%s AS BINARY)", 174 "endswith": "LIKE CAST(%s AS BINARY)", 175 175 "istartswith": "LIKE %s", 176 176 "iendswith": "LIKE %s", 177 177 } … … def data_types(self): 186 186 # wildcard for the LIKE operator. 187 187 pattern_esc = r"REPLACE(REPLACE(REPLACE({}, '\\', '\\\\'), '%%', '\%%'), '_', '\_')" 188 188 pattern_ops = { 189 "contains": "LIKE BINARY CONCAT('%%', {}, '%%')",189 "contains": "LIKE CAST(CONCAT('%%', {}, '%%') AS BINARY)", 190 190 "icontains": "LIKE CONCAT('%%', {}, '%%')", 191 "startswith": "LIKE BINARY CONCAT({}, '%%')",191 "startswith": "LIKE CAST(CONCAT({}, '%%') AS BINARY)", 192 192 "istartswith": "LIKE CONCAT({}, '%%')", 193 "endswith": "LIKE BINARY CONCAT('%%', {})",193 "endswith": "LIKE CAST(CONCAT('%%', {}) AS BINARY)", 194 194 "iendswith": "LIKE CONCAT('%%', {})", 195 195 } 196 196 -
django/db/backends/mysql/operations.py
diff --git a/django/db/backends/mysql/operations.py b/django/db/backends/mysql/operations.py index 7dfcd57958..bdde5bbf47 100644
a b def regex_lookup(self, lookup_type): 363 363 # REGEXP_LIKE doesn't exist in MariaDB. 364 364 if self.connection.mysql_is_mariadb: 365 365 if lookup_type == "regex": 366 return "%s REGEXP BINARY %s"366 return "%s REGEXP CAST(%s AS BINARY)" 367 367 return "%s REGEXP %s" 368 368 369 369 match_option = "c" if lookup_type == "regex" else "i"
Change History (10)
comment:2 by , 4 weeks ago
Would
CAST(... AS BINARY)
be enough here? This only casts the pattern and not the column?
I think you're right, we'd need to cast both the left and right hand side of the operator.
That might lead to subtle differences depending on the column's collation or affect index usage. Would
COLLATE ..._bin
be closer to the oldLIKE BINARY
, since it applies a binary collation to the column and the pattern?
The problem with that is that it requires knowing the right-hand-side's collation at query execution time and append the ..._bin
suffix to it.
I guess we could use rhs.output_field.db_collation
and default to utf8mb_bin
if missing but that could also subtly break as there could be databases out there with collations that are not matching the Django model representation of it and were working fine previously when using LIKE BINARY
.
This whole thing makes me wonder why they deprecated this option in the first place as it seems quite handy.
comment:3 by , 4 weeks ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:4 by , 4 weeks ago
I've opened a draft PR for this change. At the moment, most of the test suite passes using MySQL but I'm still seeing two failing tests:
expressions.tests.ExpressionsTests.test_patterns_escape
expressions.tests.ExpressionsTests.test_insensitive_patterns_escape
These both deal with %
characters in expressions. Literal values ("Article%")
behave correctly with escaping, but expressions like F("lastname")
containing %
are failing.
I've tried a few different approaches to unify the escaping rules so everything passes, but each time i fix one side it tends to blow up elsewhere in the suite. Instead of continuing to hack away at this, I thought i'd take a break and put up a PR in it's current state to get some feedback.
comment:5 by , 4 weeks ago
Has patch: | set |
---|---|
Patch needs improvement: | set |
comment:6 by , 4 weeks ago
escaped_rhs = ( "REPLACE(" " REPLACE(" " REPLACE({rhs}, CHAR(92), CONCAT(CHAR(92), CHAR(92)))," " CHAR(37), CONCAT(CHAR(92), CHAR(37))" " )," " CHAR(95), CONCAT(CHAR(92), CHAR(95))" ")" ).format(rhs=rhs)
I was seing failures in expressions.tests.ExpressionsTests.test_patterns_escape
and test_insensitive_patterns_escape
because %
, _
, and \
were not being escaped correctly when the RHS of a pattern lookup was an expression (e.g. F("lastname")
).
Using '\\\\'
and '%%%%'
produced incorrect escaping once Django interpolated the SQL string. I switched to the above code using CHAR()
and all the relevant tests seem to be passing when using MySQL.
comment:7 by , 4 weeks ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Version: | 5.2 → dev |
Accepting based on the MySQL 8.0.27 release notes:
Important Change: The BINARY operator is now deprecated, and subject to removal in a future release of MySQL. Use of BINARY now causes a warning. Use CAST(... AS BINARY) instead. (WL #13619)
comment:8 by , 4 weeks ago
i accidently modified the wrong file.. I had the mysql stuff working but the modifications I made were in django/db/models/lookups.py. So those changes blew up all the sqlite tests. Removed those changes.
comment:9 by , 4 days ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:10 by , 3 days ago
Owner: | set to |
---|---|
Status: | new → assigned |
Would
CAST(... AS BINARY)
be enough here? This only casts the pattern and not the column? That might lead to subtle differences depending on the column's collation or affect index usage. WouldCOLLATE ..._bin
be closer to the oldLIKE BINARY
, since it applies a binary collation to the column and the pattern?