Opened 7 months ago
Last modified 6 days ago
#36571 assigned Cleanup/optimization
Deprecated usage of BINARY expr in MySQL lookups
| Reported by: | Simon Charette | Owned by: | Youssef Tarek Ali |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | mysql binary like |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
When enabling warnings against MySQL usage of __contains, __startswith, __endswith on MySQL result in the following warning
Warning 1287 "'BINARY expr' is deprecated and will be removed in a future release
It seems like a potential solution is simply to use CAST instead
-
django/db/backends/mysql/base.py
diff --git a/django/db/backends/mysql/base.py b/django/db/backends/mysql/base.py index e83dc106f7..236e6c599e 100644
a b def data_types(self): 164 164 operators = { 165 165 "exact": "= %s", 166 166 "iexact": "LIKE %s", 167 "contains": "LIKE BINARY %s",167 "contains": "LIKE CAST(%s AS BINARY)", 168 168 "icontains": "LIKE %s", 169 169 "gt": "> %s", 170 170 "gte": ">= %s", 171 171 "lt": "< %s", 172 172 "lte": "<= %s", 173 "startswith": "LIKE BINARY %s",174 "endswith": "LIKE BINARY %s",173 "startswith": "LIKE CAST(%s AS BINARY)", 174 "endswith": "LIKE CAST(%s AS BINARY)", 175 175 "istartswith": "LIKE %s", 176 176 "iendswith": "LIKE %s", 177 177 } … … def data_types(self): 186 186 # wildcard for the LIKE operator. 187 187 pattern_esc = r"REPLACE(REPLACE(REPLACE({}, '\\', '\\\\'), '%%', '\%%'), '_', '\_')" 188 188 pattern_ops = { 189 "contains": "LIKE BINARY CONCAT('%%', {}, '%%')",189 "contains": "LIKE CAST(CONCAT('%%', {}, '%%') AS BINARY)", 190 190 "icontains": "LIKE CONCAT('%%', {}, '%%')", 191 "startswith": "LIKE BINARY CONCAT({}, '%%')",191 "startswith": "LIKE CAST(CONCAT({}, '%%') AS BINARY)", 192 192 "istartswith": "LIKE CONCAT({}, '%%')", 193 "endswith": "LIKE BINARY CONCAT('%%', {})",193 "endswith": "LIKE CAST(CONCAT('%%', {}) AS BINARY)", 194 194 "iendswith": "LIKE CONCAT('%%', {})", 195 195 } 196 196 -
django/db/backends/mysql/operations.py
diff --git a/django/db/backends/mysql/operations.py b/django/db/backends/mysql/operations.py index 7dfcd57958..bdde5bbf47 100644
a b def regex_lookup(self, lookup_type): 363 363 # REGEXP_LIKE doesn't exist in MariaDB. 364 364 if self.connection.mysql_is_mariadb: 365 365 if lookup_type == "regex": 366 return "%s REGEXP BINARY %s"366 return "%s REGEXP CAST(%s AS BINARY)" 367 367 return "%s REGEXP %s" 368 368 369 369 match_option = "c" if lookup_type == "regex" else "i"
Change History (17)
comment:2 by , 7 months ago
Would
CAST(... AS BINARY)be enough here? This only casts the pattern and not the column?
I think you're right, we'd need to cast both the left and right hand side of the operator.
That might lead to subtle differences depending on the column's collation or affect index usage. Would
COLLATE ..._binbe closer to the oldLIKE BINARY, since it applies a binary collation to the column and the pattern?
The problem with that is that it requires knowing the right-hand-side's collation at query execution time and append the ..._bin suffix to it.
I guess we could use rhs.output_field.db_collation and default to utf8mb_bin if missing but that could also subtly break as there could be databases out there with collations that are not matching the Django model representation of it and were working fine previously when using LIKE BINARY.
This whole thing makes me wonder why they deprecated this option in the first place as it seems quite handy.
comment:3 by , 7 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:4 by , 7 months ago
I've opened a draft PR for this change. At the moment, most of the test suite passes using MySQL but I'm still seeing two failing tests:
expressions.tests.ExpressionsTests.test_patterns_escape
expressions.tests.ExpressionsTests.test_insensitive_patterns_escape
These both deal with % characters in expressions. Literal values ("Article%") behave correctly with escaping, but expressions like F("lastname") containing % are failing.
I've tried a few different approaches to unify the escaping rules so everything passes, but each time i fix one side it tends to blow up elsewhere in the suite. Instead of continuing to hack away at this, I thought i'd take a break and put up a PR in it's current state to get some feedback.
comment:5 by , 7 months ago
| Has patch: | set |
|---|---|
| Patch needs improvement: | set |
comment:6 by , 7 months ago
escaped_rhs = (
"REPLACE("
" REPLACE("
" REPLACE({rhs}, CHAR(92), CONCAT(CHAR(92), CHAR(92))),"
" CHAR(37), CONCAT(CHAR(92), CHAR(37))"
" ),"
" CHAR(95), CONCAT(CHAR(92), CHAR(95))"
")"
).format(rhs=rhs)
I was seing failures in expressions.tests.ExpressionsTests.test_patterns_escape and test_insensitive_patterns_escape because %, _, and \ were not being escaped correctly when the RHS of a pattern lookup was an expression (e.g. F("lastname") ).
Using '\\\\' and '%%%%' produced incorrect escaping once Django interpolated the SQL string. I switched to the above code using CHAR() and all the relevant tests seem to be passing when using MySQL.
comment:7 by , 7 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Version: | 5.2 → dev |
Accepting based on the MySQL 8.0.27 release notes:
Important Change: The BINARY operator is now deprecated, and subject to removal in a future release of MySQL. Use of BINARY now causes a warning. Use CAST(... AS BINARY) instead. (WL #13619)
comment:8 by , 7 months ago
i accidently modified the wrong file.. I had the mysql stuff working but the modifications I made were in django/db/models/lookups.py. So those changes blew up all the sqlite tests. Removed those changes.
comment:9 by , 6 months ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:10 by , 6 months ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:11 by , 2 weeks ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:12 by , 8 days ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
I'm looking into this and plan to provide a patch.
comment:13 by , 8 days ago
I've opened a PR to handle this deprecation by switching to CAST(... AS BINARY) in the MySQL backend.
I made sure to only cast the RHS parameters to keep index performance high. I've also run the full expressions and lookup test suites (337 tests) on both MySQL 9.0 and MariaDB latest to make sure there are no regressions, especially with the complex escaping cases that were mentioned in earlier discussions.
follow-up: 15 comment:14 by , 8 days ago
I made sure to only cast the RHS parameters to keep index performance high.
Wasn't the conclusion of the previous discussion that we cannot do just the RHS?
comment:15 by , 8 days ago
Replying to Adam Johnson:
I made sure to only cast the RHS parameters to keep index performance high.
Wasn't the conclusion of the previous discussion that we cannot do just the RHS?
I've verified in a Docker environment (MySQL 9.0 with utf8mb4_general_ci) that casting only the RHS (e.g. LIKE CAST(%s AS BINARY)) correctly forces a case-sensitive comparison. This aligns with the MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like, which states: "A comparison between a character string and a binary string is treated as a comparison of binary strings."
The primary advantage of this approach is that it preserves index usage by leaving the LHS column untransformed. I have cross-verified this fix by running the full expressions and lookup test suites (337 tests) on both MySQL 9.0 and MariaDB latest backends, and all tests passed (including test_patterns_escape).
follow-up: 17 comment:16 by , 7 days ago
I have to note that while I agree (in comment:2) with comment:1 in theory I never cross verified it myself so it's entirely possible that this simple approach just work.
All I can say is that we've been using an approach similar to the CAST approach at work and it has been working fine for us, we don't use exotic collations though (we stick to utf8_general_ci).
Something is definitely weird with this deprecation as MySQL's LIKE documentation still mentions the usage of BINARY to force case-sensitive comparison. From my understanding we have gone the extra length of doing so on MySQL because the historical default collations are all case-insensitive (e.g. utf8_general_ci) and breaking with this legacy by forcing users to move a case-sensitive collation on a new version of Django would be quite disruptive even with the relatively recent introduction of Field(db_collation).
comment:17 by , 6 days ago
Replying to Simon Charette:
I have to note that while I agree (in comment:2) with comment:1 in theory I never cross verified it myself so it's entirely possible that this simple approach just work.
All I can say is that we've been using an approach similar to the
CASTapproach at work and it has been working fine for us, we don't use exotic collations though (we stick toutf8_general_ci).
Something is definitely weird with this deprecation as MySQL's
LIKEdocumentation still mentions the usage ofBINARYto force case-sensitive comparison. From my understanding we have gone the extra length of doing so on MySQL because the historical default collations are all case-insensitive (e.g.utf8_general_ci) and breaking with this legacy by forcing users to move a case-sensitive collation on a new version of Django would be quite disruptive even with the relatively recent introduction ofField(db_collation).
I verified this with a MySQL 9.0 Docker setup on utf8mb4_general_ci. The RHS-only CAST is case-sensitive because MySQL treats comparisons with a binary string as binary (per the docs). This seems to handle the deprecation nicely without losing index performance on the LHS.
Would
CAST(... AS BINARY)be enough here? This only casts the pattern and not the column? That might lead to subtle differences depending on the column's collation or affect index usage.COLLATE ..._binbe closer to the oldLIKE BINARY, since it applies a binary collation to the column and the pattern?