#32691 closed Cleanup/optimization (fixed)
Performance regression in Exact lookup on BooleanField on MySQL.
Reported by: | Todor Velichkov | Owned by: | Roman Miroshnychenko |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | bool, booleanfield, orm, sql, mysql |
Cc: | Adam Johnson, Simon Charette, Dan Strokirk | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When filtering on boolean field in Django 2.2 the SQL output would look like this:
WHERE `mytable`.`myfield` = 1
in Django 3.2 the SQL changed to:
WHERE `mytable`.`myfield`
The problem with the 2nd approach is that in mysql8 this field is not used in indexes anymore, causing a great performance degradation.
Change History (19)
comment:1 Changed 2 years ago by
Resolution: | → invalid |
---|---|
Status: | new → closed |
Type: | Uncategorized → Cleanup/optimization |
comment:2 Changed 2 years ago by
Summary: | Django 3.2 boolean field SQL output is causing a performance regression in mysql. → Django 3.2 boolean field SQL output is causing a performance regression in MySQL. |
---|
comment:3 Changed 2 years ago by
Hi Mariusz,
I don't quite agree, both clauses are syntactically correct yes, but they are not the same.
MySQL does not have a boolean type of column, so Django is using tinyint
which stores value between -128..127
or 0..255
When you tell mysql
you are looking for `mytable`.`myfield`
this basically translates to all values between -128..127
except 0
which is a range query equivalent to myfield__in=[-128..-1]+[1..127]
even myfield__exact
lookup is acting the same way.
comment:4 Changed 2 years ago by
MySQL does not have a boolean type of column, so Django is using tinyint
Django is using BOOL
which is an alias for tinyint(1)
. As far as I'm aware, storing values except 0
and 1
in BOOL
columns has never been supported by Django.
comment:5 Changed 2 years ago by
Django is not storing values except 0
and 1
in BOOL
, but MySQL tinyint(1)
support it.
So my question is, why is Django asking MySQL to search for values other than 0
and 1
in a BOOL
column. Doesn't it make sense to only search for 0
and 1
when Django is only storing 0
and 1
?
My second question is: Let's say Django is right about asking MySQL to search for values other than 0
and 1
in a BOOL
column, then shouldn't it at least myboofield__exact
act in a different way and ask MySQL to look for a specific value ? Django is literally leaving us w/o an option here and using a boolean field as index/ inside composite index is impossible.
comment:6 Changed 2 years ago by
Cc: | Adam Johnson added |
---|
Django is literally leaving us w/o an option here and using a boolean field as index/ inside composite index is impossible.
I'm not a MySQL expert, can you provide any link to the MySQL docs which will confirm that it's not possible. It's surprising that you cannot create an index with a BOOL
(TINYINT
) column.
comment:7 Changed 2 years ago by
The issue is not that the index will not be created, the index will be created, but it will not be used against expressions like:
WHERE `mytable`.`myfield`
because such an expression evaluates to True
for a range of values (in the case of signed tinyint(1)
the range is -128..127
except 0
) which turns the query into a range
query which are hard to optimize. MySQL needs a constant value like:
WHERE `mytable`.`myfield` = 1
in order for the index to be used.
An example: If we have a row with a value of myfield = 2
(MySQL allows this because the column is tinyint(1)
) then:
- The first expression will include this row in the results because
2
evaluates toTrue
- The second expression will not include this row because
2 != 1
Django only stores 0
and 1
why its expression is including rows which Django itself do not store.
comment:8 Changed 2 years ago by
If we want to move forward with this change the only possible resolution I see is to add a has_native_boolean_type
database feature and set it to false for MySQL. Another more intrusive alternative would be to have BooleanField.get_db_prep_value
return an int
instead of a bool
on MySQL so the lookup machinery is made aware of this quirk instead of having it magically happen at the database connector layer (mysqlclient
, pymysql
).
In the mean time this can be worked around by doing .filter(boolean_field=1)
. Entirely reverting this change would possibly cause a breakage in all function index on boolean expressions created on other backends.
In all cases I'd argue that creating an index on a boolean field is a bad practice particularly when it's backed by a b-tree implementation and your backend doesn't support partial indices to allow excluding parts of the covered table as your underlying tree will get heavily unbalanced on way or the other.
comment:9 Changed 2 years ago by
In the mean time this can be worked around by doing
.filter(boolean_field=1)
. Entirely reverting this change would possibly cause a breakage in all function index on boolean expressions created on other backends.
I tried this before opening the ticket, but it didn't work, however it looks like .filter(boolean_field=models.Value(1))
is working!!
For now this should be sufficient for us to complete our migration to Django 3.2
! Thanks for the time spent on this!
comment:10 Changed 21 months ago by
Why not add as_mysql()
method to Exact
lookup that simply calls parent's as_sql()
method instead of doing this boolean field magic as in Exact.as_sql()
? We had the same issue as in the original post and this solution worked for us.
comment:11 Changed 20 months ago by
Cc: | Simon Charette added |
---|
Adding a dedicated Exact.as_mysql
method would also work, feels like a feature flag might be more appropriate though.
comment:12 Changed 20 months ago by
Cc: | Dan Strokirk added |
---|
We've also seen this change cause a production outage when we've tried upgrading to Django 3, so either a notice in the docs or some other way to to avoid it changing back again in the future would be great.
Thanks Todor for your tip about models.Value(1)
!
comment:13 Changed 20 months ago by
Has patch: | set |
---|
comment:15 Changed 20 months ago by
Resolution: | invalid |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
comment:16 Changed 20 months ago by
Owner: | changed from nobody to Roman Miroshnychenko |
---|---|
Status: | new → assigned |
Summary: | Django 3.2 boolean field SQL output is causing a performance regression in MySQL. → Performance regression in Exact lookup on BooleanField on MySQL. |
Triage Stage: | Accepted → Ready for checkin |
comment:18 follow-up: 19 Changed 17 months ago by
Any chance that we could cheery-pick the fix into Django 3.2? The issue nearly caused a production outage for us when we upgraded to Django 3.2 and it would be great to make sure nobody else gets burned by this.
comment:19 Changed 17 months ago by
Replying to Anže Pečar:
Any chance that we could cheery-pick the fix into Django 3.2? The issue nearly caused a production outage for us when we upgraded to Django 3.2 and it would be great to make sure nobody else gets burned by this.
Unfortunately not, this is a regression in Django 3.1 that is no longer supported. Per our backporting policy this means it doesn't qualify for a backport to 3.2.x or 4.0.x anymore.
See Django’s release process for more details.
Thanks for this ticket, however both clauses are correct in MySQL. The way how the ORM builds filters is an implementation detail not something Django should guarantee (see #25367).