Opened 13 months ago

Closed 6 months ago

Last modified 4 months ago

#32691 closed Cleanup/optimization (fixed)

Performance regression in Exact lookup on BooleanField on MySQL.

Reported by: Todor Velichkov Owned by: Roman Miroshnychenko
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: bool, booleanfield, orm, sql, mysql
Cc: Adam Johnson, Simon Charette, Dan Strokirk Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When filtering on boolean field in Django 2.2 the SQL output would look like this:

    WHERE `mytable`.`myfield` = 1

in Django 3.2 the SQL changed to:

    WHERE `mytable`.`myfield`

The problem with the 2nd approach is that in mysql8 this field is not used in indexes anymore, causing a great performance degradation.

Change History (19)

comment:1 Changed 13 months ago by Mariusz Felisiak

Resolution: invalid
Status: newclosed
Type: UncategorizedCleanup/optimization

Thanks for this ticket, however both clauses are correct in MySQL. The way how the ORM builds filters is an implementation detail not something Django should guarantee (see #25367).

comment:2 Changed 13 months ago by Mariusz Felisiak

Summary: Django 3.2 boolean field SQL output is causing a performance regression in mysql.Django 3.2 boolean field SQL output is causing a performance regression in MySQL.

comment:3 Changed 13 months ago by Todor Velichkov

Hi Mariusz,
I don't quite agree, both clauses are syntactically correct yes, but they are not the same.

MySQL does not have a boolean type of column, so Django is using tinyint
which stores value between -128..127 or 0..255

When you tell mysql you are looking for `mytable`.`myfield` this basically translates to all values between -128..127 except 0 which is a range query equivalent to myfield__in=[-128..-1]+[1..127]

even myfield__exact lookup is acting the same way.

comment:4 Changed 13 months ago by Mariusz Felisiak

MySQL does not have a boolean type of column, so Django is using ​tinyint

Django is using BOOL which is an alias for tinyint(1). As far as I'm aware, storing values except 0 and 1 in BOOL columns has never been supported by Django.

comment:5 Changed 13 months ago by Todor Velichkov

Django is not storing values except 0 and 1 in BOOL, but MySQL tinyint(1) support it.

So my question is, why is Django asking MySQL to search for values other than 0 and 1 in a BOOL column. Doesn't it make sense to only search for 0 and 1 when Django is only storing 0 and 1?

My second question is: Let's say Django is right about asking MySQL to search for values other than 0 and 1 in a BOOL column, then shouldn't it at least myboofield__exact act in a different way and ask MySQL to look for a specific value ? Django is literally leaving us w/o an option here and using a boolean field as index/ inside composite index is impossible.

comment:6 Changed 13 months ago by Mariusz Felisiak

Cc: Adam Johnson added

Django is literally leaving us w/o an option here and using a boolean field as index/ inside composite index is impossible.

I'm not a MySQL expert, can you provide any link to the MySQL docs which will confirm that it's not possible. It's surprising that you cannot create an index with a BOOL (TINYINT) column.

comment:7 Changed 13 months ago by Todor Velichkov

The issue is not that the index will not be created, the index will be created, but it will not be used against expressions like:

WHERE `mytable`.`myfield`

because such an expression evaluates to True for a range of values (in the case of signed tinyint(1) the range is -128..127 except 0) which turns the query into a range query which are hard to optimize. MySQL needs a constant value like:

WHERE `mytable`.`myfield` = 1

in order for the index to be used.

An example: If we have a row with a value of myfield = 2 (MySQL allows this because the column is tinyint(1)) then:

  • The first expression will include this row in the results because 2 evaluates to True
  • The second expression will not include this row because 2 != 1

Django only stores 0 and 1 why its expression is including rows which Django itself do not store.

comment:8 Changed 13 months ago by Simon Charette

If we want to move forward with this change the only possible resolution I see is to add a has_native_boolean_type database feature and set it to false for MySQL. Another more intrusive alternative would be to have BooleanField.get_db_prep_value return an int instead of a bool on MySQL so the lookup machinery is made aware of this quirk instead of having it magically happen at the database connector layer (mysqlclient, pymysql).

In the mean time this can be worked around by doing .filter(boolean_field=1). Entirely reverting this change would possibly cause a breakage in all function index on boolean expressions created on other backends.

In all cases I'd argue that creating an index on a boolean field is a bad practice particularly when it's backed by a b-tree implementation and your backend doesn't support partial indices to allow excluding parts of the covered table as your underlying tree will get heavily unbalanced on way or the other.

comment:9 Changed 13 months ago by Todor Velichkov

In the mean time this can be worked around by doing .filter(boolean_field=1). Entirely reverting this change would possibly cause a breakage in all function index on boolean expressions created on other backends.

I tried this before opening the ticket, but it didn't work, however it looks like .filter(boolean_field=models.Value(1)) is working!!
For now this should be sufficient for us to complete our migration to Django 3.2! Thanks for the time spent on this!

comment:10 Changed 8 months ago by Roman Miroshnychenko

Why not add as_mysql() method to Exact lookup that simply calls parent's as_sql() method instead of doing this boolean field magic as in Exact.as_sql()? We had the same issue as in the original post and this solution worked for us.

comment:11 Changed 8 months ago by Simon Charette

Cc: Simon Charette added

Adding a dedicated Exact.as_mysql method would also work, feels like a feature flag might be more appropriate though.

comment:12 Changed 7 months ago by Dan Strokirk

Cc: Dan Strokirk added

We've also seen this change cause a production outage when we've tried upgrading to Django 3, so either a notice in the docs or some other way to to avoid it changing back again in the future would be great.

Thanks Todor for your tip about models.Value(1)!

comment:13 Changed 7 months ago by Roman Miroshnychenko

Has patch: set

comment:14 Changed 7 months ago by Roman Miroshnychenko

I have submitted a PR, hopefully it will be accepted.

comment:15 Changed 7 months ago by Mariusz Felisiak

Resolution: invalid
Status: closednew
Triage Stage: UnreviewedAccepted

comment:16 Changed 7 months ago by Mariusz Felisiak

Owner: changed from nobody to Roman Miroshnychenko
Status: newassigned
Summary: Django 3.2 boolean field SQL output is causing a performance regression in MySQL.Performance regression in Exact lookup on BooleanField on MySQL.
Triage Stage: AcceptedReady for checkin

comment:17 Changed 6 months ago by Mariusz Felisiak <felisiak.mariusz@…>

Resolution: fixed
Status: assignedclosed

In 407fe95c:

Fixed #32691 -- Made Exact lookup on BooleanFields compare directly to a boolean value on MySQL.

Performance regression in 37e6c5b79bd0529a3c85b8c478e4002fd33a2a1d.

Thanks Todor Velichkov for the report.

Co-authored-by: Mariusz Felisiak <felisiak.mariusz@…>

comment:18 Changed 4 months ago by Anže Pečar

Any chance that we could cheery-pick the fix into Django 3.2? The issue nearly caused a production outage for us when we upgraded to Django 3.2 and it would be great to make sure nobody else gets burned by this.

comment:19 in reply to:  18 Changed 4 months ago by Mariusz Felisiak

Replying to Anže Pečar:

Any chance that we could cheery-pick the fix into Django 3.2? The issue nearly caused a production outage for us when we upgraded to Django 3.2 and it would be great to make sure nobody else gets burned by this.

Unfortunately not, this is a regression in Django 3.1 that is no longer supported. Per our backporting policy this means it doesn't qualify for a backport to 3.2.x or 4.0.x anymore.
See Django’s release process for more details.

Note: See TracTickets for help on using tickets.
Back to Top