#31453 closed Cleanup/optimization (fixed)
Avoid comparison against a boolean value for Exists().
| Reported by: | Balazs Endresz | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.0 |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Django creates sql for Exists subqueries like this: WHERE EXISTS(...) = true
But for some reason this results in a very inefficient query plan (at least in postgres 11.2).
Simply removing = true makes it several hundred times faster in some cases.
With = true:
EXPLAIN ANALYZE SELECT COUNT(*)
FROM (
SELECT "ap_facilitysubtype"."facility_sub_type_id" AS Col1
FROM "ap_facilitysubtype"
WHERE EXISTS(
SELECT U0."facility_id"
FROM "ap_facility" U0
WHERE U0."facility_sub_type_id" = ("ap_facilitysubtype"."facility_sub_type_id")
) = true
) subquery;
QUERY PLAN
---------------------
Aggregate (cost=225.56..225.57 rows=1 width=8) (actual time=1100.085..1100.086 rows=1 loops=1)
-> Seq Scan on ap_facilitysubtype (cost=0.00..225.41 rows=58 width=0) (actual time=0.137..1100.040 rows=61 loops=1)
Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
Rows Removed by Filter: 54
SubPlan 1
-> Seq Scan on ap_facility u0 (cost=0.00..3248.91 rows=1704 width=0) (actual time=9.560..9.560 rows=1 loops=115)
Filter: ("facility_sub_type_id" = ap_facilitysubtype."facility_sub_type_id")
Rows Removed by Filter: 53523
SubPlan 2
-> Seq Scan on ap_facility u0_1 (cost=0.00..2993.33 rows=102233 width=4) (never executed)
Planning Time: 1.184 ms
Execution Time: 1100.749 ms
Without = true:
EXPLAIN ANALYZE SELECT COUNT(*)
FROM (
SELECT "ap_facilitysubtype"."facility_sub_type_id" AS Col1
FROM "ap_facilitysubtype"
WHERE EXISTS(
SELECT U0."facility_id"
FROM "ap_facility" U0
WHERE U0."facility_sub_type_id" = ("ap_facilitysubtype"."facility_sub_type_id")
)
) subquery;
QUERY PLAN
---------------------
Aggregate (cost=52.24..52.25 rows=1 width=8) (actual time=1.759..1.760 rows=1 loops=1)
-> Nested Loop Semi Join (cost=0.29..52.09 rows=60 width=0) (actual time=0.067..1.734 rows=61 loops=1)
-> Seq Scan on ap_facilitysubtype (cost=0.00..6.15 rows=115 width=4) (actual time=0.015..0.052 rows=115 loops=1)
-> Index Only Scan using "ap_facility_facility_sub_type_id_577e61d3" on ap_facility u0 (cost=0.29..108.41 rows=1704 width=4) (actual time=0.013..0.013 rows=1 loops=115)
Index Cond: ("facility_sub_type_id" = ap_facilitysubtype."facility_sub_type_id")
Heap Fetches: 61
Planning Time: 0.674 ms
Execution Time: 1.841 ms
It's not likely to be the correct way to do this but = true can be removed by adding the following to BuiltinLookup.as_sql:
from django.db.models.expressions import Exists
if isinstance(self.lhs, Exists) and self.rhs is True and params == [True]:
return '%s' % (lhs_sql), []
I suspect = true is not necessary in any other database either, so would it make sense to remove it when used with subqueries like this?
Change History (2)
comment:1 by , 6 years ago
| Resolution: | → fixed |
|---|---|
| Status: | new → closed |
| Summary: | Speed up Exists subqueries → Avoid comparison against a boolean value for Exists(). |
comment:2 by , 6 years ago
That works great, thank you!
I installed django from master and can confirm = true isn't there in the sql any more.
It's already fixed on master, see 37e6c5b79bd0529a3c85b8c478e4002fd33a2a1d.