Opened 4 years ago
Last modified 4 years ago
#32485 closed New feature
Django ORM icontains Operator Performance and Response Problem — at Version 1
Reported by: | Mesut Öncel | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.1 |
Severity: | Normal | Keywords: | Django ORM, icontains, ilike, performance tuning |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
For https://github.com/django/django/blob/master/django/db/backends/postgresql/operations.py#L103
There are 2 major disadvantages of using UPPER instead of buttonhole in Django ORM "icontains" operator. First of all, I would like to talk about the performance problem. If you add an expression to an index set for the "like" operator, index will not be used.
Query performance when we create a standard index for the like / ilike operator and do not use upper.
postgres=# CREATE INDEX trgm_idx_test_name ON test USING gin (name gin_trgm_ops); CREATE INDEX postgres=# explain analyze select name from test where name ilike '%aXxA%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on test (cost=37.16..574.45 rows=150 width=16) (actual time=0.314..0.469 rows=27 loops=1) Recheck Cond: ((name)::text ~~* '%aXxA%'::text) Rows Removed by Index Recheck: 1 Heap Blocks: exact=28 -> Bitmap Index Scan on trgm_idx_test_name (cost=0.00..37.12 rows=150 width=0) (actual time=0.268..0.269 rows=28 loops=1) Index Cond: ((name)::text ~~* '%aXxA%'::text) Planning Time: 0.374 ms Execution Time: 0.507 ms (8 rows)
Query performance also when using Upper method.
postgres=# explain analyze select name from test where UPPER(name) like '%AAXXXA%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Gather (cost=1000.00..18532.00 rows=480 width=16) (actual time=260.902..262.631 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on test (cost=0.00..17484.00 rows=200 width=16) (actual time=253.644..253.644 rows=0 loops=3) Filter: (upper((name)::text) ~~ '%AAXXXA%'::text) Rows Removed by Filter: 500000 Planning Time: 0.082 ms Execution Time: 262.653 ms (8 rows)
If the UPPER method is desired to be used, the result of expression index query performance.
postgres=# CREATE INDEX trgm_idx_test_name ON test USING gin (UPPER(name) gin_trgm_ops); CREATE INDEX postgres=# explain analyze select name from test where UPPER(name) like '%AAXXXA%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on test (cost=71.72..1610.71 rows=480 width=16) (actual time=0.459..0.460 rows=0 loops=1) Recheck Cond: (upper((name)::text) ~~ '%AAXXXA%'::text) -> Bitmap Index Scan on trgm_idx_test_name (cost=0.00..71.60 rows=480 width=0) (actual time=0.457..0.457 rows=0 loops=1) Index Cond: (upper((name)::text) ~~ '%AAXXXA%'::text) Planning Time: 0.340 ms Execution Time: 0.497 ms (6 rows)
These analyzes show that when using the "icontains" operator, "ilike" should be used instead of "UPPER like". Those who use such information outside the framework of the database should not use "ilike". Database administrators will not create an index specific to this structure. Because they need to create both expression index and standard index. This causes both cost and performance problems.
The other problem is that the results are wrong because it uses UPPER. I would like to prove this with a query like this.
postgres=# select name from test where UPPER(name) like UPPER('%istanbul%'); name ------ (0 rows) postgres=# select name from test where name ilike '%istanbul%'; name ---------- İstanbul (1 row)