Opened 18 years ago
Last modified 11 years ago
#3575 closed
postgres backend should optimize iexact searches — at Version 21
Reported by: | Owned by: | Philippe Raoult | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | metajack@…, tofu@…, sam@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Currently queries using field__iexact="blah"
translate to field ILIKE 'blah'
. This is really inefficient because ILIKE queries (since they are for pattern matching) cannot be indexed and the whole table must be scanned. If LOWER(field) = LOWER('blah')
is used, it accomplishes the same thing, but an index can be created for LOWER(field) that eliminates the table scan.
This is a little tricky since the default substitution is only on the operator and the right operand, but this patch adds a column substitution array as well. This patch should probably be applied across all backends where LOWER() is supported. Currently it only changes the postgresql_psycopg2 backend.
Change History (26)
by , 18 years ago
Attachment: | django_db.diff added |
---|
comment:1 by , 18 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:2 by , 18 years ago
With the patch there is small issue. We found some double escaping this morning. The attached patch fixes that. If ILIKE was used, the code was escaping for it, and since ILIKE is no longer used for iexact, this needed to be changed.
by , 18 years ago
Attachment: | django_db2.diff added |
---|
additional patch to fix extra escaping for iexact queries
comment:3 by , 17 years ago
Cc: | added |
---|
comment:4 by , 17 years ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
Triage Stage: | Design decision needed → Someday/Maybe |
Can you give me an idea of just how much more efficient we're talking here? For this ticket to be accepted we'll need to figure out which databases are faster with LOWER, and change just those backends. If the speed difference is pretty small, it's probably not quite worth it.
comment:5 by , 17 years ago
ILIKE cannot use indexes, LOWER can. So it is MONSTROUSLY more efficient if the column is indexed.
I don't understand the reservation about this issue. ILIKE is the wrong primitive to use for iexact queries. ILIKE is for case insensitive partial string matching. It seems pretty clear to me, unless there is lots of legacy code making use of this bug by using iexact to do partial string matches. After all, you have mapped equals to =, not LIKE. I'm sure the whole userbase of Django would be complaining quite loudly if equals was mapped to LIKE, as it would effectively make all indexes on strings completely useless.
comment:6 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Your patch assumes the data is stored in lowercase, so it would actually break the iexact operator in the general case. If you want to optimize your case, you can store everything as lowercase (overwriting save() would be a good start) in the DB and then use field_exact = tolower(string).
comment:7 by , 17 years ago
Patch needs improvement: | unset |
---|---|
Resolution: | invalid |
Status: | closed → reopened |
Triage Stage: | Someday/Maybe → Design decision needed |
Please read the patch more carefully. We made no such assumption about the case of the data. Please make sure you understand the bug before you go around closing people's hard work as "invalid".
Also, why is this "someday/maybe"? It is an easy patch and it fixes a huge performance problem.
by , 17 years ago
Attachment: | 3575-postgresql-performance.txt added |
---|
example of performance boost on postgresql
comment:8 by , 17 years ago
Summary of that:
Without indexes:
ILIKE: 3900ms
LOWER: 1912ms
With indexes on LOWER(column_name):
ILIKE: 3900ms
LOWER: 108ms
So the performance boost is definitely enormous for this on postgres, and I hope to be able to test MySQL also
comment:9 by , 17 years ago
Oh, should have mentioned that the table in question has over 300,000 rows, so it should be a decent test set ;)
comment:10 by , 17 years ago
Patch needs improvement: | set |
---|
You're absolutely right Jack, I hadn't done my homework. But now I took the time to do it :)
You should check [5519] because it introduced something that looks similar. It's not as generic but I believe you can set a needs_upper_for_iops flag on your backend which will make the queries for iexact look like "WHERE UPPER(column) " + OPERATOR_MAPPING[lookup_type] % cast_sql. It sounds like you could rewrite your patch to make use of this feature and use UPPER instead of LOWER as the mapping for iexact. Other case insensitive operators will probably have to be updated too.
Collin: The current version of the mysql backend does not honor case-sensitiveness (#2170), so make sure your tests do not rely on it.
comment:11 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | reopened → new |
comment:12 by , 17 years ago
Summary: | [patch] iexact searches should use LOWER() not ILIKE → postgres backend should optimize iexact searches |
---|---|
Triage Stage: | Design decision needed → Accepted |
by , 17 years ago
Attachment: | 3575-mysql-performance.txt added |
---|
comment:13 by , 17 years ago
Summary of MySQL:
Without indexes:
LIKE: 0.89s
LOWER: 2.09s
With indexes on column_name: (mysql does not support function indexes, even if it was case sensitive by default)
LIKE: 0.16s
LOWER: 1.83s
Strangely enough, for MySQL using LIKE is faster. Indexed increased the speed slightly for LOWER and greatly for LIKE.
So the LIKE->LOWER conversion definitely wouldn't fit on all backends, but it would definitely help with postgres :)
by , 17 years ago
Attachment: | 3575-upper.patch added |
---|
comment:14 by , 17 years ago
Just a side note, in my app, i've the following table:
create table sometable(
id serial primary key,
label varchar(255)
);
I also have an index created using: CREATE INDEX label_idx ON sometable(upper(label) varchar_pattern_ops);
with the "upper" patch to psycopg backend, my iexact query:
result = SomeTable.objects.filter(labeliexact=query)
Does not use index and makes a seq scan instead
On the other hand the query using LIKE:
select * from sometable where upper(label) LIKE UPPER('sometext')
will use the index (and is very fast..)
I propose to use: 'iexact': 'LIKE UPPER(%s)',
in the proposed patch..
follow-up: 16 comment:15 by , 17 years ago
anon, I think you set the wrong index type - by adding varchar_pattern_ops to your index, you made the index not apply to the normal '='
had you set the index without that, so it used the standard operators, the index would've been used by the patch
comment:16 by , 17 years ago
Replying to Collin Grady <cgrady@the-magi.us>:
anon, I think you set the wrong index type - by adding varchar_pattern_ops to your index, you made the index not apply to the normal '='
had you set the index without that, so it used the standard operators, the index would've been used by the patch
Yep, you are right, after removing varchar_pattern_ops from the index, the query with = UPPER() works with the index without any problems...
Sorry for the misleading comment..
comment:17 by , 17 years ago
Well, Colin, actually I've a subtle comment on the varchar_pattern_ops issue..
When you have the postgres cluster locale set to utf-8, the inexact 'like something%' queries will not use the index, when built using
CREATE INDEX label_idx ON sometable(upper(label));
You have to create second index using
CREATE INDEX label_idx2 ON sometable(upper(label) varchar_pattern_ops);
then postgres will use the first index for exact queries and the second one for the partial / inexact queries..
More on this postgres feature in http://www.postgresql.org/docs/8.2/interactive/indexes-opclass.html
Maybe django postgres backend should handle this somehow..
follow-up: 19 comment:18 by , 17 years ago
Yes, milosu, but as this patch causes the postgresql backends to no longer use ILIKE, the varchar_pattern_ops makes the index not work for the test it is using.
comment:19 by , 17 years ago
Replying to Collin Grady <cgrady@the-magi.us>:
Yes, milosu, but as this patch causes the postgresql backends to no longer use ILIKE, the varchar_pattern_ops makes the index not work for the test it is using.
Ok. You're right, for istartswith and startswith queries you still need the varchar_pattern_ops index, but that's maybe something the developers should take care of themselves and it is not related with this patch..
--
Milos
comment:21 by , 16 years ago
Description: | modified (diff) |
---|
patch to make iexact searches use LOWER() instead of ILIKE