Opened 18 years ago

Last modified 11 years ago

#3575 closed

postgres backend should optimize iexact searches — at Version 21

Reported by: Jack Moffitt <metajack@…> Owned by: Philippe Raoult
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: metajack@…, tofu@…, sam@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Ramiro Morales)

Currently queries using field__iexact="blah" translate to field ILIKE 'blah'. This is really inefficient because ILIKE queries (since they are for pattern matching) cannot be indexed and the whole table must be scanned. If LOWER(field) = LOWER('blah') is used, it accomplishes the same thing, but an index can be created for LOWER(field) that eliminates the table scan.

This is a little tricky since the default substitution is only on the operator and the right operand, but this patch adds a column substitution array as well. This patch should probably be applied across all backends where LOWER() is supported. Currently it only changes the postgresql_psycopg2 backend.

Change History (26)

by Jack Moffitt <metajack@…>, 18 years ago

Attachment: django_db.diff added

patch to make iexact searches use LOWER() instead of ILIKE

comment:1 by Simon G. <dev@…>, 18 years ago

Triage Stage: UnreviewedDesign decision needed

comment:2 by Jack Moffitt <metajack@…>, 18 years ago

With the patch there is small issue. We found some double escaping this morning. The attached patch fixes that. If ILIKE was used, the code was escaping for it, and since ILIKE is no longer used for iexact, this needed to be changed.

by Jack Moffitt <metajack@…>, 18 years ago

Attachment: django_db2.diff added

additional patch to fix extra escaping for iexact queries

comment:3 by anonymous, 17 years ago

Cc: sam@… added

comment:4 by Jacob, 17 years ago

Needs tests: set
Patch needs improvement: set
Triage Stage: Design decision neededSomeday/Maybe

Can you give me an idea of just how much more efficient we're talking here? For this ticket to be accepted we'll need to figure out which databases are faster with LOWER, and change just those backends. If the speed difference is pretty small, it's probably not quite worth it.

comment:5 by Jack Moffitt <metajack@…>, 17 years ago

ILIKE cannot use indexes, LOWER can. So it is MONSTROUSLY more efficient if the column is indexed.

I don't understand the reservation about this issue. ILIKE is the wrong primitive to use for iexact queries. ILIKE is for case insensitive partial string matching. It seems pretty clear to me, unless there is lots of legacy code making use of this bug by using iexact to do partial string matches. After all, you have mapped equals to =, not LIKE. I'm sure the whole userbase of Django would be complaining quite loudly if equals was mapped to LIKE, as it would effectively make all indexes on strings completely useless.

comment:6 by Philippe Raoult, 17 years ago

Resolution: invalid
Status: newclosed

Your patch assumes the data is stored in lowercase, so it would actually break the iexact operator in the general case. If you want to optimize your case, you can store everything as lowercase (overwriting save() would be a good start) in the DB and then use field_exact = tolower(string).

comment:7 by Jack Moffitt <metajack@…>, 17 years ago

Patch needs improvement: unset
Resolution: invalid
Status: closedreopened
Triage Stage: Someday/MaybeDesign decision needed

Please read the patch more carefully. We made no such assumption about the case of the data. Please make sure you understand the bug before you go around closing people's hard work as "invalid".

Also, why is this "someday/maybe"? It is an easy patch and it fixes a huge performance problem.

by Collin Grady <cgrady@…>, 17 years ago

example of performance boost on postgresql

comment:8 by Collin Grady <cgrady@…>, 17 years ago

Summary of that:

Without indexes:

ILIKE: 3900ms
LOWER: 1912ms

With indexes on LOWER(column_name):

ILIKE: 3900ms
LOWER: 108ms

So the performance boost is definitely enormous for this on postgres, and I hope to be able to test MySQL also

comment:9 by Collin Grady <cgrady@…>, 17 years ago

Oh, should have mentioned that the table in question has over 300,000 rows, so it should be a decent test set ;)

comment:10 by Philippe Raoult, 17 years ago

Patch needs improvement: set

You're absolutely right Jack, I hadn't done my homework. But now I took the time to do it :)

You should check [5519] because it introduced something that looks similar. It's not as generic but I believe you can set a needs_upper_for_iops flag on your backend which will make the queries for iexact look like "WHERE UPPER(column) " + OPERATOR_MAPPING[lookup_type] % cast_sql. It sounds like you could rewrite your patch to make use of this feature and use UPPER instead of LOWER as the mapping for iexact. Other case insensitive operators will probably have to be updated too.

Collin: The current version of the mysql backend does not honor case-sensitiveness (#2170), so make sure your tests do not rely on it.

comment:11 by Philippe Raoult, 17 years ago

Owner: changed from nobody to Philippe Raoult
Status: reopenednew

comment:12 by Philippe Raoult, 17 years ago

Summary: [patch] iexact searches should use LOWER() not ILIKEpostgres backend should optimize iexact searches
Triage Stage: Design decision neededAccepted

by Collin Grady, 17 years ago

Attachment: 3575-mysql-performance.txt added

comment:13 by Collin Grady, 17 years ago

Summary of MySQL:

Without indexes:

LIKE: 0.89s
LOWER: 2.09s

With indexes on column_name: (mysql does not support function indexes, even if it was case sensitive by default)

LIKE: 0.16s
LOWER: 1.83s

Strangely enough, for MySQL using LIKE is faster. Indexed increased the speed slightly for LOWER and greatly for LIKE.

So the LIKE->LOWER conversion definitely wouldn't fit on all backends, but it would definitely help with postgres :)

by Collin Grady, 17 years ago

Attachment: 3575-upper.patch added

comment:14 by anonymous, 17 years ago

Just a side note, in my app, i've the following table:

create table sometable(

id serial primary key,
label varchar(255)

);

I also have an index created using: CREATE INDEX label_idx ON sometable(upper(label) varchar_pattern_ops);

with the "upper" patch to psycopg backend, my iexact query:
result = SomeTable.objects.filter(labeliexact=query)
Does not use index and makes a seq scan instead


On the other hand the query using LIKE:
select * from sometable where upper(label) LIKE UPPER('sometext')


will use the index (and is very fast..)

I propose to use: 'iexact': 'LIKE UPPER(%s)',

in the proposed patch..

comment:15 by Collin Grady <cgrady@…>, 17 years ago

anon, I think you set the wrong index type - by adding varchar_pattern_ops to your index, you made the index not apply to the normal '='

had you set the index without that, so it used the standard operators, the index would've been used by the patch

in reply to:  15 comment:16 by anonymous, 17 years ago

Replying to Collin Grady <cgrady@the-magi.us>:

anon, I think you set the wrong index type - by adding varchar_pattern_ops to your index, you made the index not apply to the normal '='

had you set the index without that, so it used the standard operators, the index would've been used by the patch

Yep, you are right, after removing varchar_pattern_ops from the index, the query with = UPPER() works with the index without any problems...

Sorry for the misleading comment..

comment:17 by milosu, 17 years ago

Well, Colin, actually I've a subtle comment on the varchar_pattern_ops issue..

When you have the postgres cluster locale set to utf-8, the inexact 'like something%' queries will not use the index, when built using

CREATE INDEX label_idx ON sometable(upper(label));

You have to create second index using
CREATE INDEX label_idx2 ON sometable(upper(label) varchar_pattern_ops);

then postgres will use the first index for exact queries and the second one for the partial / inexact queries..

More on this postgres feature in http://www.postgresql.org/docs/8.2/interactive/indexes-opclass.html

Maybe django postgres backend should handle this somehow..

comment:18 by Collin Grady <cgrady@…>, 17 years ago

Yes, milosu, but as this patch causes the postgresql backends to no longer use ILIKE, the varchar_pattern_ops makes the index not work for the test it is using.

in reply to:  18 comment:19 by milosu, 17 years ago

Replying to Collin Grady <cgrady@the-magi.us>:

Yes, milosu, but as this patch causes the postgresql backends to no longer use ILIKE, the varchar_pattern_ops makes the index not work for the test it is using.

Ok. You're right, for istartswith and startswith queries you still need the varchar_pattern_ops index, but that's maybe something the developers should take care of themselves and it is not related with this patch..

--
Milos

comment:20 by Jack Moffitt <metajack@…>, 16 years ago

This patch has been here for a year. How do we get this applied?

comment:21 by Ramiro Morales, 16 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top