Code

Opened 7 years ago

Closed 3 months ago

#3575 closed Cleanup/optimization (fixed)

postgres backend should optimize iexact searches

Reported by: Jack Moffitt <metajack@…> Owned by: jacob
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: metajack@…, tofu@…, sam@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by ramiro)

Currently queries using field__iexact="blah" translate to field ILIKE 'blah'. This is really inefficient because ILIKE queries (since they are for pattern matching) cannot be indexed and the whole table must be scanned. If LOWER(field) = LOWER('blah') is used, it accomplishes the same thing, but an index can be created for LOWER(field) that eliminates the table scan.

This is a little tricky since the default substitution is only on the operator and the right operand, but this patch adds a column substitution array as well. This patch should probably be applied across all backends where LOWER() is supported. Currently it only changes the postgresql_psycopg2 backend.

Attachments (5)

django_db.diff (1.5 KB) - added by Jack Moffitt <metajack@…> 7 years ago.
patch to make iexact searches use LOWER() instead of ILIKE
django_db2.diff (1.5 KB) - added by Jack Moffitt <metajack@…> 7 years ago.
additional patch to fix extra escaping for iexact queries
3575-postgresql-performance.txt (4.7 KB) - added by Collin Grady <cgrady@…> 6 years ago.
example of performance boost on postgresql
3575-mysql-performance.txt (1.9 KB) - added by cgrady 6 years ago.
3575-upper.patch (2.2 KB) - added by cgrady 6 years ago.

Download all attachments as: .zip

Change History (33)

Changed 7 years ago by Jack Moffitt <metajack@…>

patch to make iexact searches use LOWER() instead of ILIKE

comment:1 Changed 7 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

comment:2 Changed 7 years ago by Jack Moffitt <metajack@…>

With the patch there is small issue. We found some double escaping this morning. The attached patch fixes that. If ILIKE was used, the code was escaping for it, and since ILIKE is no longer used for iexact, this needed to be changed.

Changed 7 years ago by Jack Moffitt <metajack@…>

additional patch to fix extra escaping for iexact queries

comment:3 Changed 7 years ago by anonymous

  • Cc sam@… added

comment:4 Changed 6 years ago by jacob

  • Needs tests set
  • Patch needs improvement set
  • Triage Stage changed from Design decision needed to Someday/Maybe

Can you give me an idea of just how much more efficient we're talking here? For this ticket to be accepted we'll need to figure out which databases are faster with LOWER, and change just those backends. If the speed difference is pretty small, it's probably not quite worth it.

comment:5 Changed 6 years ago by Jack Moffitt <metajack@…>

ILIKE cannot use indexes, LOWER can. So it is MONSTROUSLY more efficient if the column is indexed.

I don't understand the reservation about this issue. ILIKE is the wrong primitive to use for iexact queries. ILIKE is for case insensitive partial string matching. It seems pretty clear to me, unless there is lots of legacy code making use of this bug by using iexact to do partial string matches. After all, you have mapped equals to =, not LIKE. I'm sure the whole userbase of Django would be complaining quite loudly if equals was mapped to LIKE, as it would effectively make all indexes on strings completely useless.

comment:6 Changed 6 years ago by PhiR

  • Resolution set to invalid
  • Status changed from new to closed

Your patch assumes the data is stored in lowercase, so it would actually break the iexact operator in the general case. If you want to optimize your case, you can store everything as lowercase (overwriting save() would be a good start) in the DB and then use field_exact = tolower(string).

comment:7 Changed 6 years ago by Jack Moffitt <metajack@…>

  • Patch needs improvement unset
  • Resolution invalid deleted
  • Status changed from closed to reopened
  • Triage Stage changed from Someday/Maybe to Design decision needed

Please read the patch more carefully. We made no such assumption about the case of the data. Please make sure you understand the bug before you go around closing people's hard work as "invalid".

Also, why is this "someday/maybe"? It is an easy patch and it fixes a huge performance problem.

Changed 6 years ago by Collin Grady <cgrady@…>

example of performance boost on postgresql

comment:8 Changed 6 years ago by Collin Grady <cgrady@…>

Summary of that:

Without indexes:

ILIKE: 3900ms
LOWER: 1912ms

With indexes on LOWER(column_name):

ILIKE: 3900ms
LOWER: 108ms

So the performance boost is definitely enormous for this on postgres, and I hope to be able to test MySQL also

comment:9 Changed 6 years ago by Collin Grady <cgrady@…>

Oh, should have mentioned that the table in question has over 300,000 rows, so it should be a decent test set ;)

comment:10 Changed 6 years ago by PhiR

  • Patch needs improvement set

You're absolutely right Jack, I hadn't done my homework. But now I took the time to do it :)

You should check [5519] because it introduced something that looks similar. It's not as generic but I believe you can set a needs_upper_for_iops flag on your backend which will make the queries for iexact look like "WHERE UPPER(column) " + OPERATOR_MAPPING[lookup_type] % cast_sql. It sounds like you could rewrite your patch to make use of this feature and use UPPER instead of LOWER as the mapping for iexact. Other case insensitive operators will probably have to be updated too.

Collin: The current version of the mysql backend does not honor case-sensitiveness (#2170), so make sure your tests do not rely on it.

comment:11 Changed 6 years ago by PhiR

  • Owner changed from nobody to PhiR
  • Status changed from reopened to new

comment:12 Changed 6 years ago by PhiR

  • Summary changed from [patch] iexact searches should use LOWER() not ILIKE to postgres backend should optimize iexact searches
  • Triage Stage changed from Design decision needed to Accepted

Changed 6 years ago by cgrady

comment:13 Changed 6 years ago by cgrady

Summary of MySQL:

Without indexes:

LIKE: 0.89s
LOWER: 2.09s

With indexes on column_name: (mysql does not support function indexes, even if it was case sensitive by default)

LIKE: 0.16s
LOWER: 1.83s

Strangely enough, for MySQL using LIKE is faster. Indexed increased the speed slightly for LOWER and greatly for LIKE.

So the LIKE->LOWER conversion definitely wouldn't fit on all backends, but it would definitely help with postgres :)

Changed 6 years ago by cgrady

comment:14 Changed 6 years ago by anonymous

Just a side note, in my app, i've the following table:

create table sometable(

id serial primary key,
label varchar(255)

);

I also have an index created using: CREATE INDEX label_idx ON sometable(upper(label) varchar_pattern_ops);

with the "upper" patch to psycopg backend, my iexact query:
result = SomeTable.objects.filter(labeliexact=query)
Does not use index and makes a seq scan instead


On the other hand the query using LIKE:
select * from sometable where upper(label) LIKE UPPER('sometext')


will use the index (and is very fast..)

I propose to use: 'iexact': 'LIKE UPPER(%s)',

in the proposed patch..

comment:15 follow-up: Changed 6 years ago by Collin Grady <cgrady@…>

anon, I think you set the wrong index type - by adding varchar_pattern_ops to your index, you made the index not apply to the normal '='

had you set the index without that, so it used the standard operators, the index would've been used by the patch

comment:16 in reply to: ↑ 15 Changed 6 years ago by anonymous

Replying to Collin Grady <cgrady@the-magi.us>:

anon, I think you set the wrong index type - by adding varchar_pattern_ops to your index, you made the index not apply to the normal '='

had you set the index without that, so it used the standard operators, the index would've been used by the patch

Yep, you are right, after removing varchar_pattern_ops from the index, the query with = UPPER() works with the index without any problems...

Sorry for the misleading comment..

comment:17 Changed 6 years ago by milosu

Well, Colin, actually I've a subtle comment on the varchar_pattern_ops issue..

When you have the postgres cluster locale set to utf-8, the inexact 'like something%' queries will not use the index, when built using

CREATE INDEX label_idx ON sometable(upper(label));

You have to create second index using
CREATE INDEX label_idx2 ON sometable(upper(label) varchar_pattern_ops);

then postgres will use the first index for exact queries and the second one for the partial / inexact queries..

More on this postgres feature in http://www.postgresql.org/docs/8.2/interactive/indexes-opclass.html

Maybe django postgres backend should handle this somehow..

comment:18 follow-up: Changed 6 years ago by Collin Grady <cgrady@…>

Yes, milosu, but as this patch causes the postgresql backends to no longer use ILIKE, the varchar_pattern_ops makes the index not work for the test it is using.

comment:19 in reply to: ↑ 18 Changed 6 years ago by milosu

Replying to Collin Grady <cgrady@the-magi.us>:

Yes, milosu, but as this patch causes the postgresql backends to no longer use ILIKE, the varchar_pattern_ops makes the index not work for the test it is using.

Ok. You're right, for istartswith and startswith queries you still need the varchar_pattern_ops index, but that's maybe something the developers should take care of themselves and it is not related with this patch..

--
Milos

comment:20 Changed 6 years ago by Jack Moffitt <metajack@…>

This patch has been here for a year. How do we get this applied?

comment:21 Changed 6 years ago by ramiro

  • Description modified (diff)

comment:22 Changed 6 years ago by mtredinnick

  • milestone set to 1.0 maybe

comment:23 Changed 6 years ago by cgrady

  • Owner changed from PhiR to cgrady

comment:24 Changed 6 years ago by jacob

  • milestone changed from 1.0 maybe to 1.0
  • Owner changed from cgrady to jacob
  • Status changed from new to assigned

comment:25 Changed 6 years ago by jacob

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [8536]) Fixed #3575: use UPPER() instead ILIKE for postgres case-insensitive comparisons.

comment:26 Changed 3 years ago by jacob

  • milestone 1.0 deleted

Milestone 1.0 deleted

comment:27 Changed 3 months ago by shmishleniy@…

  • Easy pickings unset
  • Resolution fixed deleted
  • Severity set to Normal
  • Status changed from closed to new
  • Type set to Cleanup/optimization
  • UI/UX unset

Hi!
Not so far good to provide "out the box" solution to replace ILIKE on UPPER or LOWER.
You guys forced us to override standard backends.
The are so many cases when the best solution is use ILIKE.
Two pic to quick example on small TEST db:

  1. Using UPPER

http://s23.postimg.org/9qatun20b/upper.png

  1. Using ILIKE

http://s22.postimg.org/vdq1d3q75/ilike.png
Yes of course we can create UPPER index but it's too expensive to have UPPER and norml case indexes.

comment:28 Changed 3 months ago by timo

  • Resolution set to fixed
  • Status changed from new to closed

Please don't reopen tickets that have been closed for 5 years. If you wish to suggest an improvement, please open a new ticket. You can mention this ticket in the description of that ticket. Thanks.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.