Code

Opened 5 years ago

Closed 4 years ago

Last modified 3 years ago

#11017 closed (fixed)

Oracle LIKEC query doesn't use index

Reported by: jtiai Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords: oracle
Cc: mboersma Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

It seems (not confirmed from Oracle officially) LIKEC query doesn't use index at all, you get always full table scan that is costly when number of rows starts to grow.

Like queries uses index as expected with same parameters.

Change was done when fixing #5985

Attachments (1)

index_testing.sql (2.0 KB) - added by jtiai 5 years ago.
Script to test index usage in case of LIKE and LIKEC

Download all attachments as: .zip

Change History (18)

comment:1 Changed 5 years ago by jacob

  • milestone set to 1.2
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

This is too late for 1.1.

comment:2 Changed 5 years ago by ikelly

Can you please supply some documentation or test case that supports this? Is this the case for NVARCHAR2 as well as VARCHAR2?

comment:3 follow-up: Changed 5 years ago by ikelly

Scratch that, I confirmed it myself. I can't get Oracle 10g XE to formulate a plan using an index for LIKEC queries at all.

However, I still need to see some evidence that changing it back to LIKE (and finding another fix for #5985) would be an improvement. The three types of lookups that use LIKEC are contains, startswith, and endswith. contains and endswith lookups always start with '%', so they will never be indexed according to the Oracle docs. startswith lookups take the form 'prefix%' and might theoretically use the index, but in my testing I was unable to find an actual example of that form that did.

comment:4 Changed 5 years ago by mboersma

  • Cc mboersma added

comment:5 in reply to: ↑ 3 ; follow-up: Changed 5 years ago by jtiai

Replying to ikelly:

Scratch that, I confirmed it myself. I can't get Oracle 10g XE to formulate a plan using an index for LIKEC queries at all.

However, I still need to see some evidence that changing it back to LIKE (and finding another fix for #5985) would be an improvement. The three types of lookups that use LIKEC are contains, startswith, and endswith. contains and endswith lookups always start with '%', so they will never be indexed according to the Oracle docs. startswith lookups take the form 'prefix%' and might theoretically use the index, but in my testing I was unable to find an actual example of that form that did.

Both lookups (i)startswith and (i)endswith takes an advantage of index. In case of endswith index must be created with keyword "REVERSE". contains lookups can be only indexed with Oracle Text.

And it doesn't matter is is VARCHAR2 or NVARCHAR2 - no effect.

Small sample how table + indices should be created:

CREATE TABLE TEST_DATA(
 ID NUMBER PRIMARY KEY,
 TXT VARCHAR(200)
);

# Forward index (startswith)
CREATE INDEX TEST_DATA_TXT TEST_DATA(TXT);

# Reverse index (endswith)
CREATE INDEX TEST_DATA_TXT TEST_DATA(TXT) REVERSE;

# Function based uppercase forward index 
# (if istartswith uses UPPER otherwise change to lower accordingly)
CREATE INDEX TEST_DATA_TXT TEST_DATA(UPPER(TXT));

# Function based uppercase reverse index. See previous
CREATE INDEX TEST_DATA_TXT TEST_DATA(UPPER(TXT)) REVERSE;

Now running different queries (except contains should hit indices).

Few things that might affect it:

  • Number of rows is too low.
  • Tables are not analyzed.
  • Too large columns (2000 chars/bytes I think) doesn't get indexed.

If this doesn't get working, I could rise TAR to Oracle Metalink to see is there some issues regading usage of LIKEC (it doesn't seem to be much used anywhere nor documented very well - there might be reason for that)

comment:6 in reply to: ↑ 5 ; follow-up: Changed 5 years ago by ikelly

Replying to jtiai:

In case of endswith index must be created with keyword "REVERSE".

From what I understand (I'm at home right now and don't have an Oracle database handy to actually test on), this doesn't work. Reverse-key indexes are used to provide an optimization for monotonically increasing columns with lots of concurrent inserts, at the cost of not being able to use the index for range scans -- not to allow reverse LIKE queries. See http://richardfoote.wordpress.com/2008/01/16/introduction-to-reverse-key-indexes-part-ii-another-myth-bites-the-dust/

An actual solution (as mentioned in the same article) is to create an index of the REVERSE function. This would require a refactoring of the way the lookups are implemented in Django backends, which is something that should be done anyway.

The suggestion of creating an uppercase index in addition to the regular index is also a useful one, for a separate ticket.

Back on the LIKEC topic, there is no question that a query of the form WHERE TXT LIKE 'foo%' will take advantage of an index under the right circumstances. The main obstacle as I see it is #5985. We weren't able to reproduce that bug ourselves, and so it would be challenging to confirm the effectiveness of an alternative fix. That's why I would like to see an example confirming that LIKE will use an index, but LIKEC does not, before considering this change.

comment:7 in reply to: ↑ 6 Changed 5 years ago by jtiai

Replying to ikelly:

Replying to jtiai:

In case of endswith index must be created with keyword "REVERSE".

From what I understand (I'm at home right now and don't have an Oracle database handy to actually test on), this doesn't work. Reverse-key indexes are used to provide an optimization for monotonically increasing columns with lots of concurrent inserts, at the cost of not being able to use the index for range scans -- not to allow reverse LIKE queries. See http://richardfoote.wordpress.com/2008/01/16/introduction-to-reverse-key-indexes-part-ii-another-myth-bites-the-dust/

Straight from Oracle documentation:
"Creating a reverse key index, compared to a standard index, reverses the bytes of each column indexed (except the rowid) while keeping the column order."

An actual solution (as mentioned in the same article) is to create an index of the REVERSE function. This would require a refactoring of the way the lookups are implemented in Django backends, which is something that should be done anyway.

The suggestion of creating an uppercase index in addition to the regular index is also a useful one, for a separate ticket.

Back on the LIKEC topic, there is no question that a query of the form WHERE TXT LIKE 'foo%' will take advantage of an index under the right circumstances. The main obstacle as I see it is #5985. We weren't able to reproduce that bug ourselves, and so it would be challenging to confirm the effectiveness of an alternative fix. That's why I would like to see an example confirming that LIKE will use an index, but LIKEC does not, before considering this change.

All of the above are optimizations that could be done. I'm using legacy application so in my particular case indices are already there. I'm hitting this problem because I've "small" address table - around 21 million rows. There you clearly see when query hits index and when they do not - and it applies to performance as well.

I try to produce sample set of test data and script that would demonstrate index usage. Note that performance problem is easily hidden with powerful hardware.

Changed 5 years ago by jtiai

Script to test index usage in case of LIKE and LIKEC

comment:8 follow-ups: Changed 5 years ago by ikelly

Thanks for the test case. This confirms that the problem happens at least with VARCHAR2 columns. However, when I change the column type to NVARCHAR2, I get full table scans for both queries. I may have something misconfigured -- it seems bizarre that this sort of query would ignore the index just because the column is NVARCHAR2.

To test the reverse-key index suggestion, I also changed the original script by adding a reverse-key index and changing the search strings to '%Alley'. The result: Oracle wouldn't let me create both a normal and reverse-key index on the same column. So I deleted the normal index and tried again, and I got full table scans for both queries.

comment:9 in reply to: ↑ 8 Changed 5 years ago by jtiai

Replying to ikelly:

Thanks for the test case. This confirms that the problem happens at least with VARCHAR2 columns. However, when I change the column type to NVARCHAR2, I get full table scans for both queries. I may have something misconfigured -- it seems bizarre that this sort of query would ignore the index just because the column is NVARCHAR2.

I recall that there was some "catch" again here. Orace is so delicate what comes to indices.

To test the reverse-key index suggestion, I also changed the original script by adding a reverse-key index and changing the search strings to '%Alley'. The result: Oracle wouldn't let me create both a normal and reverse-key index on the same column. So I deleted the normal index and tried again, and I got full table scans for both queries.

There was some "catch" to make reverses working - it's not main concern right now anyway.

comment:10 in reply to: ↑ 8 Changed 5 years ago by jtiai

Replying to ikelly:

Thanks for the test case. This confirms that the problem happens at least with VARCHAR2 columns. However, when I change the column type to NVARCHAR2, I get full table scans for both queries. I may have something misconfigured -- it seems bizarre that this sort of query would ignore the index just because the column is NVARCHAR2.

Full table scan was chosen by oracle optimizer because there was too few rows. Change that most inner loop for c in 1..100 to something like 10000 and you start getting index usage with NVARCHAR2 too. Still it doesn't fix LIKEC problem...

Of course I always can fallback to RAW SQL here.

comment:11 Changed 5 years ago by ctao

LIKEC runs terribly slow in django for my case. The same sql runs in Toad about 6-7 sec, but in Django about 30 - 32 sec.

comment:12 Changed 5 years ago by jtiai

It seems that there is some bug in Oracle 9.2.0.5 and previous versions that didn't correctly handled escape. It's been fixed in later patchsets.

Because original report in #5985 didn't state which version Oracle was used it is hard to tell was it Oracle bug or something else.

Known workaround for Oracle versions 9.2.0.5 and earlier is to use:

LIKE TRANSLATE(%s USING NCHAR_CS) ESCAPE TRANSLATE('\\' USING NCHAR_CS)

comment:13 Changed 5 years ago by jtiai

Quick note: we reverted changes made at [7412] and are now testing any side-effects that might have caused.

comment:14 Changed 5 years ago by jtiai

Status update: We've been using reverted version for a good while and no anomalies have been occuring. Indices gets used as expected. My suggestion is just to revert changes made in [7412].

comment:15 Changed 4 years ago by ikelly

Thanks for tracking down that bug. I'm fairly sure that Frank was using Oracle 9 when he reported #5985, so this is quite likely the same bug. I'm reluctant to just revert the change since that would cause a regression of #5985, but I'm definitely willing to consider the workaround. I also wonder if it might be fixed a bit more efficiently by:

LIKE %s ESCAPE N'\\'

Unfortunately, I don't have a 9.2.0.5 database available to test on. Do you?

comment:16 Changed 4 years ago by ikelly

  • Resolution set to fixed
  • Status changed from new to closed

(In [12293]) Fixed #11017: Changed implementation of Oracle LIKE lookups to enable index usage. Thanks to Jani Tiainen for this workaround.

comment:17 Changed 3 years ago by jacob

  • milestone 1.2 deleted

Milestone 1.2 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.