Opened 3 years ago

Closed 3 years ago

#18082 closed Bug (fixed)

get_indexes produces wrong resuls under oracle

Reported by: akaariai Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords: oracle
Cc: anssi.kaariainen@… Triage Stage: Ready for checkin
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently oracle/introspection.py get_indexes uses this SQL:

SELECT LOWER(all_tab_cols.column_name) AS column_name,
       CASE user_constraints.constraint_type
           WHEN 'P' THEN 1 ELSE 0
       END AS is_primary_key,
       CASE user_indexes.uniqueness
           WHEN 'UNIQUE' THEN 1 ELSE 0
       END AS is_unique
FROM   all_tab_cols, user_cons_columns, user_constraints, user_ind_columns, user_indexes
WHERE  all_tab_cols.column_name = user_cons_columns.column_name (+)
  AND  all_tab_cols.table_name = user_cons_columns.table_name (+)
  AND  user_cons_columns.constraint_name = user_constraints.constraint_name (+)
  AND  user_constraints.constraint_type (+) = 'P'
  AND  user_ind_columns.column_name (+) = all_tab_cols.column_name
  AND  user_ind_columns.table_name (+) = all_tab_cols.table_name
  AND  user_indexes.uniqueness (+) = 'UNIQUE'
  AND  user_indexes.index_name (+) = user_ind_columns.index_name
  AND  all_tab_cols.table_name = UPPER(%s)

This SQL has two failings: first, it LEFT JOINs instead of INNER JOINs the user_ind_columns, so every column will show as indexed. In addition the results are ordering dependant, for example if we check AUTH_USER by that SQL, the result is:

id	        1	1
id	        0	1
username	0	1
is_staff	0	0
is_staff	0	0
last_name	0	0
is_superuser	0	0
is_superuser	0	0
last_login	0	0
date_joined	0	0
first_name	0	0
email	        0	0
is_active	0	0
is_active	0	0
password	0	0

So, if ID is a primary key is purely luck-based.

In addition the SQL is really slow, it takes around 0.9 seconds for single table on my laptop. As the test suite has nearly 1000 tables, inspectdb tests will take half an hour just executing this SQL.

I think one could get the same results by using the three following queries:

select * from user_ind_columns where table_name = 'AUTH_GROUP';
select * from user_constraints where constraint_type = 'P' and table_name = 'AUTH_GROUP';
select * from user_indexes where uniqueness = 'UNIQUE' and table_name = 'AUTH_GROUP';

and then doing the joins in Python. It seems this approach would also be at least an order of magnitude faster.

Attachments (1)

18082.diff (4.0 KB) - added by akaariai 3 years ago.

Download all attachments as: .zip

Change History (6)

comment:1 Changed 3 years ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Part of the problem comes from having multiple indexes on the same column (for example unique on firstname, lastname + a normal index on firstname). In that case it was ordering dependent if the column was seen as unique or not.

The attached patch simplifies the SQL, removes non-indexed columns from the output and in addition handles the multiple indexes for the same column by using GROUP BY + MAX.

I was wrong about the get_indexes calls taking a lot of time in the test suite. The first call takes around a second, but after that the query plan is cached and subsequent calls to get_indexes will be fast enough to not be a bottleneck.

The introspection + inspectdb tests pass on Oracle 11g.

Version 0, edited 3 years ago by akaariai (next)

Changed 3 years ago by akaariai

comment:2 Changed 3 years ago by ikelly

  • Triage Stage changed from Unreviewed to Accepted

The only comment I have is that it column_position = 1 is not sufficient to determine whether the index is multi-column or not. We would also need to check that there are no rows for the same index with column_position > 1.

comment:3 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added

I checked and PostgreSQL skips all multi-column indexes. So, I guess another join on user_ind_columns checking column_position = 2 does not exists would work. It would hopefully remove also the need for the ugly GROUP BY + MAX hack.

comment:4 Changed 3 years ago by akaariai

  • Triage Stage changed from Accepted to Ready for checkin

I have created pull request https://github.com/django/django/pull/23 which contains ready-for-checkin patch for this. I will commit this in a day or two unless something is found (or somebody does that before me).

comment:5 Changed 3 years ago by akaariai

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.
Back to Top