Opened 13 years ago
Closed 13 years ago
#18082 closed Bug (fixed)
get_indexes produces wrong results under Oracle
Reported by: | Anssi Kääriäinen | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4 |
Severity: | Normal | Keywords: | oracle |
Cc: | anssi.kaariainen@… | Triage Stage: | Ready for checkin |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Currently, get_indexes in oracle/introspection.py uses this SQL:
    SELECT LOWER(all_tab_cols.column_name) AS column_name,
           CASE user_constraints.constraint_type WHEN 'P' THEN 1 ELSE 0 END AS is_primary_key,
           CASE user_indexes.uniqueness WHEN 'UNIQUE' THEN 1 ELSE 0 END AS is_unique
    FROM all_tab_cols, user_cons_columns, user_constraints, user_ind_columns, user_indexes
    WHERE all_tab_cols.column_name = user_cons_columns.column_name (+)
      AND all_tab_cols.table_name = user_cons_columns.table_name (+)
      AND user_cons_columns.constraint_name = user_constraints.constraint_name (+)
      AND user_constraints.constraint_type (+) = 'P'
      AND user_ind_columns.column_name (+) = all_tab_cols.column_name
      AND user_ind_columns.table_name (+) = all_tab_cols.table_name
      AND user_indexes.uniqueness (+) = 'UNIQUE'
      AND user_indexes.index_name (+) = user_ind_columns.index_name
      AND all_tab_cols.table_name = UPPER(%s)
This SQL has two failings. First, it LEFT JOINs user_ind_columns instead of INNER JOINing it, so every column shows up as indexed. Second, the results are ordering-dependent; for example, running the query against AUTH_USER gives:
column_name | is_primary_key | is_unique
---|---|---
id | 1 | 1
id | 0 | 1
username | 0 | 1
is_staff | 0 | 0
is_staff | 0 | 0
last_name | 0 | 0
is_superuser | 0 | 0
is_superuser | 0 | 0
last_login | 0 | 0
date_joined | 0 | 0
first_name | 0 | 0
email | 0 | 0
is_active | 0 | 0
is_active | 0 | 0
password | 0 | 0
So, whether id is reported as a primary key is purely luck-based.
In addition, the SQL is really slow: it takes around 0.9 seconds for a single table on my laptop. As the test suite has nearly 1000 tables, the inspectdb tests would spend half an hour just executing this SQL.
I think one could get the same results by using the following three queries:
    select * from user_ind_columns where table_name = 'AUTH_GROUP';
    select * from user_constraints where constraint_type = 'P' and table_name = 'AUTH_GROUP';
    select * from user_indexes where uniqueness = 'UNIQUE' and table_name = 'AUTH_GROUP';
and then doing the joins in Python. It seems this approach would also be at least an order of magnitude faster.
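A minimal sketch of that approach, assuming a Django-style cursor that accepts %s placeholders; the function name, the exact queries, and the result shape are illustrative and not the patch that was eventually committed:

    # Hypothetical sketch: run three cheap data-dictionary queries and
    # combine them in Python instead of outer-joining them in SQL.
    def get_indexes_sketch(cursor, table_name):
        table_name = table_name.upper()

        # Columns belonging to the primary-key constraint.
        cursor.execute("""
            SELECT LOWER(cols.column_name)
            FROM user_constraints cons, user_cons_columns cols
            WHERE cons.constraint_type = 'P'
              AND cons.constraint_name = cols.constraint_name
              AND cols.table_name = %s""", [table_name])
        pk_columns = set(row[0] for row in cursor.fetchall())

        # Names of the table's unique indexes.
        cursor.execute("""
            SELECT index_name FROM user_indexes
            WHERE uniqueness = 'UNIQUE' AND table_name = %s""", [table_name])
        unique_indexes = set(row[0] for row in cursor.fetchall())

        # Every indexed column, together with the index it belongs to.
        cursor.execute("""
            SELECT LOWER(column_name), index_name FROM user_ind_columns
            WHERE table_name = %s""", [table_name])

        indexes = {}
        for column, index_name in cursor.fetchall():
            info = indexes.setdefault(column, {'primary_key': False, 'unique': False})
            info['primary_key'] = info['primary_key'] or column in pk_columns
            info['unique'] = info['unique'] or index_name in unique_indexes
        return indexes

Each query filters a single data-dictionary view by table name, which is presumably where the order-of-magnitude speedup would come from.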
Attachments (1)
Change History (6)
by , 13 years ago
Attachment: | 18082.diff added |
---|
comment:2 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
The only comment I have is that checking column_position = 1 is not sufficient to determine whether the index is multi-column or not. We would also need to check that there are no rows for the same index with column_position > 1.
comment:3 by , 13 years ago
Cc: | added |
---|
I checked, and PostgreSQL skips all multi-column indexes. So I guess another join on user_ind_columns, checking that no row with column_position = 2 exists for the same index, would work. It would hopefully also remove the need for the ugly GROUP BY + MAX hack.
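For illustration only, the same "no second column" requirement expressed in Python over rows fetched from user_ind_columns; this is a hypothetical helper, not the committed change, which does the filtering in SQL:

    # Hypothetical helper: keep only single-column indexes, i.e. indexes
    # for which no row with column_position > 1 exists in user_ind_columns.
    from collections import defaultdict

    def single_column_indexes(rows):
        # rows: iterable of (index_name, column_name, column_position) tuples.
        columns_per_index = defaultdict(list)
        for index_name, column, position in rows:
            columns_per_index[index_name].append((position, column))
        return dict(
            (index_name, cols[0][1])
            for index_name, cols in columns_per_index.items()
            if max(pos for pos, _ in cols) == 1
        )

This mirrors what a NOT EXISTS check on column_position > 1 would do on the SQL side: any index that covers more than one column simply drops out of the result.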
comment:4 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I have created pull request https://github.com/django/django/pull/23, which contains a ready-for-checkin patch for this. I will commit it in a day or two unless a problem is found (or somebody commits it before me).
comment:5 by , 13 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Part of the problem comes from having multiple indexes on the same column (for example, a unique index on (firstname, lastname) plus a normal index on firstname). In that case it was ordering-dependent whether the column was seen as unique or not.
The attached patch simplifies the SQL, removes non-indexed columns from the output, and handles multiple indexes on the same column by using GROUP BY + MAX.
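For readers skimming the ticket without the diff, the effect of the GROUP BY + MAX aggregation can be sketched in Python; this is a hedged rendering of the idea, not the SQL in the attached patch:

    # Illustrative only: when a column appears in several indexes, report it
    # as primary key / unique if any of those indexes sets the flag, which is
    # what grouping by column and taking MAX over the 0/1 flags achieves.
    def collapse_index_rows(rows):
        # rows: iterable of (column_name, is_primary_key, is_unique) 0/1 tuples,
        # one row per (column, index) pair.
        collapsed = {}
        for column, is_pk, is_unique in rows:
            prev_pk, prev_unique = collapsed.get(column, (0, 0))
            collapsed[column] = (max(prev_pk, is_pk), max(prev_unique, is_unique))
        return collapsed

Applied to the AUTH_USER output above, the two id rows collapse into a single (1, 1) entry instead of the answer depending on which row happens to come first.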
I was wrong about the get_indexes calls taking a lot of time in the test suite. The first call takes around a second, but after that the query plan is cached and subsequent calls to get_indexes will be fast enough to not be a bottleneck.
The introspection + inspectdb tests pass on Oracle 11g.