#30867 closed Cleanup/optimization (wontfix)
Old indexes should be dropped after new ones are created.
Reported by: | Simon Charette | Owned by: | nobody |
---|---|---|---|
Component: | Migrations | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When adding unique=True
to a field previously index=True
field or vice-versa the old index is dropped before the new one is added.
This is not a big issue on backends that support transactional DDL but it is on ones that don't because in between the execution of the two statements the column is not indexed.
For example given the following model
class Author(models.Model): name = models.CharField(max_length=50, db_index=True)
Altering name
to models.CharField(max_length=50, unique=True)
will generate the following SQL on MySQL
DROP INDEX `author_name_idx` ON `author`; -- Until the following statement completes the name field is not indexed at all. -- This can take a while on large tables. ALTER TABLE `author` ADD CONSTRAINT `author_name_uniq` UNIQUE (`name`);
Change History (8)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
Why do you see that as a problem? Shouldn't the time between the two commands be rather short?
Unique constraint creation can be relatively slow on large tables particularly when performed without a lock (e.g. MySQL LOCK=NONE
) to allow concurrent read and writes. During that time period not having an index on a frequently queried column can cause such queries to take a a few orders more time to process and thus slowdown the constraint creation time even more. At least that's what we've experienced on a table with 6M rows when adding a unique constraint on a VARCHAR(255) NOT NULL
.
I think that reduced performance is something we could expect during the execution of a migration.
Agreed but if we can reduce the performance degradation at the cost of temporarily elevated storage needs I think we should do it.
Is there any reason except for double index storage needs not to do it?
comment:3 by , 5 years ago
Is there any reason except for double index storage needs not to do it?
No, if we can reasonably estimate that the risk of introducing regressions is low and that it doesn't add too much code complexity.
comment:4 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:5 by , 5 years ago
I would also add that at some point, you should assume the fact you are using a database not supporting transactional DDL.
comment:6 by , 5 years ago
I would also add that at some point, you should assume the fact you are using a database not supporting transactional DDL.
FWIW I'm trying to figure out if it does make a difference on backends that support transactional DDL as well.
For example, it's not completely crazy to assume that PostgreSQL could use an existing index on a column it's trying to create a unique constraint on to speed up the operation given B-trees are used for both implementations. If that's the case deferring the index removal could be beneficial to other backends as well.
comment:7 by , 5 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
After a bit of investigation it looks like this would effectively add a lot of complexity to the already complex _alter_field
for little benefit because the optimization is only valuable when only performing a (db_index=True)
to (unique=True)
migration. If the field type is also meant to be altered (e.g type) we certainly want to drop the index before hand to avoid an internal rebuild with the new type which makes it hard to determine when the optimization should be performed.
For the record I managed to get a noticeable speed up (10-15%) on PostgreSQL when adding a unique constraint to an already indexed column (10M rows integer).
comment:8 by , 5 years ago
Thanks for sharing your experimentations!
"Le mieux est parfois l'ennemi du bien" :-)
Why do you see that as a problem? Shouldn't the time between the two commands be rather short? I think that reduced performance is something we could expect during the execution of a migration.