Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#30867 closed Cleanup/optimization (wontfix)

Old indexes should be dropped after new ones are created.

Reported by: Simon Charette Owned by: nobody
Component: Migrations Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When adding unique=True to a field previously index=True field or vice-versa the old index is dropped before the new one is added.

This is not a big issue on backends that support transactional DDL but it is on ones that don't because in between the execution of the two statements the column is not indexed.

For example given the following model

class Author(models.Model):
    name = models.CharField(max_length=50, db_index=True)

Altering name to models.CharField(max_length=50, unique=True) will generate the following SQL on MySQL

DROP INDEX `author_name_idx` ON `author`;
-- Until the following statement completes the name field is not indexed at all.
-- This can take a while on large tables.
ALTER TABLE `author` ADD CONSTRAINT `author_name_uniq` UNIQUE (`name`);

Change History (8)

comment:1 by Claude Paroz, 5 years ago

Why do you see that as a problem? Shouldn't the time between the two commands be rather short? I think that reduced performance is something we could expect during the execution of a migration.

comment:2 by Simon Charette, 5 years ago

Why do you see that as a problem? Shouldn't the time between the two commands be rather short?

Unique constraint creation can be relatively slow on large tables particularly when performed without a lock (e.g. MySQL LOCK=NONE) to allow concurrent read and writes. During that time period not having an index on a frequently queried column can cause such queries to take a a few orders more time to process and thus slowdown the constraint creation time even more. At least that's what we've experienced on a table with 6M rows when adding a unique constraint on a VARCHAR(255) NOT NULL.

I think that reduced performance is something we could expect during the execution of a migration.

Agreed but if we can reduce the performance degradation at the cost of temporarily elevated storage needs I think we should do it.

Is there any reason except for double index storage needs not to do it?

comment:3 by Claude Paroz, 5 years ago

Is there any reason except for double index storage needs not to do it?

No, if we can reasonably estimate that the risk of introducing regressions is low and that it doesn't add too much code complexity.

comment:4 by Claude Paroz, 5 years ago

Triage Stage: UnreviewedAccepted

comment:5 by Claude Paroz, 5 years ago

I would also add that at some point, you should assume the fact you are using a database not supporting transactional DDL.

comment:6 by Simon Charette, 5 years ago

I would also add that at some point, you should assume the fact you are using a database not supporting transactional DDL.

FWIW I'm trying to figure out if it does make a difference on backends that support transactional DDL as well.

For example, it's not completely crazy to assume that PostgreSQL could use an existing index on a column it's trying to create a unique constraint on to speed up the operation given B-trees are used for both implementations. If that's the case deferring the index removal could be beneficial to other backends as well.

comment:7 by Simon Charette, 5 years ago

Resolution: wontfix
Status: newclosed

After a bit of investigation it looks like this would effectively add a lot of complexity to the already complex _alter_field for little benefit because the optimization is only valuable when only performing a field alteration that is only (db_index=True) to (unique=True) migration. If the field type is also meant to be altered (e.g type) we certainly want to drop the index before hand to avoid an internal rebuild with the new type which makes it hard to determine when the optimization should be performed.

For the record I managed to get a noticeable speed up (10-15%) on PostgreSQL when adding a unique constraint to an already indexed column (10M rows integer).

Last edited 5 years ago by Simon Charette (previous) (diff)

comment:8 by Claude Paroz, 5 years ago

Thanks for sharing your experimentations!
"Le mieux est parfois l'ennemi du bien" :-)

Note: See TracTickets for help on using tickets.
Back to Top