Opened 3 years ago

Last modified 10 months ago

#24203 new Cleanup/optimization

Optimisation: adding multiple fields to same model should attempt to run single ALTER TABLE statement

Reported by: Peter Lauri Owned by: nobody
Component: Migrations Version: master
Severity: Normal Keywords: migration
Cc: Markus Holtermann, ambivalentno, dharris+django@…, emorley@… Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Origin: https://groups.google.com/d/msg/django-users/DDekrNgXH2U/k1un-e8CbnMJ

When adding multiple fields to a Model the makemigrate and migrate does not attempt to create one ALTER TABLE statement, but rather one per field added. This can cause slow migrations of large tables.

I propose to optimise to attempt to merge multiple AddField to one single ALTER TABLE request when the migrate is run.

My real world case the db backend was MySQL, here is PG example of 2nd migration db output after adding two fields to one model.

  Applying play.0002_auto_20150122_1825...DEBUG ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100) NULL; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100) NULL; args=[]
DEBUG ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; args=[]
DEBUG ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL; args=[]
DEBUG ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; args=[]

Change History (13)

comment:1 Changed 3 years ago by Peter Lauri

Last edited 3 years ago by Peter Lauri (previous) (diff)

comment:2 Changed 3 years ago by Tim Graham

Could you give some performance numbers for comparison?

comment:3 in reply to:  2 Changed 3 years ago by Peter Lauri

Replying to timgraham:

Could you give some performance numbers for comparison?

I can provide this when I'm back from vacation in two weeks (13th of February), I don't have access to the db for the moment.

comment:4 Changed 3 years ago by Markus Holtermann

Cc: Markus Holtermann added
Triage Stage: UnreviewedSomeday/Maybe
Version: 1.7master

On PG it is recommended (in the docs iirc, but can't find a link) to add a column with null=True, promote the row values for existing rows and drop the not null constraint. This way adding the column is instant and thus no need to join multiple AddField operations.

On MySQL <5.6 the ADD COLUMN operations are expensive in any case independent from NULL or NOT NULL afaik.

On SQLite, well, it's SQLite. It's slow anyways and doesn't support adding multiple columns at once.

I have no clue about Oracle's behavior.

That said, I'm not sure this is really necessary but keep the issue open for your feedback.

comment:5 in reply to:  2 Changed 3 years ago by Simon Percivall

Replying to timgraham:

Could you give some performance numbers for comparison?

I can give performance numbers for a MySQL installation.

Doing an ALTER TABLE on a table with 120 columns and containing approx. 20M rows takes 2–3 hours. Combining several ALTER TABLES into one takes no longer time.

The simple math here is that three Django migrations would take 6–9 hours, whereas the combined version would take 2–3 hours.

comment:6 in reply to:  2 Changed 3 years ago by Peter Lauri

Replying to timgraham:

Could you give some performance numbers for comparison?

This is not from production, it contains ~500k rows. I will try it out on PG as well, just need to prep some data.

mysql> select count(*) from thetable;
+----------+
| count(*) |
+----------+
|   489484 |
+----------+
1 row in set (0.07 sec)

mysql> ALTER TABLE thetable ADD COLUMN field2 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 29.05 sec)
Records: 489484  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE thetable ADD COLUMN field3 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 10.82 sec)
Records: 489484  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE thetable ADD COLUMN field4 varchar(100) NULL, ADD COLUMN field5 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 24.92 sec)
Records: 489484  Duplicates: 0  Warnings: 0

mysql>

comment:7 in reply to:  2 Changed 3 years ago by Peter Lauri

Replying to timgraham:

Could you give some performance numbers for comparison?

I prepped a DB with some fields and 500k rows. The alter table statements as in my original description executed each in few milliseconds. No significant improvement to execute them in a single statement.

ballongen=# select count(*) from testtable;
 count
--------
 500000
(1 row)

Time: 34.261 ms
ballongen=# ALTER TABLE "testtable" ADD COLUMN "field2" varchar(100) NULL;
ALTER TABLE
Time: 56.649 ms
ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field2" DROP DEFAULT;
ALTER TABLE
Time: 2.563 ms
ballongen=# ALTER TABLE "testtable" ADD COLUMN "field3" varchar(100) NULL;
ALTER TABLE
Time: 1.165 ms
ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field3" DROP DEFAULT;
ALTER TABLE
Time: 0.240 ms

comment:8 Changed 3 years ago by Adam (Chainz) Johnson

+1

I spend a large amount of time rewriting the SQL output from django migrations to combine the ALTER TABLE statements for each table into one, just so they can pass on our production database (MySQL, 30GB, hundreds of millions of rows).

Actually I normally use pt-online-schema-change for tables of any size, since it's 0 downtime, but it's still a pain to rewrite all the SQL into a single ALTER.

comment:9 Changed 3 years ago by Peter Lauri

Please note that in my comment #7 (https://code.djangoproject.com/ticket/24203#comment:7) I forgot to mention this was for PostgreSQL.

comment:10 Changed 2 years ago by ambivalentno

Cc: ambivalentno added

comment:11 Changed 2 years ago by ambivalentno

Regarding timings:
InnoDB table with 2 mln entries (but lots of big TextFields).
It takes ~3 hours to add a single field, and also ~3 hours to add multiple fields in case they're grouped via single sql statement. Actually, most of time it takes to perform a migration mysql copies data to temporary table.

comment:12 Changed 13 months ago by Doug Harris

Cc: dharris+django@… added

comment:13 Changed 10 months ago by Ed Morley

Cc: emorley@… added
Note: See TracTickets for help on using tickets.
Back to Top