Opened 10 years ago

Last modified 10 days ago

#24203 new Cleanup/optimization

Optimisation: adding multiple fields to same model should attempt to run single ALTER TABLE statement

Reported by: Peter Lauri Owned by: nobody
Component: Migrations Version: dev
Severity: Normal Keywords: migration
Cc: Markus Holtermann, ambivalentno, dharris+django@…, emorley@…, Phil Krylov, Adam Johnson, elonzh Triage Stage: Someday/Maybe
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Origin: https://groups.google.com/d/msg/django-users/DDekrNgXH2U/k1un-e8CbnMJ

When adding multiple fields to a Model the makemigrate and migrate does not attempt to create one ALTER TABLE statement, but rather one per field added. This can cause slow migrations of large tables.

I propose to optimise to attempt to merge multiple AddField to one single ALTER TABLE request when the migrate is run.

My real world case the db backend was MySQL, here is PG example of 2nd migration db output after adding two fields to one model.

  Applying play.0002_auto_20150122_1825...DEBUG ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100) NULL; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100) NULL; args=[]
DEBUG ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; args=[]
DEBUG ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL; args=[]
DEBUG ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; (params [])
DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; args=[]

Change History (19)

comment:1 by Peter Lauri, 10 years ago

Last edited 10 years ago by Peter Lauri (previous) (diff)

comment:2 by Tim Graham, 10 years ago

Could you give some performance numbers for comparison?

in reply to:  2 comment:3 by Peter Lauri, 10 years ago

Replying to timgraham:

Could you give some performance numbers for comparison?

I can provide this when I'm back from vacation in two weeks (13th of February), I don't have access to the db for the moment.

comment:4 by Markus Holtermann, 10 years ago

Cc: Markus Holtermann added
Triage Stage: UnreviewedSomeday/Maybe
Version: 1.7master

On PG it is recommended (in the docs iirc, but can't find a link) to add a column with null=True, promote the row values for existing rows and drop the not null constraint. This way adding the column is instant and thus no need to join multiple AddField operations.

On MySQL <5.6 the ADD COLUMN operations are expensive in any case independent from NULL or NOT NULL afaik.

On SQLite, well, it's SQLite. It's slow anyways and doesn't support adding multiple columns at once.

I have no clue about Oracle's behavior.

That said, I'm not sure this is really necessary but keep the issue open for your feedback.

in reply to:  2 comment:5 by Simon Percivall, 10 years ago

Replying to timgraham:

Could you give some performance numbers for comparison?

I can give performance numbers for a MySQL installation.

Doing an ALTER TABLE on a table with 120 columns and containing approx. 20M rows takes 2–3 hours. Combining several ALTER TABLES into one takes no longer time.

The simple math here is that three Django migrations would take 6–9 hours, whereas the combined version would take 2–3 hours.

in reply to:  2 comment:6 by Peter Lauri, 10 years ago

Replying to timgraham:

Could you give some performance numbers for comparison?

This is not from production, it contains ~500k rows. I will try it out on PG as well, just need to prep some data.

mysql> select count(*) from thetable;
+----------+
| count(*) |
+----------+
|   489484 |
+----------+
1 row in set (0.07 sec)

mysql> ALTER TABLE thetable ADD COLUMN field2 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 29.05 sec)
Records: 489484  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE thetable ADD COLUMN field3 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 10.82 sec)
Records: 489484  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE thetable ADD COLUMN field4 varchar(100) NULL, ADD COLUMN field5 varchar(100) NULL;
Query OK, 489484 rows affected (2 min 24.92 sec)
Records: 489484  Duplicates: 0  Warnings: 0

mysql>

in reply to:  2 comment:7 by Peter Lauri, 10 years ago

Replying to timgraham:

Could you give some performance numbers for comparison?

I prepped a DB with some fields and 500k rows. The alter table statements as in my original description executed each in few milliseconds. No significant improvement to execute them in a single statement.

ballongen=# select count(*) from testtable;
 count
--------
 500000
(1 row)

Time: 34.261 ms
ballongen=# ALTER TABLE "testtable" ADD COLUMN "field2" varchar(100) NULL;
ALTER TABLE
Time: 56.649 ms
ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field2" DROP DEFAULT;
ALTER TABLE
Time: 2.563 ms
ballongen=# ALTER TABLE "testtable" ADD COLUMN "field3" varchar(100) NULL;
ALTER TABLE
Time: 1.165 ms
ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field3" DROP DEFAULT;
ALTER TABLE
Time: 0.240 ms

comment:8 by Adam Johnson, 10 years ago

+1

I spend a large amount of time rewriting the SQL output from django migrations to combine the ALTER TABLE statements for each table into one, just so they can pass on our production database (MySQL, 30GB, hundreds of millions of rows).

Actually I normally use pt-online-schema-change for tables of any size, since it's 0 downtime, but it's still a pain to rewrite all the SQL into a single ALTER.

comment:9 by Peter Lauri, 10 years ago

Please note that in my comment #7 (https://code.djangoproject.com/ticket/24203#comment:7) I forgot to mention this was for PostgreSQL.

comment:10 by ambivalentno, 10 years ago

Cc: ambivalentno added

comment:11 by ambivalentno, 10 years ago

Regarding timings:
InnoDB table with 2 mln entries (but lots of big TextFields).
It takes ~3 hours to add a single field, and also ~3 hours to add multiple fields in case they're grouped via single sql statement. Actually, most of time it takes to perform a migration mysql copies data to temporary table.

comment:12 by Doug Harris, 8 years ago

Cc: dharris+django@… added

comment:13 by Ed Morley, 8 years ago

Cc: emorley@… added

comment:14 by Phil Krylov, 5 years ago

Cc: Phil Krylov added

comment:15 by Adam Johnson, 5 years ago

Cc: Adam Johnson added

comment:16 by elonzh, 2 years ago

Cc: elonzh added

comment:17 by elonzh, 2 years ago

Cc: elonzh removed

comment:18 by elonzh, 23 months ago

Cc: elonzh added

comment:19 by Pratyush Mittal, 10 days ago

Yeah, clubbing will save multiple copy to tmp table steps on large tables.

I have a MySQL table with ~20mln rows. I increased the decimal places for 6 columns. This took 45 minutes which would have taken 5 minutes if the alter statements were clubbed.

Note: See TracTickets for help on using tickets.
Back to Top