Opened 10 years ago
Last modified 10 days ago
#24203 new Cleanup/optimization
Optimisation: adding multiple fields to same model should attempt to run single ALTER TABLE statement
Reported by: | Peter Lauri | Owned by: | nobody |
---|---|---|---|
Component: | Migrations | Version: | dev |
Severity: | Normal | Keywords: | migration |
Cc: | Markus Holtermann, ambivalentno, dharris+django@…, emorley@…, Phil Krylov, Adam Johnson, elonzh | Triage Stage: | Someday/Maybe |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Origin: https://groups.google.com/d/msg/django-users/DDekrNgXH2U/k1un-e8CbnMJ
When adding multiple fields to a Model the makemigrate and migrate does not attempt to create one ALTER TABLE statement, but rather one per field added. This can cause slow migrations of large tables.
I propose to optimise to attempt to merge multiple AddField to one single ALTER TABLE request when the migrate is run.
My real world case the db backend was MySQL, here is PG example of 2nd migration db output after adding two fields to one model.
Applying play.0002_auto_20150122_1825...DEBUG ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100) NULL; (params []) DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field2" varchar(100) NULL; args=[] DEBUG ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; (params []) DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field2" DROP DEFAULT; args=[] DEBUG ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL; (params []) DEBUG (0.000) ALTER TABLE "play_play" ADD COLUMN "field3" varchar(100) NULL; args=[] DEBUG ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; (params []) DEBUG (0.000) ALTER TABLE "play_play" ALTER COLUMN "field3" DROP DEFAULT; args=[]
Change History (19)
follow-ups: 3 5 6 7 comment:2 by , 10 years ago
Could you give some performance numbers for comparison?
comment:3 by , 10 years ago
Replying to timgraham:
Could you give some performance numbers for comparison?
I can provide this when I'm back from vacation in two weeks (13th of February), I don't have access to the db for the moment.
comment:4 by , 10 years ago
Cc: | added |
---|---|
Triage Stage: | Unreviewed → Someday/Maybe |
Version: | 1.7 → master |
On PG it is recommended (in the docs iirc, but can't find a link) to add a column with null=True
, promote the row values for existing rows and drop the not null
constraint. This way adding the column is instant and thus no need to join multiple AddField
operations.
On MySQL <5.6 the ADD COLUMN
operations are expensive in any case independent from NULL
or NOT NULL
afaik.
On SQLite, well, it's SQLite. It's slow anyways and doesn't support adding multiple columns at once.
I have no clue about Oracle's behavior.
That said, I'm not sure this is really necessary but keep the issue open for your feedback.
comment:5 by , 10 years ago
Replying to timgraham:
Could you give some performance numbers for comparison?
I can give performance numbers for a MySQL installation.
Doing an ALTER TABLE on a table with 120 columns and containing approx. 20M rows takes 2–3 hours. Combining several ALTER TABLES into one takes no longer time.
The simple math here is that three Django migrations would take 6–9 hours, whereas the combined version would take 2–3 hours.
comment:6 by , 10 years ago
Replying to timgraham:
Could you give some performance numbers for comparison?
This is not from production, it contains ~500k rows. I will try it out on PG as well, just need to prep some data.
mysql> select count(*) from thetable; +----------+ | count(*) | +----------+ | 489484 | +----------+ 1 row in set (0.07 sec) mysql> ALTER TABLE thetable ADD COLUMN field2 varchar(100) NULL; Query OK, 489484 rows affected (2 min 29.05 sec) Records: 489484 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE thetable ADD COLUMN field3 varchar(100) NULL; Query OK, 489484 rows affected (2 min 10.82 sec) Records: 489484 Duplicates: 0 Warnings: 0 mysql> ALTER TABLE thetable ADD COLUMN field4 varchar(100) NULL, ADD COLUMN field5 varchar(100) NULL; Query OK, 489484 rows affected (2 min 24.92 sec) Records: 489484 Duplicates: 0 Warnings: 0 mysql>
comment:7 by , 10 years ago
Replying to timgraham:
Could you give some performance numbers for comparison?
I prepped a DB with some fields and 500k rows. The alter table statements as in my original description executed each in few milliseconds. No significant improvement to execute them in a single statement.
ballongen=# select count(*) from testtable; count -------- 500000 (1 row) Time: 34.261 ms ballongen=# ALTER TABLE "testtable" ADD COLUMN "field2" varchar(100) NULL; ALTER TABLE Time: 56.649 ms ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field2" DROP DEFAULT; ALTER TABLE Time: 2.563 ms ballongen=# ALTER TABLE "testtable" ADD COLUMN "field3" varchar(100) NULL; ALTER TABLE Time: 1.165 ms ballongen=# ALTER TABLE "testtable" ALTER COLUMN "field3" DROP DEFAULT; ALTER TABLE Time: 0.240 ms
comment:8 by , 10 years ago
+1
I spend a large amount of time rewriting the SQL output from django migrations to combine the ALTER TABLE statements for each table into one, just so they can pass on our production database (MySQL, 30GB, hundreds of millions of rows).
Actually I normally use pt-online-schema-change for tables of any size, since it's 0 downtime, but it's still a pain to rewrite all the SQL into a single ALTER.
comment:9 by , 10 years ago
Please note that in my comment #7 (https://code.djangoproject.com/ticket/24203#comment:7) I forgot to mention this was for PostgreSQL.
comment:10 by , 10 years ago
Cc: | added |
---|
comment:11 by , 10 years ago
Regarding timings:
InnoDB table with 2 mln entries (but lots of big TextFields).
It takes ~3 hours to add a single field, and also ~3 hours to add multiple fields in case they're grouped via single sql statement. Actually, most of time it takes to perform a migration mysql copies data to temporary table.
comment:12 by , 8 years ago
Cc: | added |
---|
comment:13 by , 8 years ago
Cc: | added |
---|
comment:14 by , 5 years ago
Cc: | added |
---|
comment:15 by , 5 years ago
Cc: | added |
---|
comment:16 by , 2 years ago
Cc: | added |
---|
comment:17 by , 2 years ago
Cc: | removed |
---|
comment:18 by , 23 months ago
Cc: | added |
---|
comment:19 by , 10 days ago
Yeah, clubbing will save multiple copy to tmp table
steps on large tables.
I have a MySQL table with ~20mln rows. I increased the decimal places for 6 columns. This took 45 minutes which would have taken 5 minutes if the alter statements were clubbed.
Some ideas here: https://github.com/peterlauri/django/tree/ticket_24203