Opened 7 years ago

Closed 4 years ago

Last modified 4 years ago

#28273 closed Cleanup/optimization (fixed)

Document how to prevent adding columns with defaults in migrations

Reported by: Raphael Gaschignard Owned by: Caio Ariede
Component: Documentation Version: 1.10
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

So normally the Django documentation says that it doesn't use database defaults.

If you have a model where you add a NullBooleanField with a model-side default and generate a migration then you end up with the following SQL being executed (at least for Postgres)

BEGIN;
--
-- Add field some_field to table
--
ALTER TABLE "appname_model" ADD COLUMN "some_field" boolean DEFAULT true NULL;
ALTER TABLE "appname_model" ALTER COLUMN "some_field" DROP DEFAULT;
COMMIT;

This actually backfills the value for you in the DB during the ADD COLUMN command

But from the PostgreSQL documentation:

When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified).

Adding a column with a non-null default or changing the type of an existing column will require the entire table and indexes to be rewritten.

So if you're adding a column to a big enough table, the migration run itself will be very painful, because it will update _every_ row of your table to set the default.

Usually you want to set the default value model-side to help with the future backfill. So the way to handle this is to usually remove the automatically inserted default from the migration. But it can be easy to miss this (or to have custom fields that auto-set defaults).

But the generated SQL is something you basically never want to run on a big table. I would love to have a way to just prevent a migration from ever generating an ADD COLUMN with a DEFAULT clause.

I think there would be a way to handle this through simple changes in the schema editor but I couldn't figure out how the default code was generated.

Change History (11)

comment:1 by Tim Graham, 7 years ago

Can't you accomplish this by adding the column as null=True, populating the values, then changing the column to null=False (similar to the Migrations that add unique fields howto)? I believe defaults should only be added when adding non-nullable columns -- if that behavior were changed, how would the non-null constraint be avoided?

in reply to:  1 comment:2 by Raphael Gaschignard, 7 years ago

Replying to Tim Graham:

Can't you accomplish this by adding the column as null=True, populating the values, then changing the column to null=False (similar to the Migrations that add unique fields howto)? I believe defaults should only be added when adding non-nullable columns -- if that behavior were changed, how would the non-null constraint be avoided?

Because we want to deploy without downtime, we already deploy fields with null=True, and add a backfill later. The issue I'm trying to point out here is that with default=a_value set (even with null=True) we still hit an issue because Postgres backfills defaults into existing rows when a column is added (even if the column is nullable). While normally Django doesn't use DB-level defaults, for the AddField operation it does add a default in the ADD COLUMN command if a default exists in the migration operation (and then remove it immediately, as seen in the SQL I posted earlier). I am not aware of how other DB backends handle adding a column with DEFAULT + NULL set.

Our conclusion is that any AddField with a default set cannot be accomplished without downtime in a production system (similar to a non-null field), so we've written the following check for our migrations:

from django.db.migrations.operations.fields import AddField
from django.db.models.fields import NOT_PROVIDED
from django.db.models.signals import pre_migrate

def check_migration_safety(**kwargs):
    # `plan` is not part of the public API and is subject to change in later
    # versions of Django.
    plan = kwargs['plan']

    for migration_cls, rolled_back in plan:
        # Check if this is going forwards or rolling back a migration.
        if not rolled_back:
            for operation in migration_cls.operations:
                if isinstance(operation, AddField):
                    description = "{migration!r} model '{model}' field '{field}': ".format(
                        migration=migration_cls,
                        model=operation.model_name,
                        field=operation.name,
                    )

                    if not operation.field.null:
                        raise ValueError(description + "do not add new non-nullable columns")

                    if operation.field.default != NOT_PROVIDED:
                        raise ValueError(description + "do not set a default when adding columns")


pre_migrate.connect(
    check_migration_safety,
    dispatch_uid="operations.check_migration_safety",
)
Last edited 7 years ago by Raphael Gaschignard (previous) (diff)

comment:3 by Tim Graham, 7 years ago

Component: Database layer (models, ORM)Migrations
Type: UncategorizedCleanup/optimization

If a model field default is added after adding a nullable field, does it have the same issue? For example:

migration one:
adding foo = models.NullBooleanField()
ALTER TABLE "t28273_test" ADD COLUMN "foo" boolean NULL;

migration two:
changing to foo = models.NullBooleanField(default=True)
ALTER TABLE "t28273_test" ALTER COLUMN "foo" SET DEFAULT true; # does this cause the backfill problem?
ALTER TABLE "t28273_test" ALTER COLUMN "foo" DROP DEFAULT;

I'm trying to think of how to solve this by structuring migrations appropriately because I'm not seeing another elegant way forward.

comment:4 by Raphael Gaschignard, 7 years ago

Adding a default after the column was added does not seem to trigger a backfill. Like you're saying, creating two migrations doesn't cause this issue to appear (except when custom fields add a default, for example)

testing=# select * from t;
 a  | b
----+---
 hi | 3
 hi | 4
 hi | 5
(3 rows)
                                  
testing=# alter table t add column c integer null;
ALTER TABLE
testing=# select * from t;
 a  | b | c
----+---+---
 hi | 3 |
 hi | 4 |
 hi | 5 |
(3 rows)

testing=# alter table t alter COLUMN c set default 3;
ALTER TABLE
testing=# select * from t;
 a  | b | c
----+---+---
 hi | 3 |
 hi | 4 |
 hi | 5 |
(3 rows)

testing=# alter table t add column d integer null default 10;
ALTER TABLE
testing=# select * from t;
 a  | b | c | d
----+---+---+----
 hi | 3 |   | 10
 hi | 4 |   | 10
 hi | 5 |   | 10
(3 rows)

comment:5 by Tim Graham, 7 years ago

Would documenting this solution be sufficient?

comment:6 by Tim Graham, 7 years ago

Component: MigrationsDocumentation
Summary: Have a way to prevent adding columns with defaults in migrationsDocument how to prevent adding columns with defaults in migrations
Triage Stage: UnreviewedAccepted

in reply to:  5 comment:7 by Caio Ariede, 4 years ago

Owner: changed from nobody to Caio Ariede
Status: newassigned

Replying to Tim Graham:

Would documenting this solution be sufficient?

Where do you think the documentation for this could go?

comment:9 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 46a05e10:

[2.2.x] Fixed #28273 -- Doc'd fast nullable column creation with defaults.

Backport of 06909fe084f87a65459a83bd69d7cdbe4fce9a7c from master

comment:10 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 282138a:

[3.0.x] Fixed #28273 -- Doc'd fast nullable column creation with defaults.

Backport of 06909fe084f87a65459a83bd69d7cdbe4fce9a7c from master

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 06909fe0:

Fixed #28273 -- Doc'd fast nullable column creation with defaults.

Note: See TracTickets for help on using tickets.
Back to Top