Opened 6 months ago

Last modified 6 months ago

#28273 new Cleanup/optimization

Document how to prevent adding columns with defaults in migrations

Reported by: Raphael Gaschignard Owned by: nobody
Component: Documentation Version: 1.10
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

So normally the Django documentation says that it doesn't use database defaults.

If you have a model where you add a NullBooleanField with a model-side default and generate a migration then you end up with the following SQL being executed (at least for Postgres)

BEGIN;
--
-- Add field some_field to table
--
ALTER TABLE "appname_model" ADD COLUMN "some_field" boolean DEFAULT true NULL;
ALTER TABLE "appname_model" ALTER COLUMN "some_field" DROP DEFAULT;
COMMIT;

This actually backfills the value for you in the DB during the ADD COLUMN command

But from the PostgreSQL documentation:

When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified).

Adding a column with a non-null default or changing the type of an existing column will require the entire table and indexes to be rewritten.

So if you're adding a column to a big enough table, the migration run itself will be very painful, because it will update _every_ row of your table to set the default.

Usually you want to set the default value model-side to help with the future backfill. So the way to handle this is to usually remove the automatically inserted default from the migration. But it can be easy to miss this (or to have custom fields that auto-set defaults).

But the generated SQL is something you basically never want to run on a big table. I would love to have a way to just prevent a migration from ever generating an ADD COLUMN with a DEFAULT clause.

I think there would be a way to handle this through simple changes in the schema editor but I couldn't figure out how the default code was generated.

Change History (6)

comment:1 Changed 6 months ago by Tim Graham

Can't you accomplish this by adding the column as null=True, populating the values, then changing the column to null=False (similar to the Migrations that add unique fields howto)? I believe defaults should only be added when adding non-nullable columns -- if that behavior were changed, how would the non-null constraint be avoided?

comment:2 in reply to:  1 Changed 6 months ago by Raphael Gaschignard

Replying to Tim Graham:

Can't you accomplish this by adding the column as null=True, populating the values, then changing the column to null=False (similar to the Migrations that add unique fields howto)? I believe defaults should only be added when adding non-nullable columns -- if that behavior were changed, how would the non-null constraint be avoided?

Because we want to deploy without downtime, we already deploy fields with null=True, and add a backfill later. The issue I'm trying to point out here is that with default=a_value set (even with null=True) we still hit an issue because Postgres backfills defaults into existing rows when a column is added (even if the column is nullable). While normally Django doesn't use DB-level defaults, for the AddField operation it does add a default in the ADD COLUMN command if a default exists in the migration operation (and then remove it immediately, as seen in the SQL I posted earlier). I am not aware of how other DB backends handle adding a column with DEFAULT + NULL set.

Our conclusion is that any AddField with a default set cannot be accomplished without downtime in a production system (similar to a non-null field), so we've written the following check for our migrations:

from django.db.migrations.operations.fields import AddField
from django.db.models.fields import NOT_PROVIDED
from django.db.models.signals import pre_migrate

def check_migration_safety(**kwargs):
    # `plan` is not part of the public API and is subject to change in later
    # versions of Django.
    plan = kwargs['plan']

    for migration_cls, rolled_back in plan:
        # Check if this is going forwards or rolling back a migration.
        if not rolled_back:
            for operation in migration_cls.operations:
                if isinstance(operation, AddField):
                    description = "{migration!r} model '{model}' field '{field}': ".format(
                        migration=migration_cls,
                        model=operation.model_name,
                        field=operation.name,
                    )

                    if not operation.field.null:
                        raise ValueError(description + "do not add new non-nullable columns")

                    if operation.field.default != NOT_PROVIDED:
                        raise ValueError(description + "do not set a default when adding columns")


pre_migrate.connect(
    check_migration_safety,
    dispatch_uid="operations.check_migration_safety",
)
Last edited 6 months ago by Raphael Gaschignard (previous) (diff)

comment:3 Changed 6 months ago by Tim Graham

Component: Database layer (models, ORM)Migrations
Type: UncategorizedCleanup/optimization

If a model field default is added after adding a nullable field, does it have the same issue? For example:

migration one:
adding foo = models.NullBooleanField()
ALTER TABLE "t28273_test" ADD COLUMN "foo" boolean NULL;

migration two:
changing to foo = models.NullBooleanField(default=True)
ALTER TABLE "t28273_test" ALTER COLUMN "foo" SET DEFAULT true; # does this cause the backfill problem?
ALTER TABLE "t28273_test" ALTER COLUMN "foo" DROP DEFAULT;

I'm trying to think of how to solve this by structuring migrations appropriately because I'm not seeing another elegant way forward.

comment:4 Changed 6 months ago by Raphael Gaschignard

Adding a default after the column was added does not seem to trigger a backfill. Like you're saying, creating two migrations doesn't cause this issue to appear (except when custom fields add a default, for example)

testing=# select * from t;
 a  | b
----+---
 hi | 3
 hi | 4
 hi | 5
(3 rows)
                                  
testing=# alter table t add column c integer null;
ALTER TABLE
testing=# select * from t;
 a  | b | c
----+---+---
 hi | 3 |
 hi | 4 |
 hi | 5 |
(3 rows)

testing=# alter table t alter COLUMN c set default 3;
ALTER TABLE
testing=# select * from t;
 a  | b | c
----+---+---
 hi | 3 |
 hi | 4 |
 hi | 5 |
(3 rows)

testing=# alter table t add column d integer null default 10;
ALTER TABLE
testing=# select * from t;
 a  | b | c | d
----+---+---+----
 hi | 3 |   | 10
 hi | 4 |   | 10
 hi | 5 |   | 10
(3 rows)

comment:5 Changed 6 months ago by Tim Graham

Would documenting this solution be sufficient?

comment:6 Changed 6 months ago by Tim Graham

Component: MigrationsDocumentation
Summary: Have a way to prevent adding columns with defaults in migrationsDocument how to prevent adding columns with defaults in migrations
Triage Stage: UnreviewedAccepted
Note: See TracTickets for help on using tickets.
Back to Top