Opened 7 years ago

Closed 3 months ago

Last modified 3 months ago

#23610 closed Cleanup/optimization (wontfix)

Removing a null constraint can lead to race conditions with migrations

Reported by: Josh Smeaton Owned by: Jacob Walls
Component: Documentation Version: 1.7
Severity: Normal Keywords: migrations
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I think a race condition exists when removing a null constraint that doesn't contain a default. This discussion started with regards to #23609. The flow here was changing a column from having a null to having a default:

IntegerField(null=True) -> IntegerField(default=42)

And the race condition:

  • Deploy migration
  • Update App Server 1 code
  • App Server 2 tries to write to the table, and gets a write error (since it doesn't yet have the default in its models file)
  • Update App Server 2 code

For a write heavy table, this could result in many failed writes.

I think we should be clear, and document, that removing a null on a column should always be done in two steps:

  1. Add a default to the column. Deploy migration. Deploy app code.
  2. Remove the null and the default from the column. Deploy migration. Deploy app code.

Perhaps there could be a check that ensures that removing a null always requires an existing default to exist.

Change History (5)

comment:1 Changed 7 years ago by Loic Bistuer

Component: MigrationsDocumentation
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

There are many cases where old app code won't work after a migration applied (e.g. adding a new column with NOT NULL, removing a column, etc.). I reckon we should just document that people are expected to run the new app code right after applying a migration.

There are certainly ways around it, but it's important people understand that nothing special is done at the framework level.

comment:2 Changed 4 months ago by Jacob Walls

Has patch: set

comment:3 Changed 4 months ago by Jacob Walls

Owner: changed from nobody to Jacob Walls
Status: newassigned

comment:4 Changed 3 months ago by Mariusz Felisiak

Resolution: wontfix
Status: assignedclosed

A short note is not enough to explain it without creating confusion. On the other hand I don't think we want to create an extra section for this.

comment:5 Changed 3 months ago by Simon Charette

FWIW I've been working on a third-party application that resolves this problem by introducing a concept of pre- and post-deployment migrations. We've been using it in production for a few months now.

It has a built-in notion of quorum which allows the following sequence of operation to be taken across multiple instances (e.g. multiple k8s clusters)

  1. Run the migrate --pre-deploy command to alter the column to have it NULL SET DEFAULT 42
  2. Wait for the N clusters to have completed their application rollout
  3. Run the migrate command to have it NOT NULL and DROP DEFAULT

I created an issue to add support for this particular case as null constraint removal is not explicitly tested but it should only be matter of adjusting the auto-detector to have a IntegerField(null=True) -> IntegerField(default=42) change generate an extra PreAlterField operation meant to run on migrate --pre-deploy.

Last edited 3 months ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top