#23610 closed Cleanup/optimization (wontfix)
Removing a null constraint can lead to race conditions with migrations
| Reported by: | Josh Smeaton | Owned by: | Jacob Walls |
|---|---|---|---|
| Component: | Documentation | Version: | 1.7 |
| Severity: | Normal | Keywords: | migrations |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
I think a race condition exists when removing a null constraint that doesn't contain a default. This discussion started with regards to #23609. The flow here was changing a column from having a null to having a default:
IntegerField(null=True) -> IntegerField(default=42)
And the race condition:
- Deploy migration
- Update App Server 1 code
- App Server 2 tries to write to the table, and gets a write error (since it doesn't yet have the default in its models file)
- Update App Server 2 code
For a write heavy table, this could result in many failed writes.
I think we should be clear, and document, that removing a null on a column should always be done in two steps:
- Add a default to the column. Deploy migration. Deploy app code.
- Remove the null and the default from the column. Deploy migration. Deploy app code.
Perhaps there could be a check that ensures that removing a null always requires an existing default to exist.
Change History (5)
comment:1 by , 11 years ago
| Component: | Migrations → Documentation |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
| Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 4 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:4 by , 4 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | assigned → closed |
A short note is not enough to explain it without creating confusion. On the other hand I don't think we want to create an extra section for this.
comment:5 by , 4 years ago
FWIW I've been working on a third-party application that resolves this problem by introducing a concept of pre- and post-deployment migrations. We've been using it in production for a few months now.
It has a built-in notion of quorum which allows the following sequence of operation to be taken across multiple instances (e.g. multiple k8s clusters)
- Run the
migrate --pre-deploycommand to alter the column to have itNULL SET DEFAULT 42 - Wait for the N clusters to have completed their application rollout
- Run the
migratecommand to have itNOT NULLandDROP DEFAULT
I created an issue to add support for this particular case as null constraint removal is not explicitly tested but it should only be matter of adjusting the auto-detector to have a IntegerField(null=True) -> IntegerField(default=42) change generate an extra PreAlterField operation meant to run on migrate --pre-deploy.
There are many cases where old app code won't work after a migration applied (e.g. adding a new column with
NOT NULL, removing a column, etc.). I reckon we should just document that people are expected to run the new app code right after applying a migration.There are certainly ways around it, but it's important people understand that nothing special is done at the framework level.