Opened 4 weeks ago
Last modified 4 weeks ago
#37031 assigned Cleanup/optimization
Improve writing migrations guide to adding unique fields on existing table
| Reported by: | Clifford Gama | Owned by: | Clifford Gama |
|---|---|---|---|
| Component: | Documentation | Version: | dev |
| Severity: | Normal | Keywords: | migrations |
| Cc: | MANAS MADESHIYA, Shai Berger, Markus Holtermann | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
- The current approach splits the work across three files: one to add the field, one to populate values, and one to restore the constraint. All three operations can be placed in a single migration, which is simpler to follow and has the added benefit of being atomic — removing the race condition that is currently warned about in the docs.
- I think the section should mention that
Field.db_defaultavoids this problem entirely by having the database generate a unique value per row. This is worth noting upfront so readers can choose the simpler path where their use case allows.
- No mention of performant alternatives for large tables: The
RunPythonexample iterates row by row with individual saves. For large tables this will be very slow. The docs should note thatQuerySet.bulk_update()or RunSQL are worth considering in that case.
Change History (5)
comment:1 by , 4 weeks ago
| Cc: | added |
|---|
comment:2 by , 4 weeks ago
| Cc: | added |
|---|
comment:3 by , 4 weeks ago
| Description: | modified (diff) |
|---|---|
| Summary: | Improve "Writing migrations" how-to -- unique fields and ManyToManyField through models → Improve writing migrations guide to adding unique fields on existing table |
| Type: | Bug → Cleanup/optimization |
| Version: | 6.0 → dev |
comment:4 by , 4 weeks ago
| Triage Stage: | Unreviewed → Accepted |
|---|
The basis of these instructions were written ~15 years ago and I agree we can make some improvements. Thank you
comment:5 by , 4 weeks ago
All three operations can be placed in a single migration, which is simpler to follow and has the added benefit of being atomic — removing the race condition that is currently warned about in the docs.
Some warnings about this approach
- This will hold a lock on the table for the whole duration of the transaction which is not practical in many cases. That's the reason why we document that data migration should not be run in a transaction.
- This won't change anything on MySQL which doesn't support transactional DDL.
I still believe the approach proposed by Shai in #23932:comment:2 is correct.
If you
- Add the field as
(null=True, unique=True, default=uuid.uuid4)and run the migration - Deploy the changes so any row created from that point sets a value
- Run a migration that performs a backfill in a non-atomic manner
Migration.atomic=False - Run a migration that drops
null=True
There are no race conditions possible.
I would also cast doubt about the db_default approach unless someone can confirm it works on all backends. From my understanding MySQL and Postgres can achieve instant columns addition with DEFAULT when the value is a constant (non-volatile) by storing the value in their table catalog and resolving the absence of value in the table tuples as the default. I can't see how this is possible when the value is not constant per row like it's the case with gen_random_uuid.
Per Postgres docs
Adding a column with a volatile
DEFAULT(e.g., clock_timestamp()), a stored generated column, an identity column, or a column with a domain data type that has constraints will cause the entire table and its indexes to be rewritten. Adding a virtual generated column never requires a rewrite.
A full table rewrite in a transaction will cause the same issues as manually issuing a single UPDATE statement and should be avoided.
The docs on "Migrations that add unique fields" was added as part of #23932, I will cc some folks involved in that ticket in case they want to add their opinion
Clifford please can you split this into 2 tickets? Repurpose this ticket only for the "Migrations that add unique fields" suggested changes and a new ticket for "Changing a ManyToManyField to use a through model" suggested changes