Opened 4 weeks ago

Last modified 4 weeks ago

#37031 assigned Cleanup/optimization

Improve writing migrations guide to adding unique fields on existing table

Reported by: Clifford Gama Owned by: Clifford Gama
Component: Documentation Version: dev
Severity: Normal Keywords: migrations
Cc: MANAS MADESHIYA, Shai Berger, Markus Holtermann Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Clifford Gama)

  1. The current approach splits the work across three files: one to add the field, one to populate values, and one to restore the constraint. All three operations can be placed in a single migration, which is simpler to follow and has the added benefit of being atomic — removing the race condition that is currently warned about in the docs.
  1. I think the section should mention that Field.db_default avoids this problem entirely by having the database generate a unique value per row. This is worth noting upfront so readers can choose the simpler path where their use case allows.
  1. No mention of performant alternatives for large tables: The RunPython example iterates row by row with individual saves. For large tables this will be very slow. The docs should note that QuerySet.bulk_update() or RunSQL are worth considering in that case.

Change History (5)

comment:1 by MANAS MADESHIYA, 4 weeks ago

Cc: MANAS MADESHIYA added

comment:2 by Sarah Boyce, 4 weeks ago

Cc: Shai Berger Markus Holtermann added

The docs on "Migrations that add unique fields" was added as part of #23932, I will cc some folks involved in that ticket in case they want to add their opinion

Clifford please can you split this into 2 tickets? Repurpose this ticket only for the "Migrations that add unique fields" suggested changes and a new ticket for "Changing a ManyToManyField to use a through model" suggested changes

comment:3 by Clifford Gama, 4 weeks ago

Description: modified (diff)
Summary: Improve "Writing migrations" how-to -- unique fields and ManyToManyField through modelsImprove writing migrations guide to adding unique fields on existing table
Type: BugCleanup/optimization
Version: 6.0dev

comment:4 by Sarah Boyce, 4 weeks ago

Triage Stage: UnreviewedAccepted

The basis of these instructions were written ~15 years ago and I agree we can make some improvements. Thank you

comment:5 by Simon Charette, 4 weeks ago

All three operations can be placed in a single migration, which is simpler to follow and has the added benefit of being atomic — removing the race condition that is currently warned about in the docs.

Some warnings about this approach

  1. This will hold a lock on the table for the whole duration of the transaction which is not practical in many cases. That's the reason why we document that data migration should not be run in a transaction.
  2. This won't change anything on MySQL which doesn't support transactional DDL.

I still believe the approach proposed by Shai in #23932:comment:2 is correct.

If you

  1. Add the field as (null=True, unique=True, default=uuid.uuid4) and run the migration
  2. Deploy the changes so any row created from that point sets a value
  3. Run a migration that performs a backfill in a non-atomic manner Migration.atomic=False
  4. Run a migration that drops null=True

There are no race conditions possible.

I would also cast doubt about the db_default approach unless someone can confirm it works on all backends. From my understanding MySQL and Postgres can achieve instant columns addition with DEFAULT when the value is a constant (non-volatile) by storing the value in their table catalog and resolving the absence of value in the table tuples as the default. I can't see how this is possible when the value is not constant per row like it's the case with gen_random_uuid.

Per Postgres docs

Adding a column with a volatile DEFAULT (e.g., clock_timestamp()), a stored generated column, an identity column, or a column with a domain data type that has constraints will cause the entire table and its indexes to be rewritten. Adding a virtual generated column never requires a rewrite.

A full table rewrite in a transaction will cause the same issues as manually issuing a single UPDATE statement and should be avoided.

Last edited 4 weeks ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top