Opened 9 months ago

Last modified 4 weeks ago

#36168 assigned Bug

Backwards migration to replaced migration when other app has squashed migrations can lead to FieldDoesNotExist error due to incorrect state

Reported by: Klaas van Schelven Owned by: houston0222
Component: Migrations Version: 5.1
Severity: Normal Keywords:
Cc: Jacob Walls Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I managed to debug my way to the broken part of the code, but did not manage to find a nice clean reproduction. Hence the following report starts with broken code and only then explains the problem, which is the unusual order.

here's the broken code

This is wrong because it mutates the state on the MigrationExecutor, i.e. in certain conditions a different graph (without the pruning of replace_migrations) is set "globally" on the MigrationExecutor.

But migration_plan (the function of which the broken code is part) is called _twice_ in at least some flows. In particular: first as part of migrate, and then as part of _create_project_state. This means that, for flows where the broken code is called in the first call to migration_plan (for presumably good reasons, as per the comment above it), the construction of the second plan uses a graph that is too large, causing a failure in some cases.

This fails in the following combination of conditions:

  1. a project with squashed migrations in more than one app
  2. where the squashed migrations do not have follow-up migrations (i.e. when the graph is not simplified for replacements, both the squashing migration and the last migration it replaces show up as leaf nodes)
  3. explicit migration to a replaced migration from the command line.

because of the explicit migration to a replaced migration (3) the graph is updated in the "wrong code" as per the comment. Then, when trying to _create_project_state the graph contains leaf nodes for both the original and squashed paths, which means that some things in the project-state-creation happen doubly, which fails.

I have encountered this while working on Bugsink, a self-hosted Error Tracker, on this commit. I have not been able to distill a more clean PoC that actually exhibits failure though.

Replacing the mutation with something that leaves the loader untouched (ugly version below) fixes the problem though.

                    old_loader = self.loader
                    self.loader = MigrationLoader(self.connection)
                    self.loader.replace_migrations = False
                    self.loader.build_graph()
                    result = self.migration_plan(targets, clean_start=clean_start)
                    self.loader = old_loader
                    return result

this ticket may or may not be related

Change History (15)

comment:1 by Sarah Boyce, 9 months ago

Resolution: needsinfo
Status: newclosed

Hi Klaas, thank you for your efforts into debugging further into the migrations
I will close this until there is either a test to demonstrate undesired behavior or a project to replicate (showing an issue different to #36166)
I would avoid going too low-level when creating a ticket as we can go into this depth in PRs/ticket comments when resolving issues

comment:2 by Klaas van Schelven, 9 months ago

ok, finally found a clean reproducer!

https://github.com/vanschelven/squashwithrename

(freshdjango) klaas@pop-os:~/dev/squashwithrename$ rm db.sqlite3 
(freshdjango) klaas@pop-os:~/dev/squashwithrename$ python manage.py migrate 
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions, squashme, triggerfailingcode
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying sessions.0001_initial... OK
  Applying squashme.0001_initial... OK
  Applying squashme.0002_rename_name_foo_rename_squashed_0003_foo_another_field... OK
  Applying triggerfailingcode.0001_squashed_0002_baz_baz... OK
(freshdjango) klaas@pop-os:~/dev/squashwithrename$ python manage.py migrate  triggerfailingcode 0001_initial
Operations to perform:
  Target specific migration: 0001_initial, from triggerfailingcode
Traceback (most recent call last):
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/state.py", line 297, in rename_field
    found = fields.pop(old_name)
KeyError: 'name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/datacrypt/dev/squashwithrename/manage.py", line 22, in <module>
    main()
  File "/mnt/datacrypt/dev/squashwithrename/manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 107, in wrapper
    res = handle_func(*args, **kwargs)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 302, in handle
    pre_migrate_state = executor._create_project_state(with_applied_migrations=True)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/executor.py", line 91, in _create_project_state
    migration.mutate_state(state, preserve=False)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/migration.py", line 91, in mutate_state
    operation.state_forwards(self.app_label, new_state)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/operations/fields.py", line 303, in state_forwards
    state.rename_field(
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/state.py", line 299, in rename_field
    raise FieldDoesNotExist(
django.core.exceptions.FieldDoesNotExist: squashme.foo has no field named 'name'

comment:3 by Klaas van Schelven, 9 months ago

The reason this was somewhat hard to reproduce cleanly is the following:

Per my original report:

the graph contains leaf nodes for both the original and squashed paths, which means that some things in the project-state-creation happen doubly, which fails.

however, there's also this code, which is the adding of a model to a project's state. That code does not bother to check whether a given model already exist, which means that the things happening doubly doesn't trigger an error if both paths start with creating a model, for the simple reason that the model is simply overwritten.

That might actually be a separate bug...

comment:4 by Sarah Boyce, 9 months ago

I've downloaded the project and ran the commands. It migrates without error for me on both Django 5.1 and main 🤔

Apologies, missed the second command to run, reproduced

Last edited 9 months ago by Sarah Boyce (previous) (diff)

comment:5 by Sarah Boyce, 9 months ago

Cc: Jacob Walls added
Summary: migration plan in the context of squashmigrations has too many leavesBackwards migration to replaced migration when other app has squashed migrations can lead to FieldDoesNotExist error due to incorrect state
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Agreed this can be treated separate to #36166 assuming we want to support backwards migrating to replaced migrations

Version 0, edited 9 months ago by Sarah Boyce (next)

comment:6 by Sarah Boyce, 9 months ago

Resolution: needsinfo
Status: closednew

comment:7 by Jacob Walls, 8 months ago

Thanks for the ping on github. I agree some investigation is needed about whether & where to unset replace_migrations = False back to True.

Have you looked into pre-walking the graph nodes to find out if any migrations are missing, so we could reload with replace_migrations = False *before* the loop? I didn't get a chance to do a deeper dive than that.

comment:8 by Klaas van Schelven, 8 months ago

I'm not sure what you mean exactly... in particular with "pre walking". Also I'm not sure what you mean with "the loop"; the whole point is that there are 2 usage locations (and that in certain conditions they need different values for replace_migrations).

I think the graph is cached (and this caching is overridden for the case I highlighted). In my mind, as long as that "clobbering" is done after the function that needs it returns, we're probably good. In my PoC I did that by storing the unclobbered result, but any other solution would do to, I think.

comment:9 by Jacob Walls, 8 months ago

Hi Klaas. Sorry if my shorthand was not helpful. What I meant was that I tested your suggested patch against your test project and found that I still couldn't migrate backward to 0001_initial, finding instead this exception:

  File "/Users/.../django/django/db/migrations/executor.py", line 229, in _migrate_all_backwards
    self.unapply_migration(states[migration], migration, fake=fake)
                           ~~~~~~^^^^^^^^^^^
KeyError: <Migration triggerfailingcode.0002_baz_baz>

Because your patch (and the changes in #24900) take place inside a loop over migration targets, by "pre-walking" I wondered if we could avoid rebuilding the graph during the loop and instead just derive whatever information we need to build the graph correctly only once, but I wasn't saying I had confidence in this hypothesis, I was just asking if you had already explored it. From the PR review comments in #24900 it appears the original version was quite different and perhaps more like what I'm suggesting now. (It's hard to say since I don't have the branch anymore.)

Thanks for looking into it, I'd be glad to review a PR.

comment:10 by houston0222, 4 months ago

Owner: set to houston0222
Status: newassigned

comment:11 by houston0222, 4 months ago

Has patch: set

comment:12 by houston0222, 4 months ago

Hi everyone,

I've submitted a pull request to address this issue:
https://github.com/django/django/pull/19612

It adds a conditional check to avoid reapplying replaced migrations when migrating backwards, which can lead to duplicate operations.
Feedback welcome:)

comment:13 by houston0222, 4 months ago

Just a quick update: I'm preparing a small example project that shows the issue, explains how it was found and debugged, and helps confirm that the fix works.

The goal is to make the problem and the review process clearer for anyone looking at the patch.

I’ll follow up with more details once it’s ready. Thank you for your time.

in reply to:  13 comment:14 by houston0222, 4 months ago

Replying to houston0222:

Just a quick update: I'm preparing a small example project that shows the issue, explains how it was found and debugged, and helps confirm that the fix works.

The goal is to make the problem and the review process clearer for anyone looking at the patch.

I’ll follow up with more details once it’s ready. Thank you for your time.

Hi, all
I’ve written an article explaining how the issue was identified and how the proposed logic resolves it
https://dev.to/houston_wong_78b96dd3a773/fixing-django-squashed-migration-in-multi-app-issue-36168-5g39
the new PR: https://github.com/django/django/pull/19623

Thank you!

comment:15 by Jacob Walls, 4 weeks ago

Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top