Opened 10 days ago

Last modified 10 minutes ago

#36168 new Bug

Backwards migration to replaced migration when other app has squashed migrations can lead to FieldDoesNotExist error due to incorrect state

Reported by: Klaas van Schelven Owned by:
Component: Migrations Version: 5.1
Severity: Normal Keywords:
Cc: Jacob Walls Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no
Pull Requests:How to create a pull request

Description

I managed to debug my way to the broken part of the code, but did not manage to find a nice clean reproduction. Hence the following report starts with broken code and only then explains the problem, which is the unusual order.

here's the broken code

This is wrong because it mutates the state on the MigrationExecutor, i.e. in certain conditions a different graph (without the pruning of replace_migrations) is set "globally" on the MigrationExecutor.

But migration_plan (the function of which the broken code is part) is called _twice_ in at least some flows. In particular: first as part of migrate, and then as part of _create_project_state. This means that, for flows where the broken code is called in the first call to migration_plan (for presumably good reasons, as per the comment above it), the construction of the second plan uses a graph that is too large, causing a failure in some cases.

This fails in the following combination of conditions:

  1. a project with squashed migrations in more than one app
  2. where the squashed migrations do not have follow-up migrations (i.e. when the graph is not simplified for replacements, both the squashing migration and the last migration it replaces show up as leaf nodes)
  3. explicit migration to a replaced migration from the command line.

because of the explicit migration to a replaced migration (3) the graph is updated in the "wrong code" as per the comment. Then, when trying to _create_project_state the graph contains leaf nodes for both the original and squashed paths, which means that some things in the project-state-creation happen doubly, which fails.

I have encountered this while working on Bugsink, a self-hosted Error Tracker, on this commit. I have not been able to distill a more clean PoC that actually exhibits failure though.

Replacing the mutation with something that leaves the loader untouched (ugly version below) fixes the problem though.

                    old_loader = self.loader
                    self.loader = MigrationLoader(self.connection)
                    self.loader.replace_migrations = False
                    self.loader.build_graph()
                    result = self.migration_plan(targets, clean_start=clean_start)
                    self.loader = old_loader
                    return result

this ticket may or may not be related

According to the ticket's flags, the next step(s) to move this issue forward are:

  • To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is: [https://github.com/django/django/pull/#### PR].

Change History (8)

comment:1 by Sarah Boyce, 10 days ago

Resolution: needsinfo
Status: newclosed

Hi Klaas, thank you for your efforts into debugging further into the migrations
I will close this until there is either a test to demonstrate undesired behavior or a project to replicate (showing an issue different to #36166)
I would avoid going too low-level when creating a ticket as we can go into this depth in PRs/ticket comments when resolving issues

comment:2 by Klaas van Schelven, 10 days ago

ok, finally found a clean reproducer!

https://github.com/vanschelven/squashwithrename

(freshdjango) klaas@pop-os:~/dev/squashwithrename$ rm db.sqlite3 
(freshdjango) klaas@pop-os:~/dev/squashwithrename$ python manage.py migrate 
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions, squashme, triggerfailingcode
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying auth.0012_alter_user_first_name_max_length... OK
  Applying sessions.0001_initial... OK
  Applying squashme.0001_initial... OK
  Applying squashme.0002_rename_name_foo_rename_squashed_0003_foo_another_field... OK
  Applying triggerfailingcode.0001_squashed_0002_baz_baz... OK
(freshdjango) klaas@pop-os:~/dev/squashwithrename$ python manage.py migrate  triggerfailingcode 0001_initial
Operations to perform:
  Target specific migration: 0001_initial, from triggerfailingcode
Traceback (most recent call last):
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/state.py", line 297, in rename_field
    found = fields.pop(old_name)
KeyError: 'name'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/datacrypt/dev/squashwithrename/manage.py", line 22, in <module>
    main()
  File "/mnt/datacrypt/dev/squashwithrename/manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/__init__.py", line 442, in execute_from_command_line
    utility.execute()
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/__init__.py", line 436, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 413, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 459, in execute
    output = self.handle(*args, **options)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/base.py", line 107, in wrapper
    res = handle_func(*args, **kwargs)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 302, in handle
    pre_migrate_state = executor._create_project_state(with_applied_migrations=True)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/executor.py", line 91, in _create_project_state
    migration.mutate_state(state, preserve=False)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/migration.py", line 91, in mutate_state
    operation.state_forwards(self.app_label, new_state)
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/operations/fields.py", line 303, in state_forwards
    state.rename_field(
  File "/tmp/freshdjango/lib/python3.10/site-packages/django/db/migrations/state.py", line 299, in rename_field
    raise FieldDoesNotExist(
django.core.exceptions.FieldDoesNotExist: squashme.foo has no field named 'name'

comment:3 by Klaas van Schelven, 10 days ago

The reason this was somewhat hard to reproduce cleanly is the following:

Per my original report:

the graph contains leaf nodes for both the original and squashed paths, which means that some things in the project-state-creation happen doubly, which fails.

however, there's also this code, which is the adding of a model to a project's state. That code does not bother to check whether a given model already exist, which means that the things happening doubly doesn't trigger an error if both paths start with creating a model, for the simple reason that the model is simply overwritten.

That might actually be a separate bug...

comment:4 by Sarah Boyce, 10 days ago

I've downloaded the project and ran the commands. It migrates without error for me on both Django 5.1 and main 🤔

Apologies, missed the second command to run, reproduced

Last edited 10 days ago by Sarah Boyce (previous) (diff)

comment:5 by Sarah Boyce, 10 days ago

Cc: Jacob Walls added
Summary: migration plan in the context of squashmigrations has too many leavesBackwards migration to replaced migration when other app has squashed migrations can lead to FieldDoesNotExist error due to incorrect state
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Agreed this can be treated separate to #36166 assuming we want to support backwards migrating to replaced migrations

Thank you for the projects Klaas, really appreciate it ⭐

Last edited 10 days ago by Sarah Boyce (previous) (diff)

comment:6 by Sarah Boyce, 10 days ago

Resolution: needsinfo
Status: closednew

comment:7 by Jacob Walls, 16 hours ago

Thanks for the ping on github. I agree some investigation is needed about whether & where to unset replace_migrations = False back to True.

Have you looked into pre-walking the graph nodes to find out if any migrations are missing, so we could reload with replace_migrations = False *before* the loop? I didn't get a chance to do a deeper dive than that.

comment:8 by Klaas van Schelven, 10 minutes ago

I'm not sure what you mean exactly... in particular with "pre walking". Also I'm not sure what you mean with "the loop"; the whole point is that there are 2 usage locations (and that in certain conditions they need different values for replace_migrations).

I think the graph is cached (and this caching is overridden for the case I highlighted). In my mind, as long as that "clobbering" is done after the function that needs it returns, we're probably good. In my PoC I did that by storing the unclobbered result, but any other solution would do to, I think.

Note: See TracTickets for help on using tickets.
Back to Top