Opened 4 years ago

Last modified 11 days ago

#31255 new Cleanup/optimization

Migrations create a redundant RemoveField operation when deleting 2 models with related fields.

Reported by: Panagis Alisandratos Owned by:
Component: Migrations Version: dev
Severity: Normal Keywords: migration, optimizer
Cc: Markus Holtermann, Andriy Sokolovskiy, Shai Berger, Ülgen Sarıkavak Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

class ModelA(models.Model):
    field1 = models.BooleanField(default=False)

class ModelB(models.Model):
    another_field1 = models.TextField(blank=True)
    field1 = models.BooleanField(default=False)
    related_field = models.ForeignKey(ModelA, on_delete=models.CASCADE)

class ModelC(models.Model):
    another_field1 = models.TextField(blank=True)
    related_field = models.ForeignKey(ModelA, on_delete=models.CASCADE)

Removing ModelB and ModelC yields the following migration:

class Migration(migrations.Migration):

    dependencies = [
        ('analyzer', '0149_modela_modelb_modelc'),
    ]

    operations = [
        migrations.RemoveField(
            model_name='modelc',
            name='related_field',
        ),
        migrations.DeleteModel(
            name='ModelB',
        ),
        migrations.DeleteModel(
            name='ModelC',
        ),
    ]

It is unclear whether the RemoveField operation is redundant or if a RemoveField operation for ModelB is missing.

I've confirmed that the RemoveField operation for ModelB is part of the unoptimized operations set.

Change History (22)

comment:1 by Panagis Alisandratos, 4 years ago

Summary: Migration optimizer retains a 1 RemoveField for FK when deleting 2 modelsMigration optimizer retains 1 RemoveField for FK when deleting 2 models

comment:2 by Panagis Alisandratos, 4 years ago

Summary: Migration optimizer retains 1 RemoveField for FK when deleting 2 modelsMigration optimizer removes 1 RemoveField for FK when deleting 2 models

comment:3 by Mariusz Felisiak, 4 years ago

Cc: Markus Holtermann Andriy Sokolovskiy Shai Berger added
Summary: Migration optimizer removes 1 RemoveField for FK when deleting 2 modelsMigrations create a redundant RemoveField operation when deleting 2 models with related fields.
Triage Stage: UnreviewedAccepted
Version: 2.2master

IMO, RemoveField is redundant in this case (see #24061 with a related discussion).

comment:4 by Rohit Jha, 4 years ago

Owner: changed from nobody to Rohit Jha
Status: newassigned

comment:5 by Sanskar Jaiswal, 4 years ago

Upon going through some of the source code, the issue clearly arises because we are calling RemoveField in the generate_deleted_models method in db/migrations/autodetector.py https://github.com/django/django/blob/f283ffaa84ef0a558eb466b8fc3fae7e6fbb547c/django/db/migrations/autodetector.py#L707. I think calling this method here is redundant as DeleteModel should drop the table from the database which automatically deletes the related fields as well. Can someone confirm my hypothesis or correct me if I am steering in the wrong direction?

comment:6 by Simon Charette, 4 years ago

Sankar it's bit more complicated than that.

If you remove two interelated models (or any model cycle) at the same time fields have to be removed to break the cycle before hand

Given

class Foo(models.Model):
    bar = models.ForeignKey('Bar')

class Bar(models.Model):
    foo = models.ForeignKey(Foo)

Removing both models and running makemigrations must generate a migration that either drop Foo.bar or Bar.foo before proceeding with the model removal. I'm pretty sure we have auto-detector tests for this case.

The optimizer likely need to be adjusted somehow but I doubt generate_deleted_models's generation of RemoveField is to blame.

comment:7 by Sanskar Jaiswal, 4 years ago

Upon doing some more debugging, I found some peculiar behaviour. I tried to print generated_operations at the end of the function generate_deleted_models's . This is the output: {'some': [<RemoveField model_name='book', name='authors_book'>, <DeleteModel name='Book'>, <RemoveField model_name='pub', name='authors_pub'>, <DeleteModel name='Pub'>]}.

The method is adding four operations to the generated_operations, but only three of them show up in the actual migrations file. Am I missing something here?

comment:8 by Simon Charette, 4 years ago

The method is adding four operations to the generated_operations, but only three of them show up in the actual migrations file. Am I missing something here?

Yes. The optimizer is likely able to optimize one of the RemoveField into its corresponding DeleteModel but not the other one. The reduction operation takes place in RemoveField.reduce.

See #26720 (https://github.com/django/django/pull/7999) and https://code.djangoproject.com/ticket/24424#comment:35 for more details about that.

comment:9 by Sanskar Jaiswal, 4 years ago

Unfortunately, I don't think this is the case here. Here is the migration file that was generated: https://pastebin.com/nSN5HNW8

comment:10 by Simon Charette, 4 years ago

I'm not sure I understand, the migration file you pointed at clearly has 3 operations so the <RemoveField model_name='book', name='authors_book'> got reduced into the <DeleteModel name='Book'>.

comment:11 by Sanskar Jaiswal, 4 years ago

Forgive me if I come off as a bit unaware, I have just started to work with the migrations component. I have been going through the code in the meanwhile, and I am confident that the reduce() method or MigrationOptimizer's optimize_inner() method is causing the issue. Just to be on the same page, the issue here is that while one RemoveField gets reduced into DeleteModel, while the other does not, right?

comment:12 by Simon Charette, 4 years ago

This is explained in more details in the above links but the crux of the issue is that RemoveField and DeleteModel operations don't hold a reference to what they are removing so they must assume they could be referring to any model.

In other words, because the DeleteModel operation doesn't hold a reference to the model state it's removing it cannot differentiate between the cases where the optimization can be performed (this ticket's example) and when it cannot because there's cycles between models (comment:6).

You likely can get the optimization working for this ticket by changing DeleteModel.references_model to return False but that I'll break when for the cases where model cycles are deleted in the same migration.

If DeleteModel operations were created with the set of fields they have at removal time it would be possible for them to properly determine whether or not they references each other.

comment:13 by Simon Charette, 4 years ago

Here's a branch pushing the idea above further.

It currently has two failures related to many-to-many field removals optimizations in the auto-detector but the additional tests in test_optimizer demonstrate that changes work as expected. We should likely add more tests for the RemoveField and DeleteModel without fields(s) tests but I think this should be a good basis if you want to familiarize yourself with the internals of the optimizer.

in reply to:  13 comment:14 by Sanskar Jaiswal, 4 years ago

Replying to Simon Charette:

Here's a branch pushing the idea above further.

After going through this, I have built on top of this and further fixed the ManyToManyField removal test cases and added another test case for this ticket as well. How should I create the PR, given that I have built on top of your existing code.

Version 0, edited 4 years ago by Sanskar Jaiswal (next)

comment:15 by Simon Charette, 4 years ago

After going through this, I have built on top of this and further fixed the ManyToManyField removal test cases and added another test case for this ticket as well

Nice, I think test_non_cyclical_models_removals already covers this ticket's case though; no need to add another one at the auto-detector level IMO.

How should I create the PR, given that I have built on top of your existing code?

Assuming you pushed your work to a branch that includes the above commit it should be the same process as opening any other PR. I'd leave the two commits separated so they can be reorganized by the mergers as they deem the most appropriate.

comment:16 by Sanskar Jaiswal, 4 years ago

While working on my branch, I manually changed all the files to reflect the changes you made on your branch, and then made some further changes on my own. I was wondering if there is a way, that the megers recognize that the PR contains your work too since if I simply create a PR, you wouldn't get any credit.

Last edited 4 years ago by Sanskar Jaiswal (previous) (diff)

comment:17 by Simon Charette, 4 years ago

I wouldn't worry too much about attribution, mergers can use the Git Co-authored-by feature to give attribution to multiple contributors.

comment:18 by Rohit Jha, 4 years ago

Owner: Rohit Jha removed
Status: assignednew

Sanskar Jaiswal: Please feel free to pick this

comment:19 by Simon Charette, 3 years ago

Has patch: set
Owner: set to Simon Charette
Status: newassigned

Submitted a draft PR.

I'd like to try a shot at an alternative patch that changes Operation.reduce signature to (operation: Operation, app_label: str, state: ProjectState) to allow the optimization to be enabled even for legacy projects with migrations that haven't tracked removed field and models over time. It would also avoid a public API change for RemoveField and DeleteModel.

comment:20 by Mariusz Felisiak, 3 years ago

Has patch: unset

comment:21 by Mariusz Felisiak, 2 years ago

Owner: Simon Charette removed
Status: assignednew

comment:22 by Ülgen Sarıkavak, 11 days ago

Cc: Ülgen Sarıkavak added
Note: See TracTickets for help on using tickets.
Back to Top