Opened 6 months ago

Last modified 5 months ago

#31255 new Cleanup/optimization

Migrations create a redundant RemoveField operation when deleting 2 models with related fields.

Reported by: Panagis Alisandratos Owned by:
Component: Migrations Version: master
Severity: Normal Keywords: migration, optimizer
Cc: Markus Holtermann, Andriy Sokolovskiy, Shai Berger Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

class ModelA(models.Model):
    field1 = models.BooleanField(default=False)

class ModelB(models.Model):
    another_field1 = models.TextField(blank=True)
    field1 = models.BooleanField(default=False)
    related_field = models.ForeignKey(ModelA, on_delete=models.CASCADE)

class ModelC(models.Model):
    another_field1 = models.TextField(blank=True)
    related_field = models.ForeignKey(ModelA, on_delete=models.CASCADE)

Removing ModelB and ModelC yields the following migration:

class Migration(migrations.Migration):

    dependencies = [
        ('analyzer', '0149_modela_modelb_modelc'),
    ]

    operations = [
        migrations.RemoveField(
            model_name='modelc',
            name='related_field',
        ),
        migrations.DeleteModel(
            name='ModelB',
        ),
        migrations.DeleteModel(
            name='ModelC',
        ),
    ]

It is unclear whether the RemoveField operation is redundant or if a RemoveField operation for ModelB is missing.

I've confirmed that the RemoveField operation for ModelB is part of the unoptimized operations set.

Change History (18)

comment:1 Changed 6 months ago by Panagis Alisandratos

Summary: Migration optimizer retains a 1 RemoveField for FK when deleting 2 modelsMigration optimizer retains 1 RemoveField for FK when deleting 2 models

comment:2 Changed 6 months ago by Panagis Alisandratos

Summary: Migration optimizer retains 1 RemoveField for FK when deleting 2 modelsMigration optimizer removes 1 RemoveField for FK when deleting 2 models

comment:3 Changed 6 months ago by felixxm

Cc: Markus Holtermann Andriy Sokolovskiy Shai Berger added
Summary: Migration optimizer removes 1 RemoveField for FK when deleting 2 modelsMigrations create a redundant RemoveField operation when deleting 2 models with related fields.
Triage Stage: UnreviewedAccepted
Version: 2.2master

IMO, RemoveField is redundant in this case (see #24061 with a related discussion).

comment:4 Changed 6 months ago by Rohit Jha

Owner: changed from nobody to Rohit Jha
Status: newassigned

comment:5 Changed 6 months ago by Sanskar Jaiswal

Upon going through some of the source code, the issue clearly arises because we are calling RemoveField in the generate_deleted_models method in db/migrations/autodetector.py https://github.com/django/django/blob/f283ffaa84ef0a558eb466b8fc3fae7e6fbb547c/django/db/migrations/autodetector.py#L707. I think calling this method here is redundant as DeleteModel should drop the table from the database which automatically deletes the related fields as well. Can someone confirm my hypothesis or correct me if I am steering in the wrong direction?

comment:6 Changed 6 months ago by Simon Charette

Sankar it's bit more complicated than that.

If you remove two interelated models (or any model cycle) at the same time fields have to be removed to break the cycle before hand

Given

class Foo(models.Model):
    bar = models.ForeignKey('Bar')

class Bar(models.Model):
    foo = models.ForeignKey(Foo)

Removing both models and running makemigrations must generate a migration that either drop Foo.bar or Bar.foo before proceeding with the model removal. I'm pretty sure we have auto-detector tests for this case.

The optimizer likely need to be adjusted somehow but I doubt generate_deleted_models's generation of RemoveField is to blame.

comment:7 Changed 6 months ago by Sanskar Jaiswal

Upon doing some more debugging, I found some peculiar behaviour. I tried to print generated_operations at the end of the function generate_deleted_models's . This is the output: {'some': [<RemoveField model_name='book', name='authors_book'>, <DeleteModel name='Book'>, <RemoveField model_name='pub', name='authors_pub'>, <DeleteModel name='Pub'>]}.

The method is adding four operations to the generated_operations, but only three of them show up in the actual migrations file. Am I missing something here?

comment:8 Changed 6 months ago by Simon Charette

The method is adding four operations to the generated_operations, but only three of them show up in the actual migrations file. Am I missing something here?

Yes. The optimizer is likely able to optimize one of the RemoveField into its corresponding DeleteModel but not the other one. The reduction operation takes place in RemoveField.reduce.

See #26720 (https://github.com/django/django/pull/7999) and https://code.djangoproject.com/ticket/24424#comment:35 for more details about that.

comment:9 Changed 6 months ago by Sanskar Jaiswal

Unfortunately, I don't think this is the case here. Here is the migration file that was generated: https://pastebin.com/nSN5HNW8

comment:10 Changed 6 months ago by Simon Charette

I'm not sure I understand, the migration file you pointed at clearly has 3 operations so the <RemoveField model_name='book', name='authors_book'> got reduced into the <DeleteModel name='Book'>.

comment:11 Changed 6 months ago by Sanskar Jaiswal

Forgive me if I come off as a bit unaware, I have just started to work with the migrations component. I have been going through the code in the meanwhile, and I am confident that the reduce() method or MigrationOptimizer's optimize_inner() method is causing the issue. Just to be on the same page, the issue here is that while one RemoveField gets reduced into DeleteModel, while the other does not, right?

comment:12 Changed 6 months ago by Simon Charette

This is explained in more details in the above links but the crux of the issue is that RemoveField and DeleteModel operations don't hold a reference to what they are removing so they must assume they could be referring to any model.

In other words, because the DeleteModel operation doesn't hold a reference to the model state it's removing it cannot differentiate between the cases where the optimization can be performed (this ticket's example) and when it cannot because there's cycles between models (comment:6).

You likely can get the optimization working for this ticket by changing DeleteModel.references_model to return False but that I'll break when for the cases where model cycles are deleted in the same migration.

If DeleteModel operations were created with the set of fields they have at removal time it would be possible for them to properly determine whether or not they references each other.

comment:13 Changed 6 months ago by Simon Charette

Here's a branch pushing the idea above further.

It currently has two failures related to many-to-many field removals optimizations in the auto-detector but the additional tests in test_optimizer demonstrate that changes work as expected. We should likely add more tests for the RemoveField and DeleteModel without fields(s) tests but I think this should be a good basis if you want to familiarize yourself with the internals of the optimizer.

comment:14 in reply to:  13 Changed 6 months ago by Sanskar Jaiswal

Replying to Simon Charette:

Here's a branch pushing the idea above further.

After going through this, I have built on top of this and further fixed the ManyToManyField removal test cases and added another test case for this ticket as well. How should I create the PR, given that I have built on top of your existing code?

Last edited 6 months ago by Sanskar Jaiswal (previous) (diff)

comment:15 Changed 6 months ago by Simon Charette

After going through this, I have built on top of this and further fixed the ManyToManyField removal test cases and added another test case for this ticket as well

Nice, I think test_non_cyclical_models_removals already covers this ticket's case though; no need to add another one at the auto-detector level IMO.

How should I create the PR, given that I have built on top of your existing code?

Assuming you pushed your work to a branch that includes the above commit it should be the same process as opening any other PR. I'd leave the two commits separated so they can be reorganized by the mergers as they deem the most appropriate.

comment:16 Changed 6 months ago by Sanskar Jaiswal

While working on my branch, I manually changed all the files to reflect the changes you made on your branch, and then made some further changes on my own. I was wondering if there is a way, that the megers recognize that the PR contains your work too since if I simply create a PR, you wouldn't get any credit.

Last edited 6 months ago by Sanskar Jaiswal (previous) (diff)

comment:17 Changed 6 months ago by Simon Charette

I wouldn't worry too much about attribution, mergers can use the Git Co-authored-by feature to give attribution to multiple contributors.

comment:18 Changed 5 months ago by Rohit Jha

Owner: Rohit Jha deleted
Status: assignednew

Sanskar Jaiswal: Please feel free to pick this

Note: See TracTickets for help on using tickets.
Back to Top