Opened 6 years ago
Last modified 20 months ago
#31255 new Cleanup/optimization
Migrations create a redundant RemoveField operation when deleting 2 models with related fields.
| Reported by: | Panagis Alisandratos | Owned by: | |
|---|---|---|---|
| Component: | Migrations | Version: | dev |
| Severity: | Normal | Keywords: | migration, optimizer |
| Cc: | Markus Holtermann, Andriy Sokolovskiy, Shai Berger, Ülgen Sarıkavak | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
class ModelA(models.Model):
field1 = models.BooleanField(default=False)
class ModelB(models.Model):
another_field1 = models.TextField(blank=True)
field1 = models.BooleanField(default=False)
related_field = models.ForeignKey(ModelA, on_delete=models.CASCADE)
class ModelC(models.Model):
another_field1 = models.TextField(blank=True)
related_field = models.ForeignKey(ModelA, on_delete=models.CASCADE)
Removing ModelB and ModelC yields the following migration:
class Migration(migrations.Migration):
dependencies = [
('analyzer', '0149_modela_modelb_modelc'),
]
operations = [
migrations.RemoveField(
model_name='modelc',
name='related_field',
),
migrations.DeleteModel(
name='ModelB',
),
migrations.DeleteModel(
name='ModelC',
),
]
It is unclear whether the RemoveField operation is redundant or if a RemoveField operation for ModelB is missing.
I've confirmed that the RemoveField operation for ModelB is part of the unoptimized operations set.
Change History (22)
comment:1 by , 6 years ago
| Summary: | Migration optimizer retains a 1 RemoveField for FK when deleting 2 models → Migration optimizer retains 1 RemoveField for FK when deleting 2 models |
|---|
comment:2 by , 6 years ago
| Summary: | Migration optimizer retains 1 RemoveField for FK when deleting 2 models → Migration optimizer removes 1 RemoveField for FK when deleting 2 models |
|---|
comment:3 by , 6 years ago
| Cc: | added |
|---|---|
| Summary: | Migration optimizer removes 1 RemoveField for FK when deleting 2 models → Migrations create a redundant RemoveField operation when deleting 2 models with related fields. |
| Triage Stage: | Unreviewed → Accepted |
| Version: | 2.2 → master |
comment:4 by , 6 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:5 by , 6 years ago
Upon going through some of the source code, the issue clearly arises because we are calling RemoveField in the generate_deleted_models method in db/migrations/autodetector.py https://github.com/django/django/blob/f283ffaa84ef0a558eb466b8fc3fae7e6fbb547c/django/db/migrations/autodetector.py#L707. I think calling this method here is redundant as DeleteModel should drop the table from the database which automatically deletes the related fields as well. Can someone confirm my hypothesis or correct me if I am steering in the wrong direction?
comment:6 by , 6 years ago
Sankar it's bit more complicated than that.
If you remove two interelated models (or any model cycle) at the same time fields have to be removed to break the cycle before hand
Given
class Foo(models.Model): bar = models.ForeignKey('Bar') class Bar(models.Model): foo = models.ForeignKey(Foo)
Removing both models and running makemigrations must generate a migration that either drop Foo.bar or Bar.foo before proceeding with the model removal. I'm pretty sure we have auto-detector tests for this case.
The optimizer likely need to be adjusted somehow but I doubt generate_deleted_models's generation of RemoveField is to blame.
comment:7 by , 6 years ago
Upon doing some more debugging, I found some peculiar behaviour. I tried to print generated_operations at the end of the function generate_deleted_models's . This is the output: {'some': [<RemoveField model_name='book', name='authors_book'>, <DeleteModel name='Book'>, <RemoveField model_name='pub', name='authors_pub'>, <DeleteModel name='Pub'>]}.
The method is adding four operations to the generated_operations, but only three of them show up in the actual migrations file. Am I missing something here?
comment:8 by , 6 years ago
The method is adding four operations to the generated_operations, but only three of them show up in the actual migrations file. Am I missing something here?
Yes. The optimizer is likely able to optimize one of the RemoveField into its corresponding DeleteModel but not the other one. The reduction operation takes place in RemoveField.reduce.
See #26720 (https://github.com/django/django/pull/7999) and https://code.djangoproject.com/ticket/24424#comment:35 for more details about that.
comment:9 by , 6 years ago
Unfortunately, I don't think this is the case here. Here is the migration file that was generated: https://pastebin.com/nSN5HNW8
comment:10 by , 6 years ago
I'm not sure I understand, the migration file you pointed at clearly has 3 operations so the <RemoveField model_name='book', name='authors_book'> got reduced into the <DeleteModel name='Book'>.
comment:11 by , 6 years ago
Forgive me if I come off as a bit unaware, I have just started to work with the migrations component. I have been going through the code in the meanwhile, and I am confident that the reduce() method or MigrationOptimizer's optimize_inner() method is causing the issue. Just to be on the same page, the issue here is that while one RemoveField gets reduced into DeleteModel, while the other does not, right?
comment:12 by , 6 years ago
This is explained in more details in the above links but the crux of the issue is that RemoveField and DeleteModel operations don't hold a reference to what they are removing so they must assume they could be referring to any model.
In other words, because the DeleteModel operation doesn't hold a reference to the model state it's removing it cannot differentiate between the cases where the optimization can be performed (this ticket's example) and when it cannot because there's cycles between models (comment:6).
You likely can get the optimization working for this ticket by changing DeleteModel.references_model to return False but that I'll break when for the cases where model cycles are deleted in the same migration.
If DeleteModel operations were created with the set of fields they have at removal time it would be possible for them to properly determine whether or not they references each other.
follow-up: 14 comment:13 by , 6 years ago
Here's a branch pushing the idea above further.
It currently has two failures related to many-to-many field removals optimizations in the auto-detector but the additional tests in test_optimizer demonstrate that changes work as expected. We should likely add more tests for the RemoveField and DeleteModel without fields(s) tests but I think this should be a good basis if you want to familiarize yourself with the internals of the optimizer.
comment:14 by , 6 years ago
Replying to Simon Charette:
After going through this, I have built on top of this and further fixed the ManyToManyField removal test cases and added another test case for this ticket as well. How should I create the PR, given that I have built on top of your existing code?
comment:15 by , 6 years ago
After going through this, I have built on top of this and further fixed the ManyToManyField removal test cases and added another test case for this ticket as well
Nice, I think test_non_cyclical_models_removals already covers this ticket's case though; no need to add another one at the auto-detector level IMO.
How should I create the PR, given that I have built on top of your existing code?
Assuming you pushed your work to a branch that includes the above commit it should be the same process as opening any other PR. I'd leave the two commits separated so they can be reorganized by the mergers as they deem the most appropriate.
comment:16 by , 6 years ago
While working on my branch, I manually changed all the files to reflect the changes you made on your branch, and then made some further changes on my own. I was wondering if there is a way, that the megers recognize that the PR contains your work too since if I simply create a PR, you wouldn't get any credit.
comment:17 by , 6 years ago
I wouldn't worry too much about attribution, mergers can use the Git Co-authored-by feature to give attribution to multiple contributors.
comment:18 by , 6 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
Sanskar Jaiswal: Please feel free to pick this
comment:19 by , 5 years ago
| Has patch: | set |
|---|---|
| Owner: | set to |
| Status: | new → assigned |
Submitted a draft PR.
I'd like to try a shot at an alternative patch that changes Operation.reduce signature to (operation: Operation, app_label: str, state: ProjectState) to allow the optimization to be enabled even for legacy projects with migrations that haven't tracked removed field and models over time. It would also avoid a public API change for RemoveField and DeleteModel.
comment:20 by , 5 years ago
| Has patch: | unset |
|---|
comment:21 by , 4 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
comment:22 by , 20 months ago
| Cc: | added |
|---|
IMO,
RemoveFieldis redundant in this case (see #24061 with a related discussion).