Opened 6 years ago

Closed 4 years ago

#25530 closed Bug (fixed)

Deferred SQL operations fail when a referenced table or column is renamed or deleted during the same migration

Reported by: Simon Philips Owned by: Simon Charette
Component: Migrations Version: dev
Severity: Normal Keywords:
Cc: pdewacht@…, Shai Berger Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Simon Philips)

Possibly related to #25521

The following migration is the result of squashing 3 smaller migrations together:

  1. the initial migration
  2. some random RunPython migration
  3. changing the foreign key's column migration

The second step is only necessary to prevent squashmigrations from being smart which avoids the issue.

The squashed migration (both optimized and unoptimized) looks as following:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

from django.db import migrations, models


# bug.migrations.0002_run_python

def do_something(apps, schema_editor):
    pass  # Do nothing, it's not important anyway.


class Migration(migrations.Migration):

    replaces = [('bug', '0001_initial'), ('bug', '0002_run_python'), ('bug', '0003_change_db_column')]

    dependencies = [
    ]

    operations = [
        migrations.CreateModel(
            name='Child',
            fields=[
                ('id', models.AutoField(auto_created=True, serialize=False, verbose_name='ID', primary_key=True)),
                ('data', models.IntegerField(default=0)),
            ],
        ),
        migrations.CreateModel(
            name='Parent',
            fields=[
                ('id', models.AutoField(auto_created=True, serialize=False, verbose_name='ID', primary_key=True)),
                ('data', models.IntegerField(default=0)),
            ],
        ),
        migrations.AddField(
            model_name='child',
            name='parent',
            field=models.ForeignKey(db_column='my_parent', to='bug.Parent'),
        ),
        migrations.RunPython(
            code=do_something,
        ),
        migrations.AlterField(
            model_name='child',
            name='parent',
            field=models.ForeignKey(to='bug.Parent'),
        ),
    ]

The models are:

class Parent(models.Model):
    data = models.IntegerField(default=0)


class Child(models.Model):
    # Original db_column:
    # parent = models.ForeignKey(Parent, db_column="my_parent")

    # New db_column:
    parent = models.ForeignKey(Parent)
    data = models.IntegerField(default=0)

This will result in the following PostgreSQL error:

2015-10-08 15:00:27 CEST ERROR:  column "my_parent" does not exist
2015-10-08 15:00:27 CEST STATEMENT:  CREATE INDEX "bug_child_1e28668d" ON "bug_child" ("my_parent")

The reason seems to be that any foreign key operations are added in SQL format to the schema_editor.deferred_sql list and are then executed after all other SQL commands. However, the third operation already renamed the column for that index to 'parent_id'.

The deferred statements should follow any changes made during further operations. Or, if that's not possible, perhaps some sort of 'insert deferred statements now' operation could be added after each 'submigration' in the squashed migration? In fact, that may even be desired, since the RunPython submigration may depend on the foreign key's behaviour/index for speed.

Note that in this sample app it already crashes at the CREATE INDEX statement, but the statement after that is ALTER TABLE "bug_child" ADD CONSTRAINT "bug_child_my_parent_3b5bc08e1603165f_fk_bug_parent_id" FOREIGN KEY ("my_parent") REFERENCES "bug_parent" ("id") DEFERRABLE INITIALLY DEFERRED. I think the same issue might occur if bug_parent(id) would be renamed or removed.

Change History (25)

comment:1 Changed 6 years ago by Peter De Wachter

Cc: pdewacht@… added

comment:2 Changed 6 years ago by Simon Philips

Description: modified (diff)

comment:3 Changed 6 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

I don't know how "smart" we can make squashmigrations. The solution may have to be "don't squash migrations that contains operations that must remain in separate migrations."

comment:4 Changed 6 years ago by Simon Philips

I don't think squashmigrations really is to blame here. The migration process itself gets stuck over valid input that could have been written manually as the order of these operations makes perfect sense from a coder's point of view. What is the reason for delaying these SQL statements until after all operations have run? Why are they not run right after each operation?

comment:5 Changed 6 years ago by Tim Graham

It might be related to 307acc745a4e655c35db96f96ceb4b87597dee49 (#24630), but this is not an area of expertise for me. If you have a solution to suggest, please do. It seems maybe you want each operation to be run in a transaction, however, my understanding is that this won't allow the entire migration to be atomic in case one of its operations fails.

comment:6 in reply to:  4 Changed 6 years ago by Shai Berger

Cc: Shai Berger added

Replying to simonphilips:

What is the reason for delaying these SQL statements until after all operations have run? Why are they not run right after each operation?

Think about a migration which adds two models to an app: a model A, and a model B with a FK to A. If A is created first, there's no problem, but if B is created first, then you can't add its foreign-key constraint until A is created. So, to run the statements immediately with the operation, you'd have to trace such dependencies and make sure models are created in the right order; this would be doable in the migration generator, but a bit of a chore if you write migrations by hand.

Now consider a case where there is a "loop" of FKs (A has a FK to B as well); there is no order of creating the models that will work -- you must separate out the constraint creation statements.

Since the "loop" case is valid, and a solution for it already handles the cases that would be solved by tracing FK dependencies, the solution was adopted across the board: Some statements are generated as "deferred" and are only executed at the end of the migration.

With that explanation in mind, your suggestion in the ticket -- to add an operation to "flush" the deferred statements, and put it in between original migrations when squashing, makes a lot of sense. There is a problem with that: Such an operation could seriously mess with the optimizer. A possible solution would be to insert the operation only before Run{Python,SQL} operations which block the optimizer anyway.

Adding an operation (whether implicitly or explicitly) would count as a new feature, and either way the problem is not a regression nor a major failure in a new feature in 1.8. If a solution can be found without new behavior, it might be accepted into 1.9, otherwise, it can only go in 1.10.

comment:7 Changed 6 years ago by Jacek Bzdak

I could attempt to write code of the flush operation (I have an issue that kind of, sort of is related but solution is the same). But this is not an EasyPickling task so I figured to ask for permssion.

For me this issue blocks, makes squashed migrations unusable, which is ok when I have 30 or so migrations in an app, but before django 1.10 is released problem might get out of hands.

comment:8 Changed 5 years ago by Tim Graham

I closed #25577 as a duplicate as it seems to be the same underlying issue.

comment:9 Changed 5 years ago by Simon Charette

Summary: Deferred foreign keys operations fail when the column is changed during the same migrationDeferred SQL operations fail when a referenced table or column is renamed during the same migration

#26461 was a duplicate caused by an implicit table rename through a RenameModel operation.

I think a possible solution would be to add state dependency tracking to deferred_sql. This way RenameModel, AlterField, RenameField and AlterModelTable operations could alter it to rename the references and RemoveModel, RemoveField operations could make sure to remove the unnecessary statements.

For example, the AddField operation above would add a tuple of the form ('CREATE INDEX ...', [(ModelState('bug', 'child', [...]), 'child'), (ModelState('bug_parent', 'parent', [...]), 'id)] (as it depends on both state and fields) and the AlterField operation would loop over all editor.deferred_sql and make sure to replace both the SQL and the state of items depending on the model state it's about to alter. Model level operations (RenameModel, AlterModelTable) could simply use None as field to denote they depend on the model state itself.

As for the SQL replacing part I think it would make more sense to introduce a class encapsulating the statement format string and its dependency than naively performing str.replace('old_table_name', 'new_table_name') on the constructed SQL. This wrapper could also subclass str in order to maintain backward compatibility with the actual deferred_sql API.

Thoughts?

comment:10 Changed 5 years ago by Simon Charette

Has patch: set
Version: 1.8master

Here's a POC of what I had in mind.

comment:11 Changed 5 years ago by Simon Charette

Needs tests: set
Patch needs improvement: set

comment:12 Changed 5 years ago by Simon Charette

Needs tests: unset
Patch needs improvement: unset

comment:13 Changed 5 years ago by Tim Graham

Patch needs improvement: set

Left some comments for improvement.

comment:14 Changed 5 years ago by Tim Graham

I closed #27092 as a duplicate.

comment:15 Changed 4 years ago by Simon Charette

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:16 Changed 4 years ago by Simon Charette

Summary: Deferred SQL operations fail when a referenced table or column is renamed during the same migrationDeferred SQL operations fail when a referenced table or column is renamed or deleted during the same migration

comment:17 Changed 4 years ago by Simon Charette

Patch needs improvement: unset

Adjusted the patch based on Tim's feedback and rebased on top of the latest master.

comment:18 Changed 4 years ago by Tim Graham

Patch needs improvement: set

comment:19 Changed 4 years ago by Simon Charette

Patch needs improvement: unset

comment:20 Changed 4 years ago by Tim Graham

Triage Stage: AcceptedReady for checkin

comment:21 Changed 4 years ago by Simon Charette <charette.s@…>

In ea91ad4c:

Refs #25530 -- Changed _create_index_name to take a table as first parameter.

comment:22 Changed 4 years ago by Simon Charette <charette.s@…>

In 3b429c96:

Refs #25530 -- Tracked references of deferred SQL statements.

comment:23 Changed 4 years ago by Simon Charette <charette.s@…>

In b50815ee:

Refs #25530 -- Renamed deferred SQL references on rename operation.

comment:24 Changed 4 years ago by Simon Charette <charette.s@…>

In b1cbbe92:

Refs #25530 -- Deleted deferred SQL references on delete operation.

comment:25 Changed 4 years ago by Simon Charette

Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.
Back to Top