Opened 3 years ago

Closed 2 years ago

#30966 closed Bug (fixed)

Migration crashes due to foreign key issue, depending on otherwise irrelevant order on MySQL.

Reported by: Peter Thomassen Owned by: Hasan Ramezani
Component: Migrations Version: dev
Severity: Normal Keywords: migration mySQL
Cc: Simon Charette, Markus Holtermann Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In my Django app "desecapi" (reduced to a minimal example), I have two migrations. The first migration defines the following models:

User
+ Token (Token.user relates to User.id)
+ Domain (Domain.owner relates to User.id)
...+ RRset (RRset.domain relates to Domain.id)

... where "+" denotes foreign-key relationships. Note that RRset is a grandchild of User.

When running the two migrations together on an empty database (i.e. running python manage.py desecapi 0002 instead of ... 0001 and ... 0002 separately), the second migration fails with:

django.db.utils.OperationalError: (1833, "Cannot change column 'id': used in a foreign key constraint 'desecapi_token_user_id_bfc397b3_fk_desecapi_user_id' of table 'desec.desecapi_token'")

When running them separately, both migrations work. I would expect that running in bulk or each on its own should not make a difference.

The issue can be fixed by switching the order of two operations in the first migration. One of these two operations is the one that creates the desecapi_token table (mentioned in the above error), and the other operation changes an attribute of the RRset.subname field. That model is not related to the Token model, and also not directly related to the User model (which is touch in the second migration).

I would not expect this ordering to influence whether the second migration can run successfully or not.

(One can say that the operation changing RRset.subname is unnecessary, as it could be incorporated in RRset's "CreateModel" operation. However, the history of encountering the problem is that RRset.subname in fact was changed in an intermediate migration. It just seems unnecessary because this minimal example is very much reduced.)

The minimal example can be found here: https://github.com/peterthomassen/django-bug/tree/master/api/desecapi
Note that in the interest of having a minimal example, models.py does not contain all models that the migrations deal with. My understanding is that that's fine (think of the missing models and fields as removed, but there is no migration yet that reflects that).

I created a docker-compose application that sets up an empty database and reproduces the problem. To reproduce, run the following:

git clone https://github.com/peterthomassen/django-bug.git
cd django-bug/
docker-compose build
docker-compose up -d dbapi # after this, wait for 2-3 minutes so that the database is up
docker-compose run api bash
# now, at the container shell, compare
python manage.py migrate desecapi 0002
# vs (make sure to clean up the database)
python manage.py migrate desecapi 0001
python manage.py migrate desecapi 0002

Attachments (3)

models.py (475 bytes) - added by Peter Thomassen 3 years ago.
0001_initial.py (1.6 KB) - added by Peter Thomassen 3 years ago.
0002_alter_user_id.py (490 bytes) - added by Peter Thomassen 3 years ago.

Download all attachments as: .zip

Change History (28)

Changed 3 years ago by Peter Thomassen

Attachment: models.py added

Changed 3 years ago by Peter Thomassen

Attachment: 0001_initial.py added

Changed 3 years ago by Peter Thomassen

Attachment: 0002_alter_user_id.py added

comment:1 Changed 3 years ago by Peter Thomassen

Summary: Migration crashes due to index issue, depending on otherwise irrelevant orderMigration crashes due to foreign key issue, depending on otherwise irrelevant order

comment:2 Changed 3 years ago by Mariusz Felisiak

Component: Database layer (models, ORM)Migrations
Keywords: mySQL added
Summary: Migration crashes due to foreign key issue, depending on otherwise irrelevant orderMigration crashes due to foreign key issue, depending on otherwise irrelevant order on MySQL.
Triage Stage: UnreviewedAccepted
Version: 2.2master

Thanks for this report, I confirmed this issue on MySQL. Scenario is really tricky, without the last operation i.e.

        migrations.AlterField(
            model_name='rrset',
            name='subname',
            field=models.CharField(blank=False, max_length=178),
        ),

everything works fine. When we take it into account, migrations don't drop a FOREIGN KEY constraint from desecapi_token.

Reproduced at 85efc14a2edac532df1a9ad4dd9b6d4a4dcf583e.

comment:3 Changed 3 years ago by Simon Charette

Cc: Simon Charette added

comment:4 Changed 3 years ago by Hasan Ramezani

Owner: changed from nobody to Hasan Ramezani
Status: newassigned

I investigate it and think the problem comes from this line:

related_objects_graph[f.remote_field.model._meta.concrete_model._meta].append(f)

of django.db.models.options.Options._populate_directed_relation_graph():

I added some log to check related_objects_graph in bothe case.
When the migration runs by ./manage.py migrate desecapi 0001 and then ./manage.py migrate desecapi 0002 (correct case), here is the result:

defaultdict(<class 'list'>, {<Options for User>: [<django.db.models.fields.related.ForeignKey: owner>, <django.db.models.fields.related.ForeignKey: user>], <Options for Domain>: [<django.db.models.fields.related.ForeignKey: domain>]})

But, When the migration runs by ./manage.py migrate desecapi 0002 (case with problem), here is the result:

defaultdict(<class 'list'>, {<Options for User>: [<django.db.models.fields.related.ForeignKey: user>], <Options for User>: [<django.db.models.fields.related.ForeignKey: owner>], <Options for Domain>: [<django.db.models.fields.related.ForeignKey: domain>]})

From the result, it is obvious that in the first case we have the <Options for User> key with a value of type list with two items
[<django.db.models.fields.related.ForeignKey: owner>, <django.db.models.fields.related.ForeignKey: user>].
which generate two alter commands:

ALTER TABLE `app1_domain` DROP FOREIGN KEY `app1_domain_owner_id_cc7262a2_fk_app1_user_id`
ALTER TABLE `app1_token` DROP FOREIGN KEY `app1_token_user_id_5ea51092_fk_app1_user_id`

But in the second case, we have two keys <Options for User> which one of them has a value of type list with one item
[<django.db.models.fields.related.ForeignKey: user>]
and the other has a value of type list with one item
[<django.db.models.fields.related.ForeignKey: owner>].

which generate just one alter command:

ALTER TABLE `app1_domain` DROP FOREIGN KEY `app1_domain_owner_id_cc7262a2_fk_app1_user_id`

Thus, the second foreign key app1_token_user_id_5ea51092_fk_app1_user_id still remains and cause the error.

Any suggestions for a fix?

comment:5 Changed 3 years ago by Hasan Ramezani

Has patch: set

PR

Postgres has this issue as well.

Last edited 3 years ago by Mariusz Felisiak (previous) (diff)

comment:6 Changed 3 years ago by Simon Charette

Hasan, the solution you proposed will likely address this issue but the fact multiple instances of Options for the same model classes are mixed in there is likely a symptom or a larger issue.

I'm not sure of the origin of it but this can only happen when model from different apps boundaries are mixed together which should be disallowed as it will cause all sort of weird issues. What I mean by that is that there's likely something that creates models from a migration project state and have real models reference them or vice-versa. You can assert this doesn't happen by making sure each model._meta.apps is self.apps.

In short I think this solution will only temporarily hide a much larger problem.

comment:7 Changed 3 years ago by Simon Charette

From giving a quick look at the current implementation it looks we're unfortunately sharing __fake__ models bound to different StateApps for performance reason to work around the fact the schema editors are still operating from model classes instead of model states #29898.

Since we'll likely won't be able to prevent this sharing from taking place in the short term for performance reasons we should investigate why the models cache is not appropriately busted to avoid having two different Options instances for the User model. In this case it looks like the Token fake model is not reloaded while the RRset, User, and Domain ones are when the AlterField(model_name="RRset") is performed. That makes Token.user point to the stale User model and Domain.owner point to the new one.

I highly suspect this ticket is related to #27737 and this TODO in AlterField.state_forwards.

Also, there's only a few failures when adding assertions for _meta.apps coherency once they are appropriately repointed in StateApps.clone like they should

  • django/db/migrations/state.py

    diff --git a/django/db/migrations/state.py b/django/db/migrations/state.py
    index 9b62edad1f..970197cae7 100644
    a b class StateApps(Apps): 
    322322        clone.all_models = copy.deepcopy(self.all_models)
    323323        clone.app_configs = copy.deepcopy(self.app_configs)
    324324        # Set the pointer to the correct app registry.
     325        for _, models in clone.all_models.items():
     326            for model in models.values():
     327                model._meta.apps = clone
    325328        for app_config in clone.app_configs.values():
    326329            app_config.apps = clone
    327330        # No need to actually clone them, they'll never change
  • django/db/models/options.py

    diff --git a/django/db/models/options.py b/django/db/models/options.py
    index 078b831b76..2183813a93 100644
    a b class Options: 
    704704            )
    705705            for f in fields_with_relations:
    706706                if not isinstance(f.remote_field.model, str):
    707                     related_objects_graph[str(f.remote_field.model._meta.concrete_model._meta)].append(f)
     707                    assert f.remote_field.model._meta.concrete_model._meta.apps is self.apps
     708                    related_objects_graph[f.remote_field.model._meta.concrete_model._meta].append(f)
    708709
    709710        for model in all_models:
    710711            # Set the relation_tree using the internal __dict__. In this way
    class Options: 
    712713            # __dict__ takes precedence over a data descriptor (such as
    713714            # @cached_property). This means that the _meta._relation_tree is
    714715            # only called if related_objects is not in __dict__.
    715             related_objects = related_objects_graph[str(model._meta.concrete_model._meta)]
     716            assert model._meta.concrete_model._meta.apps is self.apps
     717            related_objects = related_objects_graph[model._meta.concrete_model._meta]
    716718            model._meta.__dict__['_relation_tree'] = related_objects
    717719        # It seems it is possible that self is not in all_models, so guard
    718720        # against that with default for get().
Last edited 3 years ago by Simon Charette (previous) (diff)

comment:8 Changed 3 years ago by Mariusz Felisiak

Cc: Markus Holtermann added

comment:9 Changed 3 years ago by Mariusz Felisiak

Patch needs improvement: set

Simon, thanks for an investigation! I agree that we should try to address the main issue.

Hasan, Can you take a look?

comment:10 Changed 3 years ago by Hasan Ramezani

Sure, I can check it again.
Just for confirmation, Should I fix assertaion failures based on Simon's patch?

comment:11 Changed 3 years ago by Simon Charette

Hasan, the changes in django/db/migrations/state.py from comment:7 are likely desired but the assertion in Options were more of a way to test which operation currently break this contract. From your submitted PR only three migrations tests were failing with these assertions including the one you added even on SQLite.

Again, I suspect the cause is #27737 and the TODO in AlterField.state_forwards. The fact the rendering is delayed prevents Token from being refreshed adequately.

comment:12 Changed 3 years ago by Hasan Ramezani

Simon, I checked the TODO in AlterField.state_forwards, here is my understanding.
When I run the test in my the patch(my patch in PR + your patch in comment:7) it fails on alter operation in 0002_alter_user_id at line
assert f.remote_field.model._meta.concrete_model._meta.apps is self.apps

operations = [
        migrations.AlterField(
            model_name="user",
            name="id",
            field=models.CharField(default=uuid.uuid4, max_length=32, primary_key=True),
        ),
    ]

It fails because there is an operation in 0001_initial which run before the above mentioned migration:

        migrations.AlterField(
            model_name="RRset",
            name="subname",
            field=models.CharField(blank=False, max_length=178),
        ),

this is an alter migration and the AlterField.state_forwards reloads 3 models of 4 (User, Domain, RRest) and don't reload the Token model.
If I reload the Token model this test will be passed.

Should I change the function to find proper models to reload?
Is it the right solution?

comment:13 Changed 3 years ago by Markus Holtermann

As already suspected by Simon, this is indeed related to Django not reloading related models when non-relational fields change.

I see a few ways forward within the migration system, with various pros and cons

  1. Reload all related models, regardless if the changed field is a relational field or referenced by a relational field. This will come with a huge run-time penalty that the delaying brought
  2. Try to catch these kind of database error or do some Python-level checking beforehand (assert model1 is model2._meta.get_field("foo").related_model). If we run into any issue there, reload all related models. Depending on how expensive that is, we could possibly do that after each migration operation and make the delayed model reloading based on that.
  3. Reload all models after each migrations is applied when the ProjectState.is_delayed is False. That would bring the benefit that it doesn't matter if one applies migrations individually or in a batch.
  4. Leave as-is; the first migration can be optimized anyway

I haven't completely made up my mind about the "best" approach here. But ideally we'd get rid of model classes in migrations and solely work off of the ModelState. My last attempt got stuck do to the backwards compatibility in the SchemaEditor.

comment:14 in reply to:  12 Changed 3 years ago by Markus Holtermann

Replying to Hasan Ramezani:

this is an alter migration and the AlterField.state_forwards reloads 3 models of 4 (User, Domain, RRest) and don't reload the Token model.
If I reload the Token model this test will be passed.

Should I change the function to find proper models to reload?
Is it the right solution?

This is essentially approach 1 from my list in 13

comment:15 in reply to:  13 ; Changed 3 years ago by Peter Thomassen

Replying to Markus Holtermann:

  1. Leave as-is; the first migration can be optimized anyway

The first migration can trivially be optimized in this minimum example. In our real example, we encountered the problem where migration 0003 interferes with migration 0011, with existing deployments somewhere in the middle.

In that case, 0003 and 0011 can be optimized/squashed more or less easily for new deployments, but migration 0011 would have to be maintained individually for those deployments which already have all migrations up to that one applied. The result would be code duplication for "fresh" and "existing" deployments, and -- if this problem reoccurs -- maybe even more "squashed migration variations".

So far, squashing was optional, and it would become a mandatory mechanism, while producing code duplication in cases as described above. In my opinion, that would be the least favorable option.

Thanks a LOT for your time, and for thinking though this whole thing!

comment:16 Changed 3 years ago by Markus Holtermann

This is a minimal failing test

  • tests/migrations/test_state.py

    diff --git a/tests/migrations/test_state.py b/tests/migrations/test_state.py
    index e9bdac93c5..9140609bb6 100644
    a b class StateTests(SimpleTestCase): 
    928928        choices_field = Author._meta.get_field('choice')
    929929        self.assertEqual(list(choices_field.choices), choices)
    930930
     931    def test_reload_related_models_on_non_relational_fields(self):
     932        """
     933        #30966 - Even when fields on a model change that are not involved in
     934        a relation, the model gets reloaded. Ensure that other models pointing
     935        to or from it are reloaded accordingly.
     936
     937        User  <--  Domain  <-- RRset
     938          ^
     939          +--  Token
     940        """
     941        project_state = ProjectState()
     942        # Render project state to simulate initial migration state
     943        project_state.apps
     944        project_state.add_model(ModelState(
     945            "migrations",
     946            "User",
     947            [
     948                ("id", models.AutoField(primary_key=True)),
     949                ("email", models.EmailField(max_length=191, unique=True)),
     950            ]
     951        ))
     952        project_state.add_model(ModelState(
     953            "migrations",
     954            "Domain",
     955            [
     956                ("id", models.AutoField(primary_key=True)),
     957                ("owner", models.ForeignKey("migrations.User", models.SET_NULL)),
     958            ]
     959        ))
     960        project_state.add_model(ModelState(
     961            "migrations",
     962            "RRset",
     963            [
     964                ("id", models.AutoField(primary_key=True)),
     965                ("subname", models.CharField(blank=True, max_length=178)),
     966                ("domain", models.ForeignKey("migrations.Domain", models.SET_NULL)),
     967            ]
     968        ))
     969        project_state.add_model(ModelState(
     970            "migrations",
     971            "Token",
     972            [
     973                ("id", models.AutoField(primary_key=True)),
     974                ("user", models.ForeignKey("migrations.User", models.SET_NULL)),
     975            ]
     976        ))
     977        AlterField(
     978            model_name="RRset",
     979            name="subname",
     980            field=models.CharField(blank=False, max_length=178),
     981        ).state_forwards("migrations", project_state)
     982
     983        all_models = project_state.apps.all_models["migrations"]
     984        assert (
     985            all_models['token']._meta.get_field('user').related_model
     986            is all_models['user']
     987        ), "Models are not identical"
     988
    931989
    932990class ModelStateTests(SimpleTestCase):
    933991    def test_custom_model_base(self):

comment:17 in reply to:  15 Changed 3 years ago by Markus Holtermann

Replying to Peter Thomassen:

Replying to Markus Holtermann:

  1. Leave as-is; the first migration can be optimized anyway

The first migration can trivially be optimized in this minimum example. In our real example, we encountered the problem where migration 0003 interferes with migration 0011, with existing deployments somewhere in the middle.

In that case, 0003 and 0011 can be optimized/squashed more or less easily for new deployments, but migration 0011 would have to be maintained individually for those deployments which already have all migrations up to that one applied. The result would be code duplication for "fresh" and "existing" deployments, and -- if this problem reoccurs -- maybe even more "squashed migration variations".

So far, squashing was optional, and it would become a mandatory mechanism, while producing code duplication in cases as described above. In my opinion, that would be the least favorable option.

Yeah, that'd be true. As said, none of this is really ideal.

What we're doing here are unfortunately patches upon patches upon patches. I'd really like to move onto the "right" way.

comment:18 Changed 3 years ago by Markus Holtermann

I took a shot at implementing something along the lines of what I suggested in 2. in 13: https://github.com/MarkusH/django/pull/new/ticket30966 . At this point it does not handle unapplying of migrations.

comment:19 Changed 3 years ago by Simon Charette

Thanks Markus.

The patch will likely need a bit of polishing but a solution will likely be out best shot until we only deal with model states.

comment:20 Changed 3 years ago by Hasan Ramezani

Thanks Markus for the patch.

If there is something that can be done by me, please let me know.

comment:21 Changed 2 years ago by Simon Charette

After doing a bit of investigation work for #31064 and revisiting #29000 I think the solution proposed by Hasan makes the most sense for now and here's why.

Let's take the setup defined to test #29000 as an example

class Pony(models.Model):
    name = models.CharField(max_length=20)

class Rider(models.Model):
   pony = models.ForeignKey('test_rename_multiple.Pony', models.CASCADE)

class PonyRider(models.Model):
   riders = models.ManyToManyField('Rider')

If a RenameField('Pony', 'name', 'fancy_field) operation is applied and Pony is rendered from the new state only the Rider.pony pointer is actually stale and needs to be repointed.

However we decided not to follow the repointing route because of all the trouble it incurs. Instead we re-render only the direct relationships if delay=True (only Rider in this case) or the full relationship tree if delay=False (Rider and PonyRider in this case). A full relationship tree re-rendering is slow hence why we try to use delay=True as much as possible but when we do so references to direct relationships of the reloaded models are left pointing at pre-re-rendering Model classes which breaks Options._populate_directed_relation_graph mapping because it currently relies on Options identity as key.

To reuse our example setup above, if PonyRider -> Rider -> Pony and we reload_model('app', 'pony', delay=True) then Pony and Rider will be reloaded as Rider' -> Pony' but PonyRider will still point at Pony and not Pony' which makes any following operation relying on Pony'._meta.related_objects break because Pony'._meta is not Pony._meta.

If we look at how #29000 was addressed we can see that it works around this issue by forcing a full relationship tree re-rendering as soon as any model refers to a model on which a field is renamed. This is really wasteful but it gets the test passing in this particular case which is more a bandaid than anything else. Fundamentally the delay=True model if broken if it doesn't allow non-re-rendered models to maintain some relationship with the re-rendered graph.

For these reasons and because model identity consistency is not necessary beyond direct relationship by the current schema editor I think that adjusting Options._populate_directed_relation_graph to use Options.label as relationship key is our best way forward here as it doesn't hurt non-__fake__ model relationship resolution and it allows StateApps to keep relying on the delay=True for model rendering until schema alterations don't necessitate access to ProjectState.apps anymore.

I'll submit a PR that reverts unnecessary RenameField.state_forwards changes introduced by fcc4e251dbc917118f73d7187ee2f4cbf3883f36 and use Hasan's _populate_directed_relation_graph adjustments to demonstrate that the tests are passing. From that point, if everyone agrees that this is the right way forward, I think we should keep this ticket opened as a marker to add regression tests showing this is was properly addressed.

Thoughts on that approach Markus and Hasan?

comment:22 Changed 2 years ago by Simon Charette

Submitted a PR to demonstrate it properly addresses #29000 and passes the suite.

comment:23 Changed 2 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In 1d16c5d:

Refs #27666 -- Ensured relationship consistency on delayed reloads.

Delayed reloads of state models broke identity based relationships
between direct and non-direct ancestors.

Basing models.Options related objects map of model labels instead of
their identity ensured relationship consistency is maintained.

Refs #30966.

Co-authored-by: Hasan Ramezani <hasan.r67@…>

comment:24 Changed 2 years ago by GitHub <noreply@…>

In db6933a:

Refs #30966 -- Added test for reloading related model state on non-relational changes.

Thanks Markus Holtermann for initial test.

Fixed in 1d16c5d562a67625f7309cc7765b8c57d49bf182.

comment:25 Changed 2 years ago by Mariusz Felisiak

Patch needs improvement: unset
Resolution: fixed
Status: assignedclosed
Note: See TracTickets for help on using tickets.
Back to Top