Opened 5 years ago
Closed 5 years ago
#30966 closed Bug (fixed)
Migration crashes due to foreign key issue, depending on otherwise irrelevant order on MySQL.
Reported by: | Peter Thomassen | Owned by: | Hasan Ramezani |
---|---|---|---|
Component: | Migrations | Version: | dev |
Severity: | Normal | Keywords: | migration mySQL |
Cc: | Simon Charette, Markus Holtermann | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
In my Django app "desecapi" (reduced to a minimal example), I have two migrations. The first migration defines the following models:
User
+ Token (Token.user relates to User.id)
+ Domain (Domain.owner relates to User.id)
...+ RRset (RRset.domain relates to Domain.id)
... where "+" denotes foreign-key relationships. Note that RRset is a grandchild of User.
When running the two migrations together on an empty database (i.e. running python manage.py desecapi 0002
instead of ... 0001
and ... 0002
separately), the second migration fails with:
django.db.utils.OperationalError: (1833, "Cannot change column 'id': used in a foreign key constraint 'desecapi_token_user_id_bfc397b3_fk_desecapi_user_id' of table 'desec.desecapi_token'")
When running them separately, both migrations work. I would expect that running in bulk or each on its own should not make a difference.
The issue can be fixed by switching the order of two operations in the first migration. One of these two operations is the one that creates the desecapi_token
table (mentioned in the above error), and the other operation changes an attribute of the RRset.subname field. That model is not related to the Token model, and also not directly related to the User model (which is touch in the second migration).
I would not expect this ordering to influence whether the second migration can run successfully or not.
(One can say that the operation changing RRset.subname is unnecessary, as it could be incorporated in RRset's "CreateModel" operation. However, the history of encountering the problem is that RRset.subname in fact was changed in an intermediate migration. It just seems unnecessary because this minimal example is very much reduced.)
The minimal example can be found here: https://github.com/peterthomassen/django-bug/tree/master/api/desecapi
Note that in the interest of having a minimal example, models.py does not contain all models that the migrations deal with. My understanding is that that's fine (think of the missing models and fields as removed, but there is no migration yet that reflects that).
I created a docker-compose application that sets up an empty database and reproduces the problem. To reproduce, run the following:
git clone https://github.com/peterthomassen/django-bug.git
cd django-bug/
docker-compose build
docker-compose up -d dbapi # after this, wait for 2-3 minutes so that the database is up
docker-compose run api bash
# now, at the container shell, compare
python manage.py migrate desecapi 0002
# vs (make sure to clean up the database)
python manage.py migrate desecapi 0001
python manage.py migrate desecapi 0002
Attachments (3)
Change History (28)
by , 5 years ago
by , 5 years ago
Attachment: | 0001_initial.py added |
---|
by , 5 years ago
Attachment: | 0002_alter_user_id.py added |
---|
comment:1 by , 5 years ago
Summary: | Migration crashes due to index issue, depending on otherwise irrelevant order → Migration crashes due to foreign key issue, depending on otherwise irrelevant order |
---|
comment:2 by , 5 years ago
Component: | Database layer (models, ORM) → Migrations |
---|---|
Keywords: | mySQL added |
Summary: | Migration crashes due to foreign key issue, depending on otherwise irrelevant order → Migration crashes due to foreign key issue, depending on otherwise irrelevant order on MySQL. |
Triage Stage: | Unreviewed → Accepted |
Version: | 2.2 → master |
comment:3 by , 5 years ago
Cc: | added |
---|
comment:4 by , 5 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
I investigate it and think the problem comes from this line:
related_objects_graph[f.remote_field.model._meta.concrete_model._meta].append(f)
of django.db.models.options.Options._populate_directed_relation_graph():
I added some log to check related_objects_graph
in bothe case.
When the migration runs by ./manage.py migrate desecapi 0001
and then ./manage.py migrate desecapi 0002
(correct case), here is the result:
defaultdict(<class 'list'>, {<Options for User>: [<django.db.models.fields.related.ForeignKey: owner>, <django.db.models.fields.related.ForeignKey: user>], <Options for Domain>: [<django.db.models.fields.related.ForeignKey: domain>]})
But, When the migration runs by ./manage.py migrate desecapi 0002
(case with problem), here is the result:
defaultdict(<class 'list'>, {<Options for User>: [<django.db.models.fields.related.ForeignKey: user>], <Options for User>: [<django.db.models.fields.related.ForeignKey: owner>], <Options for Domain>: [<django.db.models.fields.related.ForeignKey: domain>]})
From the result, it is obvious that in the first case we have the <Options for User>
key with a value of type list with two items
[<django.db.models.fields.related.ForeignKey: owner>, <django.db.models.fields.related.ForeignKey: user>]
.
which generate two alter commands:
ALTER TABLE `app1_domain` DROP FOREIGN KEY `app1_domain_owner_id_cc7262a2_fk_app1_user_id` ALTER TABLE `app1_token` DROP FOREIGN KEY `app1_token_user_id_5ea51092_fk_app1_user_id`
But in the second case, we have two keys <Options for User>
which one of them has a value of type list with one item
[<django.db.models.fields.related.ForeignKey: user>]
and the other has a value of type list with one item
[<django.db.models.fields.related.ForeignKey: owner>]
.
which generate just one alter command:
ALTER TABLE `app1_domain` DROP FOREIGN KEY `app1_domain_owner_id_cc7262a2_fk_app1_user_id`
Thus, the second foreign key app1_token_user_id_5ea51092_fk_app1_user_id
still remains and cause the error.
Any suggestions for a fix?
comment:5 by , 5 years ago
Has patch: | set |
---|
Postgres has this issue as well.
comment:6 by , 5 years ago
Hasan, the solution you proposed will likely address this issue but the fact multiple instances of Options
for the same model classes are mixed in there is likely a symptom or a larger issue.
I'm not sure of the origin of it but this can only happen when model from different apps
boundaries are mixed together which should be disallowed as it will cause all sort of weird issues. What I mean by that is that there's likely something that creates models from a migration project state and have real models reference them or vice-versa. You can assert this doesn't happen by making sure each model._meta.apps is self.apps
.
In short I think this solution will only temporarily hide a much larger problem.
comment:7 by , 5 years ago
From giving a quick look at the current implementation it looks we're unfortunately sharing __fake__
models bound to different StateApps
for performance reason to work around the fact the schema editors are still operating from model classes instead of model states #29898.
Since we'll likely won't be able to prevent this sharing from taking place in the short term for performance reasons we should investigate why the models cache is not appropriately busted to avoid having two different Options
instances for the User
model. In this case it looks like the Token
fake model is not reloaded while the RRset
, User
, and Domain
ones are when the AlterField(model_name="RRset")
is performed. That makes Token.user
point to the stale User
model and Domain.owner
point to the new one.
I highly suspect this ticket is related to #27737 and this TODO in AlterField.state_forwards
.
Also, there's only a few failures when adding assertions for _meta.apps
coherency once they are appropriately repointed in StateApps.clone
like they should
-
django/db/migrations/state.py
diff --git a/django/db/migrations/state.py b/django/db/migrations/state.py index 9b62edad1f..970197cae7 100644
a b class StateApps(Apps): 322 322 clone.all_models = copy.deepcopy(self.all_models) 323 323 clone.app_configs = copy.deepcopy(self.app_configs) 324 324 # Set the pointer to the correct app registry. 325 for _, models in clone.all_models.items(): 326 for model in models.values(): 327 model._meta.apps = clone 325 328 for app_config in clone.app_configs.values(): 326 329 app_config.apps = clone 327 330 # No need to actually clone them, they'll never change -
django/db/models/options.py
diff --git a/django/db/models/options.py b/django/db/models/options.py index 078b831b76..2183813a93 100644
a b class Options: 704 704 ) 705 705 for f in fields_with_relations: 706 706 if not isinstance(f.remote_field.model, str): 707 related_objects_graph[str(f.remote_field.model._meta.concrete_model._meta)].append(f) 707 assert f.remote_field.model._meta.concrete_model._meta.apps is self.apps 708 related_objects_graph[f.remote_field.model._meta.concrete_model._meta].append(f) 708 709 709 710 for model in all_models: 710 711 # Set the relation_tree using the internal __dict__. In this way … … class Options: 712 713 # __dict__ takes precedence over a data descriptor (such as 713 714 # @cached_property). This means that the _meta._relation_tree is 714 715 # only called if related_objects is not in __dict__. 715 related_objects = related_objects_graph[str(model._meta.concrete_model._meta)] 716 assert model._meta.concrete_model._meta.apps is self.apps 717 related_objects = related_objects_graph[model._meta.concrete_model._meta] 716 718 model._meta.__dict__['_relation_tree'] = related_objects 717 719 # It seems it is possible that self is not in all_models, so guard 718 720 # against that with default for get().
comment:8 by , 5 years ago
Cc: | added |
---|
comment:9 by , 5 years ago
Patch needs improvement: | set |
---|
Simon, thanks for an investigation! I agree that we should try to address the main issue.
Hasan, Can you take a look?
comment:10 by , 5 years ago
Sure, I can check it again.
Just for confirmation, Should I fix assertaion failures based on Simon's patch?
comment:11 by , 5 years ago
Hasan, the changes in django/db/migrations/state.py
from comment:7 are likely desired but the assertion in Options
were more of a way to test which operation currently break this contract. From your submitted PR only three migrations
tests were failing with these assertions including the one you added even on SQLite.
Again, I suspect the cause is #27737 and the TODO
in AlterField.state_forwards
. The fact the rendering is delayed prevents Token
from being refreshed adequately.
follow-up: 14 comment:12 by , 5 years ago
Simon, I checked the TODO
in AlterField.state_forwards
, here is my understanding.
When I run the test in my the patch(my patch in PR + your patch in comment:7) it fails on alter operation in 0002_alter_user_id
at line
assert f.remote_field.model._meta.concrete_model._meta.apps is self.apps
operations = [ migrations.AlterField( model_name="user", name="id", field=models.CharField(default=uuid.uuid4, max_length=32, primary_key=True), ), ]
It fails because there is an operation in 0001_initial
which run before the above mentioned migration:
migrations.AlterField( model_name="RRset", name="subname", field=models.CharField(blank=False, max_length=178), ),
this is an alter migration and the AlterField.state_forwards
reloads 3 models of 4 (User, Domain, RRest) and don't reload the Token model.
If I reload the Token model this test will be passed.
Should I change the function to find proper models to reload?
Is it the right solution?
follow-up: 15 comment:13 by , 5 years ago
As already suspected by Simon, this is indeed related to Django not reloading related models when non-relational fields change.
I see a few ways forward within the migration system, with various pros and cons
- Reload all related models, regardless if the changed field is a relational field or referenced by a relational field. This will come with a huge run-time penalty that the delaying brought
- Try to catch these kind of database error or do some Python-level checking beforehand (
assert model1 is model2._meta.get_field("foo").related_model
). If we run into any issue there, reload all related models. Depending on how expensive that is, we could possibly do that after each migration operation and make the delayed model reloading based on that. - Reload all models after each migrations is applied when the
ProjectState.is_delayed is False
. That would bring the benefit that it doesn't matter if one applies migrations individually or in a batch. - Leave as-is; the first migration can be optimized anyway
I haven't completely made up my mind about the "best" approach here. But ideally we'd get rid of model classes in migrations and solely work off of the ModelState
. My last attempt got stuck do to the backwards compatibility in the SchemaEditor
.
comment:14 by , 5 years ago
Replying to Hasan Ramezani:
this is an alter migration and the
AlterField.state_forwards
reloads 3 models of 4 (User, Domain, RRest) and don't reload the Token model.
If I reload the Token model this test will be passed.
Should I change the function to find proper models to reload?
Is it the right solution?
This is essentially approach 1 from my list in 13
follow-up: 17 comment:15 by , 5 years ago
Replying to Markus Holtermann:
- Leave as-is; the first migration can be optimized anyway
The first migration can trivially be optimized in this minimum example. In our real example, we encountered the problem where migration 0003 interferes with migration 0011, with existing deployments somewhere in the middle.
In that case, 0003 and 0011 can be optimized/squashed more or less easily for new deployments, but migration 0011 would have to be maintained individually for those deployments which already have all migrations up to that one applied. The result would be code duplication for "fresh" and "existing" deployments, and -- if this problem reoccurs -- maybe even more "squashed migration variations".
So far, squashing was optional, and it would become a mandatory mechanism, while producing code duplication in cases as described above. In my opinion, that would be the least favorable option.
Thanks a LOT for your time, and for thinking though this whole thing!
comment:16 by , 5 years ago
This is a minimal failing test
-
tests/migrations/test_state.py
diff --git a/tests/migrations/test_state.py b/tests/migrations/test_state.py index e9bdac93c5..9140609bb6 100644
a b class StateTests(SimpleTestCase): 928 928 choices_field = Author._meta.get_field('choice') 929 929 self.assertEqual(list(choices_field.choices), choices) 930 930 931 def test_reload_related_models_on_non_relational_fields(self): 932 """ 933 #30966 - Even when fields on a model change that are not involved in 934 a relation, the model gets reloaded. Ensure that other models pointing 935 to or from it are reloaded accordingly. 936 937 User <-- Domain <-- RRset 938 ^ 939 +-- Token 940 """ 941 project_state = ProjectState() 942 # Render project state to simulate initial migration state 943 project_state.apps 944 project_state.add_model(ModelState( 945 "migrations", 946 "User", 947 [ 948 ("id", models.AutoField(primary_key=True)), 949 ("email", models.EmailField(max_length=191, unique=True)), 950 ] 951 )) 952 project_state.add_model(ModelState( 953 "migrations", 954 "Domain", 955 [ 956 ("id", models.AutoField(primary_key=True)), 957 ("owner", models.ForeignKey("migrations.User", models.SET_NULL)), 958 ] 959 )) 960 project_state.add_model(ModelState( 961 "migrations", 962 "RRset", 963 [ 964 ("id", models.AutoField(primary_key=True)), 965 ("subname", models.CharField(blank=True, max_length=178)), 966 ("domain", models.ForeignKey("migrations.Domain", models.SET_NULL)), 967 ] 968 )) 969 project_state.add_model(ModelState( 970 "migrations", 971 "Token", 972 [ 973 ("id", models.AutoField(primary_key=True)), 974 ("user", models.ForeignKey("migrations.User", models.SET_NULL)), 975 ] 976 )) 977 AlterField( 978 model_name="RRset", 979 name="subname", 980 field=models.CharField(blank=False, max_length=178), 981 ).state_forwards("migrations", project_state) 982 983 all_models = project_state.apps.all_models["migrations"] 984 assert ( 985 all_models['token']._meta.get_field('user').related_model 986 is all_models['user'] 987 ), "Models are not identical" 988 931 989 932 990 class ModelStateTests(SimpleTestCase): 933 991 def test_custom_model_base(self):
comment:17 by , 5 years ago
Replying to Peter Thomassen:
Replying to Markus Holtermann:
- Leave as-is; the first migration can be optimized anyway
The first migration can trivially be optimized in this minimum example. In our real example, we encountered the problem where migration 0003 interferes with migration 0011, with existing deployments somewhere in the middle.
In that case, 0003 and 0011 can be optimized/squashed more or less easily for new deployments, but migration 0011 would have to be maintained individually for those deployments which already have all migrations up to that one applied. The result would be code duplication for "fresh" and "existing" deployments, and -- if this problem reoccurs -- maybe even more "squashed migration variations".
So far, squashing was optional, and it would become a mandatory mechanism, while producing code duplication in cases as described above. In my opinion, that would be the least favorable option.
Yeah, that'd be true. As said, none of this is really ideal.
What we're doing here are unfortunately patches upon patches upon patches. I'd really like to move onto the "right" way.
comment:18 by , 5 years ago
I took a shot at implementing something along the lines of what I suggested in 2. in 13: https://github.com/MarkusH/django/pull/new/ticket30966 . At this point it does not handle unapplying of migrations.
comment:19 by , 5 years ago
Thanks Markus.
The patch will likely need a bit of polishing but a solution will likely be out best shot until we only deal with model states.
comment:20 by , 5 years ago
Thanks Markus for the patch.
If there is something that can be done by me, please let me know.
comment:21 by , 5 years ago
After doing a bit of investigation work for #31064 and revisiting #29000 I think the solution proposed by Hasan makes the most sense for now and here's why.
Let's take the setup defined to test #29000 as an example
class Pony(models.Model): name = models.CharField(max_length=20) class Rider(models.Model): pony = models.ForeignKey('test_rename_multiple.Pony', models.CASCADE) class PonyRider(models.Model): riders = models.ManyToManyField('Rider')
If a RenameField('Pony', 'name', 'fancy_field)
operation is applied and Pony
is rendered from the new state only the Rider.pony
pointer is actually stale and needs to be repointed.
However we decided not to follow the repointing route because of all the trouble it incurs. Instead we re-render only the direct relationships if delay=True
(only Rider
in this case) or the full relationship tree if delay=False
(Rider
and PonyRider
in this case). A full relationship tree re-rendering is slow hence why we try to use delay=True
as much as possible but when we do so references to direct relationships of the reloaded models are left pointing at pre-re-rendering Model
classes which breaks Options._populate_directed_relation_graph
mapping because it currently relies on Options
identity as key.
To reuse our example setup above, if PonyRider -> Rider -> Pony
and we reload_model('app', 'pony', delay=True)
then Pony
and Rider
will be reloaded as Rider' -> Pony'
but PonyRider
will still point at Pony
and not Pony'
which makes any following operation relying on Pony'._meta.related_objects
break because Pony'._meta is not Pony._meta
.
If we look at how #29000 was addressed we can see that it works around this issue by forcing a full relationship tree re-rendering as soon as any model refers to a model on which a field is renamed. This is really wasteful but it gets the test passing in this particular case which is more a bandaid than anything else. Fundamentally the delay=True
model if broken if it doesn't allow non-re-rendered models to maintain some relationship with the re-rendered graph.
For these reasons and because model identity consistency is not necessary beyond direct relationship by the current schema editor I think that adjusting Options._populate_directed_relation_graph
to use Options.label
as relationship key is our best way forward here as it doesn't hurt non-__fake__
model relationship resolution and it allows StateApps
to keep relying on the delay=True
for model rendering until schema alterations don't necessitate access to ProjectState.apps
anymore.
I'll submit a PR that reverts unnecessary RenameField.state_forwards
changes introduced by fcc4e251dbc917118f73d7187ee2f4cbf3883f36 and use Hasan's _populate_directed_relation_graph
adjustments to demonstrate that the tests are passing. From that point, if everyone agrees that this is the right way forward, I think we should keep this ticket opened as a marker to add regression tests showing this is was properly addressed.
Thoughts on that approach Markus and Hasan?
comment:22 by , 5 years ago
comment:25 by , 5 years ago
Patch needs improvement: | unset |
---|---|
Resolution: | → fixed |
Status: | assigned → closed |
Fixed in 1d16c5d562a67625f7309cc7765b8c57d49bf182.
Thanks for this report, I confirmed this issue on MySQL. Scenario is really tricky, without the last operation i.e.
everything works fine. When we take it into account, migrations don't drop a
FOREIGN KEY
constraint fromdesecapi_token
.Reproduced at 85efc14a2edac532df1a9ad4dd9b6d4a4dcf583e.