Opened 4 weeks ago
Closed 3 days ago
#36652 closed Bug (fixed)
Loading squashed migrations from disk raises CircularDependencyError indeterministically
| Reported by: | Jayden Kneller | Owned by: | Augusto Pontes |
|---|---|---|---|
| Component: | Migrations | Version: | dev |
| Severity: | Normal | Keywords: | squashed |
| Cc: | Triage Stage: | Ready for checkin | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
This could be a duplicate of many other projects, but I've figured out a bug fix that may also solve the issues of the other projects.
I've created a github repo that contains a project with migrations, squashed migrations, and a management command. If someone checks out the repo and runs the showmigrations management command multiple times, the results will pass or fail in a way that appears random.
https://github.com/arrai-innovations/django-migrationnames
I've tracked the issue down to the order that a set is iterated over. If that set is turned into a list, without any other code changes, the issue shown in this test project stops failing and instead passes 100% of the time. I can't guarrantee it will fix every other users issues with circular dependency errors, but it will stop the randomness that is happening because of iterating through a set.
If someone uses the management command I created, called show_issue, it will create output with a number of files that contain results parsed from the patched MigrationLoader. The data comes from loader.graph.node_map, where the migrations and their children are stored. In essence, there are 8 different possibilities that the migrations and children can be figured out as, because of the set. 2 of these 8 possibilities pass and the other 6 cause circular dependency errors. The fix changes things to have 1 possibility for the way migrations and children are figured out, and it passes.
The randomness of iterating over the set and having 8 possible migration/children setups is what makes things difficult to deal with when working with squashed migrations.
I've written about the management command and how to use it in the README.md of the django-migrationnames project.
The fix is really small, which I have in a fork. Since it is very small, this is the change:
https://github.com/jayden-arrai/django/commit/fceaa5820edfbd7128082e7d199034963ae6cd33
Change History (35)
comment:1 by , 4 weeks ago
comment:2 by , 4 weeks ago
I don't know if this needs tests, or how anyone would make a test for it. Unless the project migrations I have can somehow be converted into a test. The randomness of it makes it difficult.
follow-up: 29 comment:3 by , 4 weeks ago
Hi jayden, your repo seems to be a great idea to solve this problem of the migrations, im still reading your readme, and definetely this will need automated tests, also documentation
comment:4 by , 4 weeks ago
I should also mention that if the old migrations are removed, so that only the squashed migrations exist, then it doesn't matter if the migration names are a set or list. The issue where this is a problem seems to only be at the point where both old migrations and the squashed migrations exist. Basically when you are about to deploy the squashed migrations for the first time and deploy it to a site, so it adds them into the django_migrations table.
comment:5 by , 4 weeks ago
If it is possible to use this repo in the creation of the automated tests, then please do. It was a lot of work to rename everything to obscure the original project. I renamed all the migration names in the process as well, because of some of their names. I'm glad the operations in the old migrations don't need to exist in order to have this issue occur, that removed a lot of code. I'm not sure if it would be possible to trim down the number of apps and migrations and still have a situation like this occur. At the very least, I think I got the code that exists down to about the smallest amount that I could.
But, I'm not sure how you would make an automated test for this. What would need to be tested? Because the set seems to iterate in the same order within the running process no matter how many times you iterate over it, and doesn't always fail. That must make things more difficult to test. I was forced to add the shuffle in the patched MigrationLoader due to the same process iteration issue, which I wasn't expecting.
comment:6 by , 4 weeks ago
| Keywords: | squashed added |
|---|---|
| Needs tests: | set |
| Summary: | Django Squashed Migrations Randmonly Passing/Failing Based On Set Iteration Order → Loading squashed migrations from disk raises CircularDependencyError indeterministically |
| Triage Stage: | Unreviewed → Accepted |
Thanks, the report makes sense, and I could reproduce with your sample command. Appreciate the care you put into the report.
Yes, a test to demonstrate the issue is required, using the bare minimum number of migrations. It should be possible to isolate the cycle and recreate it with just two or three apps.
We will want a test for LoaderTests, and a small set of migrations in tests/migrations/test_migrations_squashed_.... An example test for migrations loading is test_loading_squashed_complex_multi_apps. Once you have that, on the PR we can discuss ways to avoid patching the command to shuffle. (At a glance, I think we can either mock pkgutil.iter_modules to reverse the result, or if we want to avoid coupling our test to the implementation, just create a second set of migration files that differ only in naming, as long as this is not flaky.)
Would you like to push this over the line? If so, please set yourself in the owner field. Thanks, and happy to help.
comment:7 by , 4 weeks ago
| Has patch: | unset |
|---|
(We usually wait for a pull request to mark Has Patch, unsetting.)
comment:8 by , 4 weeks ago
| Needs tests: | unset |
|---|
comment:9 by , 3 weeks ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
follow-up: 12 comment:10 by , 3 weeks ago
I will create another branch for testing, and i suggest you to create a .gitignore, because the show_issue command generate too many files and cache
comment:11 by , 3 weeks ago
Using a management command and permutations of the migration names, I can see if there is a way to recreate this with less apps and migrations. I'll test between removing apps and migrations and see happens.
comment:12 by , 3 weeks ago
Replying to Augusto Pontes:
I will create another branch for testing, and i suggest you to create a .gitignore, because the
show_issuecommand generate too many files and cache
Done
comment:13 by , 3 weeks ago
If anyone needs write access to this repo, let me know. I don't have any branch protection rules set up, and I'm not sure I'll need any for this.
When I get the management command created that can run permutations of migrations names, I'll get that committed.
comment:14 by , 3 weeks ago
Looks like permutations of (app_label, migration_name) is not going to be useful, as it is far too high a number. I'll see what I can figure out.
comment:15 by , 3 weeks ago
So, it look like i missed commenting out 2 dependencies in the old apps, and if I comment them out, then the issue doesn't happen. So those are probably the key to making this smaller.
comment:16 by , 3 weeks ago
I updated the project, trimming the migrations down to 6 per app. I left 1 dependency in an old migration, commented out one I missed, and now I have 2 success and 2 failure situations. From the looks of this, since it is related to a dependency in an old migration, it should be possibly to trim down the apps and migrations, possibly a lot.
follow-up: 18 comment:17 by , 3 weeks ago
So far I've deleted apps 2, 3, 5, 7, 8, 10, and 14. Also found that contrib wasn't being used. It was listed as a MIGRATION_MODULES in the app this came from.
comment:18 by , 3 weeks ago
Replying to Jayden Kneller:
So far I've deleted apps 2, 3, 5, 7, 8, 10, and 14. Also found that contrib wasn't being used. It was listed as a MIGRATION_MODULES in the app this came from.
Ok, im writing some tests to make sure that your methods are really working, but thanks to notify me, i will update my branch
comment:19 by , 3 weeks ago
Got it down to 3 apps. 2 with 2 migrations and 1 with 3. I'm going to rename them to app_one and app_two. users will stay.
comment:21 by , 3 weeks ago
After I've renamed things I'll rewrite the Readme, to reflect the current state of things.
comment:22 by , 3 weeks ago
Hi Jayden, glad to hear of progress reducing the example. Could I ask you to reduce the frequency of the updates? Feel free to provide an update when you're ready for a review or are blocked, but if you're trying to collaborate in real time with Augusto, may I suggest the Django Discord or other collaboration platform instead? Thanks.
follow-up: 25 comment:23 by , 3 weeks ago
Ok jacob, jayden you can add me on discord:
rankracerbr
just type this name, if you cant send the invite, i will invite you to my server
comment:24 by , 3 weeks ago
So, after getting the test project down to 3 apps and 7 total migrations, I've tracked the issue down to the order that self.graph.remove_replaced_nodes(key, migration.replaces) gets called, which is determined by the order of self.replacements.items(), which is determined by the order of migration_names. If the second migration in app_one is added to self.replacements before the first migration, since dictionaries are now ordered by default, then a cyclic dependency error occurs 100% of the time. To confirm this you can rename migration 0002_squashed_initial to 0000_squashed_initial and call show_issue --use-fixed.
If it is possible to mock the results that self.replacements.items() returns, then we could have a test that guarantees the results of either a pass or CircularDependencyError based on the order of the migrations we return, but with the migration names being a set right now, that mock would still be susceptible to it only giving the result we want 50% of the time. So, how do we write a test that will fail if the migration names is a set?
Or, perhaps a better fix than making the migration names into a list, would be for self.replacements to be added to in the order that migrations would run. But, I'm not sure how easy that would be.
Any ideas for how to do a test?
follow-up: 26 comment:25 by , 3 weeks ago
Replying to Augusto Pontes:
Ok jacob, jayden you can add me on discord:
rankracerbr
just type this name, if you cant send the invite, i will invite you to my server
Searching for rankracerbr says no one with that name could be found. You can join the django discord channel using https://discord.com/invite/xcRH6mN4fa
comment:26 by , 3 weeks ago
Replying to Jayden Kneller:
Replying to Augusto Pontes:
Ok jacob, jayden you can add me on discord:
rankracerbr
just type this name, if you cant send the invite, i will invite you to my server
Searching for rankracerbr says no one with that name could be found. You can join the django discord channel using https://discord.com/invite/xcRH6mN4fa
Im in the server, i sent a message at the Community > #Contributor-discussion, if you find me, you can send a friend request
I|IRankracerBRI|I: ticket#36652
follow-up: 28 comment:27 by , 3 weeks ago
If i've understood correctly, the order can be fixed with PYTHONHASHSEED env variable. Because Python iterates over set using the location of each items, and it is based on the hash value. Python also uses seed to randomize the hash results.
So, it's possible to test using PYTHONHASHSEED and subprocess. So we just need to find the value that causes CircularDependencyError.
Here is simple example.
class SomeTestClass(TestCase):
def test():
...
#####
# other file
#####
def some_test_method():
env_vars = os.environ.copy()
env_vars['PYTHONHASHSEED'] = '0'
result = subprocess.run(
[sys.executable, '-m', 'unittest', 'test_file.py'],
env=env_vars,
capture_output=True,
text=True
)
I'm not sure if the result will be same, given a fixed PYTHONHASHSEED env variable.
If something wrong here, please tell me.
And sry for my bad english:(
Actually It's my first time on open source:)
comment:28 by , 3 weeks ago
Replying to 허진수:
If i've understood correctly, the order can be fixed with PYTHONHASHSEED env variable. Because Python iterates over set using the location of each items, and it is based on the hash value. Python also uses seed to randomize the hash results.
So, it's possible to test using
PYTHONHASHSEEDandsubprocess. So we just need to find the value that causes CircularDependencyError.
Here is simple example.
class SomeTestClass(TestCase): def test(): ... ##### # other file ##### def some_test_method(): env_vars = os.environ.copy() env_vars['PYTHONHASHSEED'] = '0' result = subprocess.run( [sys.executable, '-m', 'unittest', 'test_file.py'], env=env_vars, capture_output=True, text=True )I'm not sure if the result will be same, given a fixed
PYTHONHASHSEEDenv variable.
If something wrong here, please tell me.
And sry for my bad english:(
Actually It's my first time on open source:)
Great suggestion, i recommend you to analyze the README on the repo of jayden, i read, and the problem is a little bit deeper, on what i understood, is that jayden already solved the problem, with some adjustments on the code, and for now we are doing unit tests, and he is analyzing performance issues that can happen in the future or on other big applications, we are discussing on the official discord server, you can join using this link:
https://discord.com/channels/856567261900832808/1428096388608299155
comment:29 by , 3 weeks ago
| Has patch: | set |
|---|
comment:30 by , 3 weeks ago
Got a PR created. I even added a small docs change, though I don't know if it is needed.
2 of the tests I created (test_migration_loader_replacement_order_succeeds and test_migration_loader_replacement_order_raises) are more for if someone makes changes deeper in the code, if someone can figure out a better fix. If there is a fix deeper in the code, then the first 2 tests will become useless.
comment:31 by , 8 days ago
| Patch needs improvement: | set |
|---|
comment:32 by , 4 days ago
| Patch needs improvement: | unset |
|---|---|
| Triage Stage: | Accepted → Ready for checkin |
comment:33 by , 4 days ago
| Patch needs improvement: | set |
|---|---|
| Triage Stage: | Ready for checkin → Accepted |
Test failures need investigating.
comment:34 by , 3 days ago
| Patch needs improvement: | unset |
|---|---|
| Triage Stage: | Accepted → Ready for checkin |
If anyone needs help understanding the documentation about the management command, or the files that it creates, feel free to ask.