Opened 4 weeks ago

Closed 3 days ago

#36652 closed Bug (fixed)

Loading squashed migrations from disk raises CircularDependencyError indeterministically

Reported by: Jayden Kneller Owned by: Augusto Pontes
Component: Migrations Version: dev
Severity: Normal Keywords: squashed
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This could be a duplicate of many other projects, but I've figured out a bug fix that may also solve the issues of the other projects.

I've created a github repo that contains a project with migrations, squashed migrations, and a management command. If someone checks out the repo and runs the showmigrations management command multiple times, the results will pass or fail in a way that appears random.

https://github.com/arrai-innovations/django-migrationnames

I've tracked the issue down to the order that a set is iterated over. If that set is turned into a list, without any other code changes, the issue shown in this test project stops failing and instead passes 100% of the time. I can't guarrantee it will fix every other users issues with circular dependency errors, but it will stop the randomness that is happening because of iterating through a set.

If someone uses the management command I created, called show_issue, it will create output with a number of files that contain results parsed from the patched MigrationLoader. The data comes from loader.graph.node_map, where the migrations and their children are stored. In essence, there are 8 different possibilities that the migrations and children can be figured out as, because of the set. 2 of these 8 possibilities pass and the other 6 cause circular dependency errors. The fix changes things to have 1 possibility for the way migrations and children are figured out, and it passes.

The randomness of iterating over the set and having 8 possible migration/children setups is what makes things difficult to deal with when working with squashed migrations.

I've written about the management command and how to use it in the README.md of the django-migrationnames project.

The fix is really small, which I have in a fork. Since it is very small, this is the change:

https://github.com/jayden-arrai/django/commit/fceaa5820edfbd7128082e7d199034963ae6cd33

Change History (35)

comment:1 by Jayden Kneller, 4 weeks ago

If anyone needs help understanding the documentation about the management command, or the files that it creates, feel free to ask.

comment:2 by Jayden Kneller, 4 weeks ago

I don't know if this needs tests, or how anyone would make a test for it. Unless the project migrations I have can somehow be converted into a test. The randomness of it makes it difficult.

comment:3 by Augusto Pontes, 4 weeks ago

Hi jayden, your repo seems to be a great idea to solve this problem of the migrations, im still reading your readme, and definetely this will need automated tests, also documentation

comment:4 by Jayden Kneller, 4 weeks ago

I should also mention that if the old migrations are removed, so that only the squashed migrations exist, then it doesn't matter if the migration names are a set or list. The issue where this is a problem seems to only be at the point where both old migrations and the squashed migrations exist. Basically when you are about to deploy the squashed migrations for the first time and deploy it to a site, so it adds them into the django_migrations table.

Last edited 4 weeks ago by Jayden Kneller (previous) (diff)

comment:5 by Jayden Kneller, 4 weeks ago

If it is possible to use this repo in the creation of the automated tests, then please do. It was a lot of work to rename everything to obscure the original project. I renamed all the migration names in the process as well, because of some of their names. I'm glad the operations in the old migrations don't need to exist in order to have this issue occur, that removed a lot of code. I'm not sure if it would be possible to trim down the number of apps and migrations and still have a situation like this occur. At the very least, I think I got the code that exists down to about the smallest amount that I could.

But, I'm not sure how you would make an automated test for this. What would need to be tested? Because the set seems to iterate in the same order within the running process no matter how many times you iterate over it, and doesn't always fail. That must make things more difficult to test. I was forced to add the shuffle in the patched MigrationLoader due to the same process iteration issue, which I wasn't expecting.

comment:6 by Jacob Walls, 4 weeks ago

Keywords: squashed added
Needs tests: set
Summary: Django Squashed Migrations Randmonly Passing/Failing Based On Set Iteration OrderLoading squashed migrations from disk raises CircularDependencyError indeterministically
Triage Stage: UnreviewedAccepted

Thanks, the report makes sense, and I could reproduce with your sample command. Appreciate the care you put into the report.

Yes, a test to demonstrate the issue is required, using the bare minimum number of migrations. It should be possible to isolate the cycle and recreate it with just two or three apps.

We will want a test for LoaderTests, and a small set of migrations in tests/migrations/test_migrations_squashed_.... An example test for migrations loading is test_loading_squashed_complex_multi_apps. Once you have that, on the PR we can discuss ways to avoid patching the command to shuffle. (At a glance, I think we can either mock pkgutil.iter_modules to reverse the result, or if we want to avoid coupling our test to the implementation, just create a second set of migration files that differ only in naming, as long as this is not flaky.)

Would you like to push this over the line? If so, please set yourself in the owner field. Thanks, and happy to help.

comment:7 by Jacob Walls, 4 weeks ago

Has patch: unset

(We usually wait for a pull request to mark Has Patch, unsetting.)

comment:8 by Jacob Walls, 4 weeks ago

Needs tests: unset

comment:9 by Augusto Pontes, 3 weeks ago

Owner: set to Augusto Pontes
Status: newassigned

comment:10 by Augusto Pontes, 3 weeks ago

I will create another branch for testing, and i suggest you to create a .gitignore, because the show_issue command generate too many files and cache

comment:11 by Jayden Kneller, 3 weeks ago

Using a management command and permutations of the migration names, I can see if there is a way to recreate this with less apps and migrations. I'll test between removing apps and migrations and see happens.

in reply to:  10 comment:12 by Jayden Kneller, 3 weeks ago

Replying to Augusto Pontes:

I will create another branch for testing, and i suggest you to create a .gitignore, because the show_issue command generate too many files and cache

Done

comment:13 by Jayden Kneller, 3 weeks ago

If anyone needs write access to this repo, let me know. I don't have any branch protection rules set up, and I'm not sure I'll need any for this.

When I get the management command created that can run permutations of migrations names, I'll get that committed.

comment:14 by Jayden Kneller, 3 weeks ago

Looks like permutations of (app_label, migration_name) is not going to be useful, as it is far too high a number. I'll see what I can figure out.

comment:15 by Jayden Kneller, 3 weeks ago

So, it look like i missed commenting out 2 dependencies in the old apps, and if I comment them out, then the issue doesn't happen. So those are probably the key to making this smaller.

comment:16 by Jayden Kneller, 3 weeks ago

I updated the project, trimming the migrations down to 6 per app. I left 1 dependency in an old migration, commented out one I missed, and now I have 2 success and 2 failure situations. From the looks of this, since it is related to a dependency in an old migration, it should be possibly to trim down the apps and migrations, possibly a lot.

comment:17 by Jayden Kneller, 3 weeks ago

So far I've deleted apps 2, 3, 5, 7, 8, 10, and 14. Also found that contrib wasn't being used. It was listed as a MIGRATION_MODULES in the app this came from.

in reply to:  17 comment:18 by Augusto Pontes, 3 weeks ago

Replying to Jayden Kneller:

So far I've deleted apps 2, 3, 5, 7, 8, 10, and 14. Also found that contrib wasn't being used. It was listed as a MIGRATION_MODULES in the app this came from.

Ok, im writing some tests to make sure that your methods are really working, but thanks to notify me, i will update my branch

comment:19 by Jayden Kneller, 3 weeks ago

Got it down to 3 apps. 2 with 2 migrations and 1 with 3. I'm going to rename them to app_one and app_two. users will stay.

comment:20 by Jayden Kneller, 3 weeks ago

There is now only 1 successful path and 1 failure path.

comment:21 by Jayden Kneller, 3 weeks ago

After I've renamed things I'll rewrite the Readme, to reflect the current state of things.

comment:22 by Jacob Walls, 3 weeks ago

Hi Jayden, glad to hear of progress reducing the example. Could I ask you to reduce the frequency of the updates? Feel free to provide an update when you're ready for a review or are blocked, but if you're trying to collaborate in real time with Augusto, may I suggest the Django Discord or other collaboration platform instead? Thanks.

comment:23 by Augusto Pontes, 3 weeks ago

Ok jacob, jayden you can add me on discord:

rankracerbr

just type this name, if you cant send the invite, i will invite you to my server

comment:24 by Jayden Kneller, 3 weeks ago

So, after getting the test project down to 3 apps and 7 total migrations, I've tracked the issue down to the order that self.graph.remove_replaced_nodes(key, migration.replaces) gets called, which is determined by the order of self.replacements.items(), which is determined by the order of migration_names. If the second migration in app_one is added to self.replacements before the first migration, since dictionaries are now ordered by default, then a cyclic dependency error occurs 100% of the time. To confirm this you can rename migration 0002_squashed_initial to 0000_squashed_initial and call show_issue --use-fixed.

If it is possible to mock the results that self.replacements.items() returns, then we could have a test that guarantees the results of either a pass or CircularDependencyError based on the order of the migrations we return, but with the migration names being a set right now, that mock would still be susceptible to it only giving the result we want 50% of the time. So, how do we write a test that will fail if the migration names is a set?

Or, perhaps a better fix than making the migration names into a list, would be for self.replacements to be added to in the order that migrations would run. But, I'm not sure how easy that would be.

Any ideas for how to do a test?

Last edited 3 weeks ago by Jayden Kneller (previous) (diff)

in reply to:  23 ; comment:25 by Jayden Kneller, 3 weeks ago

Replying to Augusto Pontes:

Ok jacob, jayden you can add me on discord:

rankracerbr

just type this name, if you cant send the invite, i will invite you to my server

Searching for rankracerbr says no one with that name could be found. You can join the django discord channel using https://discord.com/invite/xcRH6mN4fa

in reply to:  25 comment:26 by Augusto Pontes, 3 weeks ago

Replying to Jayden Kneller:

Replying to Augusto Pontes:

Ok jacob, jayden you can add me on discord:

rankracerbr

just type this name, if you cant send the invite, i will invite you to my server

Searching for rankracerbr says no one with that name could be found. You can join the django discord channel using https://discord.com/invite/xcRH6mN4fa

Im in the server, i sent a message at the Community > #Contributor-discussion, if you find me, you can send a friend request

I|IRankracerBRI|I: ticket#36652

comment:27 by 허진수, 3 weeks ago

If i've understood correctly, the order can be fixed with PYTHONHASHSEED env variable. Because Python iterates over set using the location of each items, and it is based on the hash value. Python also uses seed to randomize the hash results.

So, it's possible to test using PYTHONHASHSEED and subprocess. So we just need to find the value that causes CircularDependencyError.
Here is simple example.

class SomeTestClass(TestCase):
    def test():
        ...

#####
# other file
#####
def some_test_method():
    env_vars = os.environ.copy()
    env_vars['PYTHONHASHSEED'] = '0'
    result = subprocess.run(
        [sys.executable, '-m', 'unittest', 'test_file.py'],
        env=env_vars,
        capture_output=True,
        text=True
    )

I'm not sure if the result will be same, given a fixed PYTHONHASHSEED env variable.
If something wrong here, please tell me.
And sry for my bad english:(
Actually It's my first time on open source:)

Last edited 3 weeks ago by 허진수 (previous) (diff)

in reply to:  27 comment:28 by Augusto Pontes, 3 weeks ago

Replying to 허진수:

If i've understood correctly, the order can be fixed with PYTHONHASHSEED env variable. Because Python iterates over set using the location of each items, and it is based on the hash value. Python also uses seed to randomize the hash results.

So, it's possible to test using PYTHONHASHSEED and subprocess. So we just need to find the value that causes CircularDependencyError.
Here is simple example.

class SomeTestClass(TestCase):
    def test():
        ...

#####
# other file
#####
def some_test_method():
    env_vars = os.environ.copy()
    env_vars['PYTHONHASHSEED'] = '0'
    result = subprocess.run(
        [sys.executable, '-m', 'unittest', 'test_file.py'],
        env=env_vars,
        capture_output=True,
        text=True
    )

I'm not sure if the result will be same, given a fixed PYTHONHASHSEED env variable.
If something wrong here, please tell me.
And sry for my bad english:(
Actually It's my first time on open source:)

Great suggestion, i recommend you to analyze the README on the repo of jayden, i read, and the problem is a little bit deeper, on what i understood, is that jayden already solved the problem, with some adjustments on the code, and for now we are doing unit tests, and he is analyzing performance issues that can happen in the future or on other big applications, we are discussing on the official discord server, you can join using this link:

https://discord.com/channels/856567261900832808/1428096388608299155

in reply to:  3 comment:29 by Jayden Kneller, 3 weeks ago

Has patch: set

comment:30 by Jayden Kneller, 3 weeks ago

Got a PR created. I even added a small docs change, though I don't know if it is needed.

2 of the tests I created (test_migration_loader_replacement_order_succeeds and test_migration_loader_replacement_order_raises) are more for if someone makes changes deeper in the code, if someone can figure out a better fix. If there is a fix deeper in the code, then the first 2 tests will become useless.

comment:31 by Jacob Walls, 8 days ago

Patch needs improvement: set

comment:32 by Jacob Walls, 4 days ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:33 by Jacob Walls, 4 days ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

Test failures need investigating.

comment:34 by Jacob Walls, 3 days ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:35 by Jacob Walls <jacobtylerwalls@…>, 3 days ago

Resolution: fixed
Status: assignedclosed

In e27cff6:

Fixed #36652 -- Increased determinism when loading migrations from disk.

Ordering still depends on pkgutil.iter_modules, which does not guarantee
order, but at least now Django is not introducing additional indeterminism,
causing CircularDependencyError to appear or not appear in some edge cases.

Co-authored-by: Jacob Walls <jacobtylerwalls@…>

Note: See TracTickets for help on using tickets.
Back to Top