id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
29243	Improve efficiency of migration graph algorithm	Krzysztof Gogolewski	nobody	"It seems to me that the current migration graph code is quadratic or even exponential in some cases. I would expect computing the forward or backward plan to be linear in the size of the graph. I might write a patch but I would appreciate a sanity check first. Everything here concerns django/db/migrations/graph.py.

1.  Consider this function:

{{{
    def iterative_dfs(self, start, forwards=True):
        """"""Iterative depth-first search for finding dependencies.""""""
        visited = []
        stack = [start]
        while stack:
            node = stack.pop()
            visited.append(node)
            stack += sorted(node.parents if forwards else node.children)
        return list(OrderedSet(reversed(visited)))
}}}

We are visiting a node even if it was already visited before. Example where this goes bad:

{{{
 1
/ \
2 3
\ /
 4
/ \
5 6
\ /
 7
}}}

Starting from 1, we'll visit node 4 two times, node 7 four times etc. - this is exponential.

Instead, we should keep a set of nodes that were already visited, and do not process already visited nodes. This will also get rid of OrderedSet that is used to remove duplicates. Here's a possible implementation:

{{{
    def iterative_dfs(self, start, forwards=True):
        visited = []
        visited_set = set()
        stack = [(start, False)]
        while stack:
             node, processed = stack.pop()
             if node in visited_set:
                 pass
             elif processed:
                 visited_set.add(node)
                 visited.append(node)
             else:
                 stack.append((node, True))
                 stack += [(n, False) for n in sorted(node.parents if forwards else node.children)]
        return visited
}}}

2.
`Node.ancestors()` and `Node.descendants()` compute the list of ancestors and descendants in a graph for caching - this can have O(N) items each, which means it's quadratic.

I think that once the issue in `iterative_dfs` is fixed, we can remove the recursive logic for computing DFS altogether - this means identifiers `ancestors`, `descendants`, `cached`, `clear_cache` and `RECURSION_DEPTH_WARNING`. The cache was needed to avoid exponential behavior (like the one in `iterative_dfs`), but after the fix, the algorithm will be linear and should be faster than quadratic `Node.ancestors()`.

3.
After the recursive DFS logic is removed, I think we can merge `ensure_not_cyclic` and `iterative_dfs` into a single function; it could be renamed to `topological_sort`."	Cleanup/optimization	closed	Migrations	dev	Normal	fixed		Andrew Godwin	Ready for checkin	1	0	0	0	0	0