Opened 6 years ago

Closed 6 years ago

# Improve efficiency of migration graph algorithm

Reported by: Owned by: Krzysztof Gogolewski nobody Migrations dev Normal Andrew Godwin Ready for checkin yes no no no no no

## Description

It seems to me that the current migration graph code is quadratic or even exponential in some cases. I would expect computing the forward or backward plan to be linear in the size of the graph. I might write a patch but I would appreciate a sanity check first. Everything here concerns django/db/migrations/graph.py.

1. Consider this function:
```    def iterative_dfs(self, start, forwards=True):
"""Iterative depth-first search for finding dependencies."""
visited = []
stack = [start]
while stack:
node = stack.pop()
visited.append(node)
stack += sorted(node.parents if forwards else node.children)
return list(OrderedSet(reversed(visited)))
```

We are visiting a node even if it was already visited before. Example where this goes bad:

``` 1
/ \
2 3
\ /
4
/ \
5 6
\ /
7
```

Starting from 1, we'll visit node 4 two times, node 7 four times etc. - this is exponential.

Instead, we should keep a set of nodes that were already visited, and do not process already visited nodes. This will also get rid of OrderedSet that is used to remove duplicates. Here's a possible implementation:

```    def iterative_dfs(self, start, forwards=True):
visited = []
visited_set = set()
stack = [(start, False)]
while stack:
node, processed = stack.pop()
if node in visited_set:
pass
elif processed:
visited.append(node)
else:
stack.append((node, True))
stack += [(n, False) for n in sorted(node.parents if forwards else node.children)]
return visited
```

2.
`Node.ancestors()` and `Node.descendants()` compute the list of ancestors and descendants in a graph for caching - this can have O(N) items each, which means it's quadratic.

I think that once the issue in `iterative_dfs` is fixed, we can remove the recursive logic for computing DFS altogether - this means identifiers `ancestors`, `descendants`, `cached`, `clear_cache` and `RECURSION_DEPTH_WARNING`. The cache was needed to avoid exponential behavior (like the one in `iterative_dfs`), but after the fix, the algorithm will be linear and should be faster than quadratic `Node.ancestors()`.

3.
After the recursive DFS logic is removed, I think we can merge `ensure_not_cyclic` and `iterative_dfs` into a single function; it could be renamed to `topological_sort`.

### comment:1 by Tim Graham, 6 years ago

Cc: Andrew Godwin added Uncategorized → Cleanup/optimization

### comment:2 by Ramiro Morales, 6 years ago

Just FYI there is already a topological sort implementation in the DB migration code: https://github.com/django/django/blob/master/django/db/migrations/topological_sort.py

### comment:4 by Markus Holtermann, 6 years ago

There have been attempts of implementing a topological sorting of the migration dependency graph before. IIRC (it was about 3 years ago), the issue was an unstable plan that could drastically change is some apps added more migrations.

### comment:5 by Krzysztof Gogolewski, 6 years ago

I haven't combined `ensure_not_cyclic` and `iterative_dfs` in the end, as we are checking for existence of cycles in the whole graph, but are interested in a plan only for some of the nodes. It's likely we could reuse the already existing topological sort in `ensure_not_cyclic` but I'd like to merge the existing branch first.

### comment:6 by Tim Graham, 6 years ago

Has patch: set Migration graph code is inefficient → Improve efficiency of migration graph algorithm Unreviewed → Accepted

Tentatively accepting for further evaluation of the patch. If the patch isn't viable, it might be useful to add a test to show why not, if possible.

### comment:7 by Carlton Gibson, 6 years ago

Triage Stage: Accepted → Ready for checkin 2.0 → master

### comment:8 by Tim Graham <timograham@…>, 6 years ago

Resolution: → fixed new → closed

In db926a0:

Fixed #29243 -- Improved efficiency of migration graph algorithm.

Note: See TracTickets for help on using tickets.