Opened 6 years ago

Closed 6 years ago

## #29243 closed Cleanup/optimization (fixed)

# Improve efficiency of migration graph algorithm

Reported by: | Krzysztof Gogolewski | Owned by: | nobody |
---|---|---|---|

Component: | Migrations | Version: | dev |

Severity: | Normal | Keywords: | |

Cc: | Andrew Godwin | Triage Stage: | Ready for checkin |

Has patch: | yes | Needs documentation: | no |

Needs tests: | no | Patch needs improvement: | no |

Easy pickings: | no | UI/UX: | no |

## Description

It seems to me that the current migration graph code is quadratic or even exponential in some cases. I would expect computing the forward or backward plan to be linear in the size of the graph. I might write a patch but I would appreciate a sanity check first. Everything here concerns django/db/migrations/graph.py.

- Consider this function:

def iterative_dfs(self, start, forwards=True): """Iterative depth-first search for finding dependencies.""" visited = [] stack = [start] while stack: node = stack.pop() visited.append(node) stack += sorted(node.parents if forwards else node.children) return list(OrderedSet(reversed(visited)))

We are visiting a node even if it was already visited before. Example where this goes bad:

1 / \ 2 3 \ / 4 / \ 5 6 \ / 7

Starting from 1, we'll visit node 4 two times, node 7 four times etc. - this is exponential.

Instead, we should keep a set of nodes that were already visited, and do not process already visited nodes. This will also get rid of OrderedSet that is used to remove duplicates. Here's a possible implementation:

def iterative_dfs(self, start, forwards=True): visited = [] visited_set = set() stack = [(start, False)] while stack: node, processed = stack.pop() if node in visited_set: pass elif processed: visited_set.add(node) visited.append(node) else: stack.append((node, True)) stack += [(n, False) for n in sorted(node.parents if forwards else node.children)] return visited

2.

`Node.ancestors()`

and `Node.descendants()`

compute the list of ancestors and descendants in a graph for caching - this can have O(N) items each, which means it's quadratic.

I think that once the issue in `iterative_dfs`

is fixed, we can remove the recursive logic for computing DFS altogether - this means identifiers `ancestors`

, `descendants`

, `cached`

, `clear_cache`

and `RECURSION_DEPTH_WARNING`

. The cache was needed to avoid exponential behavior (like the one in `iterative_dfs`

), but after the fix, the algorithm will be linear and should be faster than quadratic `Node.ancestors()`

.

3.

After the recursive DFS logic is removed, I think we can merge `ensure_not_cyclic`

and `iterative_dfs`

into a single function; it could be renamed to `topological_sort`

.

### Change History (8)

### comment:1 by , 6 years ago

Cc: | added |
---|---|

Type: | Uncategorized → Cleanup/optimization |

### comment:2 by , 6 years ago

### comment:4 by , 6 years ago

There have been attempts of implementing a topological sorting of the migration dependency graph before. IIRC (it was about 3 years ago), the issue was an unstable plan that could drastically change is some apps added more migrations.

### comment:5 by , 6 years ago

I haven't combined `ensure_not_cyclic`

and `iterative_dfs`

in the end, as we are checking for existence of cycles in the whole graph, but are interested in a plan only for some of the nodes. It's likely we could reuse the already existing topological sort in `ensure_not_cyclic`

but I'd like to merge the existing branch first.

### comment:6 by , 6 years ago

Has patch: | set |
---|---|

Summary: | Migration graph code is inefficient → Improve efficiency of migration graph algorithm |

Triage Stage: | Unreviewed → Accepted |

Tentatively accepting for further evaluation of the patch. If the patch isn't viable, it might be useful to add a test to show why not, if possible.

### comment:7 by , 6 years ago

Triage Stage: | Accepted → Ready for checkin |
---|---|

Version: | 2.0 → master |

**Note:**See TracTickets for help on using tickets.

Just FYI there is already a topological sort implementation in the DB migration code: https://github.com/django/django/blob/master/django/db/migrations/topological_sort.py