Context Navigation

← Previous Ticket
Next Ticket →

#29243 closed Cleanup/optimization (fixed)

Improve efficiency of migration graph algorithm

Reported by:	Krzysztof Gogolewski	Owned by:	nobody
Component:	Migrations	Version:	dev
Severity:	Normal	Keywords:
Cc:	Andrew Godwin	Triage Stage:	Ready for checkin
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

It seems to me that the current migration graph code is quadratic or even exponential in some cases. I would expect computing the forward or backward plan to be linear in the size of the graph. I might write a patch but I would appreciate a sanity check first. Everything here concerns django/db/migrations/graph.py.

Consider this function:

    def iterative_dfs(self, start, forwards=True):
        """Iterative depth-first search for finding dependencies."""
        visited = []
        stack = [start]
        while stack:
            node = stack.pop()
            visited.append(node)
            stack += sorted(node.parents if forwards else node.children)
        return list(OrderedSet(reversed(visited)))

We are visiting a node even if it was already visited before. Example where this goes bad:

 1
/ \
2 3
\ /
 4
/ \
5 6
\ /
 7

Starting from 1, we'll visit node 4 two times, node 7 four times etc. - this is exponential.

Instead, we should keep a set of nodes that were already visited, and do not process already visited nodes. This will also get rid of OrderedSet that is used to remove duplicates. Here's a possible implementation:

    def iterative_dfs(self, start, forwards=True):
        visited = []
        visited_set = set()
        stack = [(start, False)]
        while stack:
             node, processed = stack.pop()
             if node in visited_set:
                 pass
             elif processed:
                 visited_set.add(node)
                 visited.append(node)
             else:
                 stack.append((node, True))
                 stack += [(n, False) for n in sorted(node.parents if forwards else node.children)]
        return visited

2.
Node.ancestors() and Node.descendants() compute the list of ancestors and descendants in a graph for caching - this can have O(N) items each, which means it's quadratic.

I think that once the issue in iterative_dfs is fixed, we can remove the recursive logic for computing DFS altogether - this means identifiers ancestors, descendants, cached, clear_cache and RECURSION_DEPTH_WARNING. The cache was needed to avoid exponential behavior (like the one in iterative_dfs), but after the fix, the algorithm will be linear and should be faster than quadratic Node.ancestors().

3.
After the recursive DFS logic is removed, I think we can merge ensure_not_cyclic and iterative_dfs into a single function; it could be renamed to topological_sort.

Change History (8)

comment:1 by Tim Graham, 7 years ago

Cc:	Andrew Godwin added
Type:	Uncategorized → Cleanup/optimization

comment:2 by Ramiro Morales, 7 years ago

Just FYI there is already a topological sort implementation in the DB migration code: https://github.com/django/django/blob/master/django/db/migrations/topological_sort.py

comment:3 by Krzysztof Gogolewski, 7 years ago

PR: https://github.com/django/django/pull/9804

comment:4 by Markus Holtermann, 7 years ago

There have been attempts of implementing a topological sorting of the migration dependency graph before. IIRC (it was about 3 years ago), the issue was an unstable plan that could drastically change is some apps added more migrations.

comment:5 by Krzysztof Gogolewski, 7 years ago

I haven't combined ensure_not_cyclic and iterative_dfs in the end, as we are checking for existence of cycles in the whole graph, but are interested in a plan only for some of the nodes. It's likely we could reuse the already existing topological sort in ensure_not_cyclic but I'd like to merge the existing branch first.

comment:6 by Tim Graham, 7 years ago

Has patch:	set
Summary:	Migration graph code is inefficient → Improve efficiency of migration graph algorithm
Triage Stage:	Unreviewed → Accepted

Tentatively accepting for further evaluation of the patch. If the patch isn't viable, it might be useful to add a test to show why not, if possible.

comment:7 by Carlton Gibson, 6 years ago

Triage Stage:	Accepted → Ready for checkin
Version:	2.0 → master

comment:8 by Tim Graham <timograham@…>, 6 years ago

Resolution:	→ fixed
Status:	new → closed

In db926a0:

Fixed #29243 -- Improved efficiency of migration graph algorithm.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues