Opened 20 months ago

Last modified 10 months ago

#26407 new Cleanup/optimization

Investigate applying transitive reduction to migration graph.

Reported by: Jarek Glowacki Owned by: nobody
Component: Migrations Version: master
Severity: Normal Keywords: transitive reduction showmigrations
Cc: Markus Holtermann, emorley@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This stemmed from a discussion between @markush and myself in https://github.com/django/django/pull/6254.

We should investigate migration loading performance for the following:

  • Requiring the migration graph to always have the minimum number of edges. This would mean enforcing transitive reduction whenever a new edge is added. This would be a trade off between the additional computation to ensure the graph is minimal versus the reduced computation in traversing the graph for ancestors/descendants/etc.
  • Performing a single transitive reduction step once the graph has been built. Again, this would be to simplify any later traversal.

Closely related to this is the showmigrations command, with the --plan flag at verbosity 2. This adds a lists of migration dependencies in brackets next to every migration in the plan. Currently each list of dependencies is constructed from the original Migration object's dependencies list, with:

  • appropriate replacements made for migrations that have been replaced by squash migrations, and
  • additional dependencies added based on any run_before declarations.

So what this shows is the list of dependencies, somewhat rawly, as set by the user in their migration files. If the user declares redundant dependencies, then these bleed into the migration graph, which propagates to the list of dependencies printed by showmigrations. The question is what do we want it to show? This raw list may help the user with debugging, but it could get cluttered if there is an abundance of superfluous dependencies. So what we could do is print dependencies that come from the migration graph after it has been transitively reduced. This would improve the brevity, though whether it would be helpful to the user is up for debate. We could also try using both reduced and unreduced migration graphs to emphasise superfluous dependencies to the user.

TLDR:
1) Investigate migration performance with transitive reduction:

  • at every edge addition
  • after graph is built

2) Decide what would be most useful to list for showmigrations --plan -v2:

  • [current] Raw dependencies (with changes to account for replacements and run_before).
  • Only critical dependencies (by using reduced graph).
  • Raw dependencies with superfluous edges somehow emphasised.

To be clear, I don't see a problem with the current system, but thought this might be worth investigating/discussing.

Change History (3)

comment:1 Changed 20 months ago by Markus Holtermann

Triage Stage: UnreviewedAccepted

comment:2 Changed 19 months ago by Markus Holtermann

The following simple implementation does not scale for more than ~ 50 to 100 nodes:

  • django/db/migrations/graph.py

    diff --git a/django/db/migrations/graph.py b/django/db/migrations/graph.py
    index 1a7a703..28ed208 100644
    a b class MigrationGraph(object): (this hunk was shorter than expected) 
    260260        """
    261261        [n.raise_error() for n in self.node_map.values() if isinstance(n, DummyNode)]
    262262
     263    def reduce(self):
     264        for xkey, xnode in six.iteritems(self.node_map):
     265            for ykey, ynode in six.iteritems(self.node_map):
     266                if xkey == ykey:
     267                    continue
     268                for zkey, znode in six.iteritems(self.node_map):
     269                    if xkey == zkey or ykey == zkey:
     270                        continue
     271                    if znode in xnode.children and ynode in xnode.children and znode in ynode.children:
     272                        xnode.children.discard(znode)
     273                        znode.parents.discard(xnode)
     274
    263275    def clear_cache(self):
    264276        if self.cached:
    265277            for node in self.nodes:
    class MigrationGraph(object): 
    278298            raise NodeNotFoundError("Node %r not a valid node" % (target, ), target)
    279299        # Use parent.key instead of parent to speed up the frequent hashing in ensure_not_cyclic
    280300        self.ensure_not_cyclic(target, lambda x: (parent.key for parent in self.node_map[x].parents))
     301        self.reduce()
    281302        self.cached = True
    282303        node = self.node_map[target]
    283304        try:
    class MigrationGraph(object): 
    298319            raise NodeNotFoundError("Node %r not a valid node" % (target, ), target)
    299320        # Use child.key instead of child to speed up the frequent hashing in ensure_not_cyclic
    300321        self.ensure_not_cyclic(target, lambda x: (child.key for child in self.node_map[x].children))
     322        self.reduce()
    301323        self.cached = True
    302324        node = self.node_map[target]
    303325        try:
Last edited 19 months ago by Markus Holtermann (previous) (diff)

comment:3 Changed 10 months ago by Ed Morley

Cc: emorley@… added
Note: See TracTickets for help on using tickets.
Back to Top