Opened 4 years ago

Last modified 4 years ago

#26407 new Cleanup/optimization

Investigate applying transitive reduction to migration graph.

Reported by: Jarek Glowacki Owned by: nobody
Component: Migrations Version: master
Severity: Normal Keywords: transitive reduction showmigrations
Cc: Markus Holtermann, emorley@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This stemmed from a discussion between @markush and myself in https://github.com/django/django/pull/6254.

We should investigate migration loading performance for the following:

  • Requiring the migration graph to always have the minimum number of edges. This would mean enforcing transitive reduction whenever a new edge is added. This would be a trade off between the additional computation to ensure the graph is minimal versus the reduced computation in traversing the graph for ancestors/descendants/etc.
  • Performing a single transitive reduction step once the graph has been built. Again, this would be to simplify any later traversal.

Closely related to this is the showmigrations command, with the --plan flag at verbosity 2. This adds a lists of migration dependencies in brackets next to every migration in the plan. Currently each list of dependencies is constructed from the original Migration object's dependencies list, with:

  • appropriate replacements made for migrations that have been replaced by squash migrations, and
  • additional dependencies added based on any run_before declarations.

So what this shows is the list of dependencies, somewhat rawly, as set by the user in their migration files. If the user declares redundant dependencies, then these bleed into the migration graph, which propagates to the list of dependencies printed by showmigrations. The question is what do we want it to show? This raw list may help the user with debugging, but it could get cluttered if there is an abundance of superfluous dependencies. So what we could do is print dependencies that come from the migration graph after it has been transitively reduced. This would improve the brevity, though whether it would be helpful to the user is up for debate. We could also try using both reduced and unreduced migration graphs to emphasise superfluous dependencies to the user.

TLDR:
1) Investigate migration performance with transitive reduction:

  • at every edge addition
  • after graph is built

2) Decide what would be most useful to list for showmigrations --plan -v2:

  • [current] Raw dependencies (with changes to account for replacements and run_before).
  • Only critical dependencies (by using reduced graph).
  • Raw dependencies with superfluous edges somehow emphasised.

To be clear, I don't see a problem with the current system, but thought this might be worth investigating/discussing.

Change History (3)

comment:1 Changed 4 years ago by Markus Holtermann

Triage Stage: UnreviewedAccepted

comment:2 Changed 4 years ago by Markus Holtermann

The following simple implementation does not scale for more than ~ 50 to 100 nodes:

  • django/db/migrations/graph.py

    diff --git a/django/db/migrations/graph.py b/django/db/migrations/graph.py
    index 1a7a703..28ed208 100644
    a b class MigrationGraph(object): 
    260260        """
    261261        [n.raise_error() for n in self.node_map.values() if isinstance(n, DummyNode)]
    262262
     263    def reduce(self):
     264        for xkey, xnode in six.iteritems(self.node_map):
     265            # print('xkey: %s' % (xkey,))
     266            for ykey, ynode in six.iteritems(self.node_map):
     267                # print('ykey: %s' % (ykey,))
     268                if xkey == ykey:
     269                    continue
     270                for zkey, znode in six.iteritems(self.node_map):
     271                    # print('zkey: %s' % (zkey,))
     272                    if xkey == zkey or ykey == zkey:
     273                        continue
     274                    if znode in xnode.children and ynode in xnode.children and znode in ynode.children:
     275                        print("rm %(xkey)s ->  %(zkey)s : %(xkey)s ->  %(ykey)s ->  %(zkey)s" % {
     276                            'xkey': xkey,
     277                            'ykey': ykey,
     278                            'zkey': zkey,
     279                        })
     280                        xnode.children.discard(znode)
     281                        znode.parents.discard(xnode)
     282
    263283    def clear_cache(self):
    264284        if self.cached:
    265285            for node in self.nodes:
    class MigrationGraph(object): 
    278298            raise NodeNotFoundError("Node %r not a valid node" % (target, ), target)
    279299        # Use parent.key instead of parent to speed up the frequent hashing in ensure_not_cyclic
    280300        self.ensure_not_cyclic(target, lambda x: (parent.key for parent in self.node_map[x].parents))
     301        self.reduce()
    281302        self.cached = True
    282303        node = self.node_map[target]
    283304        try:
    class MigrationGraph(object): 
    298319            raise NodeNotFoundError("Node %r not a valid node" % (target, ), target)
    299320        # Use child.key instead of child to speed up the frequent hashing in ensure_not_cyclic
    300321        self.ensure_not_cyclic(target, lambda x: (child.key for child in self.node_map[x].children))
     322        self.reduce()
    301323        self.cached = True
    302324        node = self.node_map[target]
    303325        try:
Version 0, edited 4 years ago by Markus Holtermann (next)

comment:3 Changed 4 years ago by Ed Morley

Cc: emorley@… added
Note: See TracTickets for help on using tickets.
Back to Top