Opened 9 months ago

Last modified 6 months ago

#27845 assigned Cleanup/optimization

Possible Migration Optimizer Strategy Improvement

Reported by: Raphael Gaschignard Owned by: Simon Charette
Component: Migrations Version: master
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

The migration optimizer works by taking pairs of operations and trying to reduce them by fusing or eliminating the operations. For example, turning an AddModel and an AddField into a single AddModel with the extra field.

The current optimization strategy if you have [A,B,C,D,E] is:

  • try and reduce A and B
  • try and reduce A and C (with B in between)
  • try and reduce A and D (with B and C in between) etc. if (for example) D refers to A and cannot reduce, then we break out of the loop (because even if A and E could reduce, D refers to A).

But sometimes, D referring to A doesn't mean that A and E cannot be reduced! Simply that they cannot be reduced in a way that removes A. This reference issue prevents a lot of straightforward optimizations. This is a proposal for a new optimizer strategy that would help to unlock some of this potential.

In a new strategy, we would add a notion of preceding and depending.

  • B depending on A means that A must be before B in the chain of operations, because B must happen after A.

For example: AddField(m, f) is dependent on CreateModel(m) because m needs to be created before a field is added to it

  • B preceding C means that C must follow B in the chain of operations, because (ultimately) C depends on B.

For example, CreateModel(m) precedes AddField(m', ForeignKey(m)) because, in order to create a foreign key to m, we need m to exist!

In this new strategy, we would have two optimization passes. Firstly, we would "backwards optimize". This involves taking operations like [A, B, C, D], and attempting to reduce A and C by bringing C towards A (resulting in [A+C, B, D]) .

If you have [A, B, ..., Y, Z], you can backwards optimize Z into A if no operation in [B, ..., Y] precedes Z (that is to say, Z does not depend on anything in [B, ..., Y]). In this first pass, references to A are not important, because A cannot be removed.

Example of allowed reductions in the backwards optimization:

  • CreateModel + AddField of the same model
  • AddField + Alter Field for the same field

Example of a reduction that would not be used in backwards optimization:

  • CreateModel + RemoveModel (because both operations would be removed)

The second pass would be a "forwards optimization" step. In this step, we're looking to take [A, B, C, D] and bring elements forward (for example to [B, C, A+D].

In this optimization pass, we can forward optimize A and Z in [A,B,C...,Y,Z] if no operation in [B,C,...,Y] depends on A. This is a bit closer to the current optimization strategy (which checks if [B,C, ... ,Y] reference A). This could include all existing reductions, including those that remove operations entirely.

Because the second optimization pass works like the old strategy, this new strategy mainly consists in adding the first step, and whitelisting reduction operations. The optimizer currently has a "references" check that can be also be re-used as a dependency check.

I'm not sure if there are any holes in this optimizing strategy, but running through it in some real world examples seems to hold up.

Change History (3)

comment:1 Changed 9 months ago by Simon Charette

Cc: Simon Charette added
Triage Stage: UnreviewedAccepted

See this PR for references_(model|field) adjustments.

comment:2 Changed 9 months ago by Simon Charette

Has patch: set
Owner: changed from nobody to Simon Charette
Status: newassigned

comment:3 Changed 6 months ago by Tim Graham

Patch needs improvement: set

As far as I can tell, the PR is awaiting an update from Simon.

Note: See TracTickets for help on using tickets.
Back to Top