Opened 8 years ago

Closed 8 years ago

Last modified 4 years ago

#26064 closed Cleanup/optimization (fixed)

Move migration operation reduction logic to their respective class.

Reported by: Simon Charette Owned by: Simon Charette
Component: Migrations Version: dev
Severity: Normal Keywords:
Cc: Shai Berger, Loic Bistuer Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

In order to avoid redundant operations during makemigrations and to support migration squashing a reduction algorithm combines and elides a linear collection of operations.

In the case of makemigrations the algorithm is run on each generated Migration.operations individually and in the case of squashmigration <app_label> it is the ordered combination of all Migration.operations of the specified application.

The actual reduction algorithm hard codes all possible optimizations which makes it harder to unit test and extend (see #24109).

From my understanding these reduction methods were originally hardcoded in the Optimizer class for performance reason. However, based on benchmarking using a migration generation script provided by Markus, it looks like this might be a case of premature optimization as I couldn't get a noticeable slowdown on both makemigrations (with over 1000 operations generated in a single migration) and squashmigrations (with over 100 migrations with around 10 operations each).

Therefore I suggest to move operation reduction to their respective class.

Note that the Operation.reduce method could even be documented in the future to expose a public API for third party application shipping with custom Operation providers.

If it was made public this method could also be used to solve the data migration squashing problem described by Shai on the mailing list by allowing specific reduction to be performed.

For example, given the following operation:

class CreateInitialCountries(migrations.RunPython):
    def __init__(model_name, name_field):
        self.model_name = model_name
        self.name_field = name_field
        super().__init__(self.forwards)

    def forwards(self, apps, schema_editor):
        country_model = apps.get(self.model_name)
        for name in self.get_initial_countries_from_csv():
            country_model.objects.create(**{self.name_field: name})

    def reduce(self, operation, in_between):
        if (isinstance(operation, RenameModel) and
                self.model_name == operation.old_model_name):
            return [
                operation, CreateInitialCountries(
                    operation.new_model_name, self.name_field
                )
            ]
        elif (isinstance(operation, RenameField) and
              self.model_name == other.model_name and
              self.field_name == other.old_name):
            return [
                operation, CreateInitialCountries(
                    self.model_name, other.new_name
                )
            ]
        elif (isinstance(operation, DeleteModel) and
              self.model_name == operation.model_name):
            return [operation]

These list of operations would me reduced/optimized as follow:

operations = [
    CreateModel(
        'Country',
        fields=[
            ('id', models.AutoField(primary_key=True)),
            ('name', models.CharField(max_length=50)),
        ]
    ),
    CreateInitialCountries('Country', 'name'),
    RenameField('Country', 'name', 'renamed_name'),
    RenameModel('Country', 'RenamedCountry'),
]
assert optimize(operations) == [
    CreateModel(
        'RenamedCountry',
        fields=[
            ('id', models.AutoField(primary_key=True)),
            ('renamed', models.CharField(max_length=50)),
        ]
    ),
    CreateInitialCountries('RenamedCountry', 'renamed'),
]

operations = [
    CreateModel(
        'Country',
        fields=[
            ('id', models.AutoField(primary_key=True)),
            ('name', models.CharField(max_length=50)),
        ]
    ),
    CreateInitialCountries('Country', 'name'),
    DeleteModel('Country'),
]
assert optimize(operations) == []

Change History (10)

comment:1 by Tim Graham, 8 years ago

Triage Stage: UnreviewedAccepted

comment:2 by Simon Charette, 8 years ago

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:3 by Shai Berger, 8 years ago

That looks very interesting, but has some issues that need working out.

The first is -- it seems obvious that the operation needs to be defined out of the migration files. But intuitively, this is an operation that is "only serving one migration" -- the idea that it would serve in a different migration after a squashing operation will not be on anyone's mind when it is written. As things are presented, I suspect it would be hard to explain why you'd want to write the code of your data migration in an operation class rather than a migration file.

The second (a problem we already have today, but perhaps this gives us an opportunity to solve it) is interaction between user operations. If we go this way, users will want to be able to handle these -- consider the migration which adds the population to each country; our user will want to replace the two (creation and add-population) by one operation, which creates the countries with their population. But for that to make sense (with the admonition not to change migrations which have already been applied), the reduction rules cannot be defined on the earlier migration -- they need to be defined either on the later one, or outside of both.

More generally -- I've seen data-creation migrations which have been much more elaborate, creating complex structures with relations between models etc. I suspect we won't be able to ask users to write such migrations with all model and field names passed in as arguments and used via reflection. Still, it may be possible to get from a "normal" RunPython function to one closer in spirit to your CreateInitialCountries operation by more-or-less automatic means (AST transformations).

So, there's a lot of promise here, but also a lot of work and thinking to do.

comment:4 by Shai Berger, 8 years ago

Cc: Shai Berger added

comment:5 by Loic Bistuer, 8 years ago

Cc: Loic Bistuer added

comment:6 by Carl Meyer, 8 years ago

Hmm, this is neat, and if there isn't a performance hit I think it's probably worth implementing just to see what people do with the improved flexibility, but I am skeptical about just how useful this will be in practice for the use case described in the ticket. To put my skepticism in brief: is it actually easier to write that reduce method in advance, anticipating all the various schema alterations that might surround your operation and how you'd return an adjusted version of your operation to account for them, than it is to just take care of it manually when squashing? Even if you're using "squashmigrations" rather than the "full reset" approach, "taking care of it manually" is just a matter of removing the data migration, adjusting its successor to point to its parent, then squashing, then re-appending (and modifying as necessary) the data migration. Squashing is done infrequently enough that I have a hard time imagining when I'd actually choose to write and test a fully parametrized operation with a custom reduce method instead of just doing that manual dance every now and then. Maybe I just don't use initial-data migrations enough? Or don't squash frequently enough?

comment:7 by Simon Charette, 8 years ago

To put my skepticism in brief: is it actually easier to write that reduce method in advance, anticipating all the various schema alterations that might surround your operation and how you'd return an adjusted version of your operation to account for them, than it is to just take care of it manually when squashing? Even if you're using "squashmigrations" rather than the "full reset" approach, "taking care of it manually" is just a matter of removing the data migration, adjusting its successor to point to its parent, then squashing, then re-appending (and modifying as necessary) the data migration. Squashing is done infrequently enough that I have a hard time imagining when I'd actually choose to write and test a fully parametrized operation with a custom reduce method instead of just doing that manual dance every now and then. Maybe I just don't use initial-data migrations enough? Or don't squash frequently enough?

Thanks for your feedback Carl. I agree that for most experienced developers among us this won't be a game changer. As you said we can easily figure out where the non-elidable migration has to be moved or how it should be adjusted.

However I think it can make a significant difference for third-party applications and large projects where non experienced developers still have to squash their migrations on their own once there is too many of them. Such an addition could benefit such users as more experienced developers could make the whole process a no brainer, the operation would take care of itself.

For example, the HStoreExtension and HStoreField case would be a good candidate for such an addition.

comment:8 by Tim Graham, 8 years ago

Triage Stage: AcceptedReady for checkin

PR looks good to me.

comment:9 by Simon Charette <charette.s@…>, 8 years ago

Resolution: fixed
Status: assignedclosed

In 49f4c9f4:

Fixed #26064 -- Moved operation reduction logic to their own class.

Thanks to Markus Holtermann and Tim Graham for their review.

comment:10 by GitHub <noreply@…>, 4 years ago

In daaa8949:

Refs #26064 -- Avoided unnecessary list slicing in migration optimizer.

The in_between list is only necessary if an optimization is possible.

Note: See TracTickets for help on using tickets.
Back to Top