Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#23861 closed Bug (fixed)

Fields referenced in migrations cannot be (re)moved, even if migrations have been squashed

Reported by: Luke Plant Owned by: Tim Graham
Component: Migrations Version: dev
Severity: Release blocker Keywords:
Cc: info+coding@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently IPAddressField is set to be removed in Django 1.9.

The problem is that all migrations that use IPAddressField will then break, and this means the entire stack of migrations will break (because they are all imported even when not executed, and the migration file references django.db.models.IPAddressField at import time). This also means that test suites will break, since migrations are run to set up the DB.

To be clear, this will happen even if you have migrated away from IPAddressField - the old versions of the schema still reference IPAddressField, and your old migrations will therefore break everything (current migrations, test suite) until they are re-written.

Re-writing old migrations is itself a nasty fix, and not something we want to encourage.

This will be especially bad for 3rd party apps who can't just throw away/squash old migrations. It would be really nasty for users of those libraries if they had to squash all migrations in order to support Django 1.9.

So, I think we need a different strategy for deprecating fields. We need to keep a stub IPAddressField around - basically forever.

Also, the deprecation warning for these old fields can make test suite output really noisy if you have warnings turned on (which can be a very good way to find code that needs fixing). Ideally we'd find some way of only outputting the warning when the field was *really* being used - not just instantiated due to an old migration being run.

Change History (19)

comment:1 by Claude Paroz, 9 years ago

It's true that with migrations, we are now setting in stone some old code/symbols, and we have to take this new paradigm into account.
I'm not happy at all about keeping stubs/old code forever in Django.

Some crazy idea without any proof of concept: Add a new django.legacy module which contains old code and which dynamically injects compatibility shims into Django current modules only when migrations are run.

comment:2 by Carl Meyer, 9 years ago

I think that in fact, squashing old migrations does solve this problem, and that's the solution we should recommend. Even third-party apps can squash migrations and then later remove them without any problem. They just need to get started on that process with enough lead time so that any of their users who upgrade to Django 1.9 have already gotten through the squashed migration set and upgraded to the version of the third-party app that removes the old migrations.

comment:3 by Luke Plant, 9 years ago

Component: UncategorizedMigrations

@carljm this solution doesn't work, unfortunately, at least not with the current implementation. (I've tested this with an actual project).

If you squash the migrations, then you end up with a new migration that doesn't access models.IPAddressField. But the old migrations still access models.IPAddressField, and when you run ./manage.py migrate both the old and the new migrations are imported, so you still get an AttributeError from the replaced migrations.

This could possibly be fixed by importing files that look like squashed migrations first (which can be guessed from the file name), and importing the ones they replace only once it has been established that this is necessary. However, guessing squashed migrations from the file name is fragile, and will break when you get to the stage of squashing migrations another time (i.e. squashing migrations that were themselves squashed). And we will get to the that stage, especially if squashing migrations is the recommended way of dealing with the problem of removed fields.

A possible solution to these further problems is that if a migration is a squashed migration (has the replaces attribute), we raise an error if it 'looks like' a squashed migration from its file name, and vice versa. We'd need to change the docs, too, to say that when you transition a squashed migration to a normal migration, you must remove the 'squashed' bit from the file name: https://docs.djangoproject.com/en/dev/topics/migrations/#squashing-migrations

In this way, we can know for sure that we can find the squashed migrations first, and not import the ones they replace.

I'm still uncomfortable with forcing this on people. Migrations are hard enough without forcing people to squash things simply to keep our code base tidy. I actually prefer the 'crazy' idea of dynamically importing compatibility shims when migrations are running. However, we also need a solution for 3rd party libraries - i.e. 3rd parties who have model fields that are referenced by other projects, which they then deprecate and remove, causing the same breakage that removing IPAddressField will.

We need a documented solution that will allow us and other Django libraries to do the right thing relatively easily, and know immediately if they have done the wrong thing. Releasing broken migrations which then get applied for some people is a really nasty situation to be in. The only nice thing about the current situation is that if you remove a field, all migrations will break straight away and you won't be able to run any of them.

I'm guessing Andrew's input on this would help! I'm categorizing this as 'Migrations' for that reason.

A simpler solution might be something like:

  • remove all the implementation of IPAddressField that is not needed for migrations
  • make it inherit from RemovedField, which has some magic behaviour: when migrations are running, it works fine, but otherwise it throws an exception to stop it being instantiated.

comment:4 by Luke Plant, 9 years ago

If we do go down the route of requiring squashing to deal with this, perhaps we can have a modified deprecation policy for fields, so that we keep stubs for a bit longer and then remove the fields all at the same time. This should reduce the number of squashes that are necessary. We should also document this as best practice for the sake of 3rd party libraries.

comment:5 by Carl Meyer, 9 years ago

@spookylukey I'm aware that squashing alone doesn't fix it - that's why I said "and later remove" in my comment. The intent with migration squashing is that you can delete the squashed migrations once you are confident that everyone who has started into the squashed set has migrated past it. For a project's migrations this is pretty easy: once you deploy and migrate once, you can remove the replaced migrations. For a third party project, you'd do one release with a squashed migration, and then in the following release you could remove the replaced migrations, and everything will work fine as long as your users upgrade one release at a time and migrate at each upgrade. (For future users who start in with the newest release, there's no problem, it's as if the replaced migrations never existed at all.)

The one potential problem with squashing is that a RunPython or RunSQL operation in the squashed set can prevent the optimizer from optimizing through -- for instance if you had an IPAddressField, then had a RunPython or RunSQL operation, then later removed that field, the squashed migration will still keep the field around in the early end of its operation list because it has no idea what the RunPython/RunSQL might be doing, or whether it might break without that field. So this case can require manual modification of the squashed migration; in most cases that shouldn't be too difficult, you just remove the field addition operation, modify RunPython/RunSQL so it works without that field, and then remove the field-removal operation afterwards. This can be documented.

There are certainly fiddly bits here, and I'm not necessarily opposed to Django keeping field stubs around for longer than usual in some form to reduce the impact of our own field removals. I just want it to be clear that there is already a working solution to this problem that we can document for third-party field removals, at least.

comment:6 by Luke Plant, 9 years ago

@carljm The thing is that isn't "later" when you remove the migration - you have to remove it *now* to make any progress. At the point where people realise the breakage (in this case, when they upgrade to 1.9 and the migrations stop working), it requires two releases to fix the problem.

This means that I can't do a new release of my package featuring "Django 1.9" support. I have to first tell people "Upgrade to version X-1 of the library, run migrate, then upgrade to version X". i.e. I have to do a release just for the migration squashing, and people have to upgrade to that release as well as the next one.

It is far too optimistic to expect people to migrate away from a field and to squash their migrations in an earlier release, anticipating the problem of the field removal, especially when they think they have already done that by migrating away from the field. It's also really optimistic to expect people to read release notes. Only the largest Django libraries I've used have release notes at all. For most libraries, people expect to be able to upgrade to the latest version and things will basically work, after testing and fixing things.

Requiring upgrades to intermediary versions is often problematic, especially as in this case you actually have to deploy the intermediary version (in order for the manage.py migrate to get run). Very often you are going to run into bugs with old versions, which you have to worry about because you actually have to deploy the code. Requiring upgrades to intermediary versions multiplies the amount of work you have to do to get your dependencies up to date. It's only acceptable for a really large dependency such as Django itself, where:

  • you have an extremely well documented set of release notes,
  • that people know about and are expected to read,
  • and you also have point releases for the intermediary versions so that are known 'good' versions you will feel safe deploying.

In short, requiring extra releases just for migrations, and requiring people to upgrade to all intermediary versions of libraries would be a really horrible thing for people working in the Django ecosystem, and IMO the kind of process we should be too embarrassed to document. Certainly it would introduce a lot of pain into my life, as someone who maintains several and uses many 3rd party Django libraries.

comment:7 by Carl Meyer, 9 years ago

Hi Luke,

I agree that having to remove the squashed migrations before you can migrate (assuming the field is gone) isn't optimal. I also agree that it's common to skip versions when updating third-party apps, and so it's best if they don't have to rush the removal of outdated migrations.

I still think you're overstating the seriousness of this issue, for several reasons:

  1. It's a very rare occurrence. I looked back through the deprecation timeline, and I can find exactly two model fields that Django has ever deprecated (since 1.0, anyway), IPAddressField and XMLField. That's two field removals over the course of 9 releases and 7-8 years (by the time Django 1.9 is released). And the population of people affected by any particular field removal will always be much smaller than the overall user population, since unpopular fields are much more likely candidates for removal. The acceptability of an inconvenient process has to be taken in context of the frequency with which it is needed (and weighed against the costs of implementing a smoother process). Regarding third-party apps, I also use many and maintain several, and in the seven years I've used Django I can't remember a single model field ever removed from a third-party app that I use or maintain. I'm sure that it's happened in some apps somewhere, but I don't think it happens all that often. Maybe you happen to use/maintain apps that remove fields more frequently?
  1. The third-party app in question of course has another option available to them, the same one we have in Django and are discussing here: keep a deprecated field stub around a while longer. Even if they don't want to keep it around forever, that at least can give them all the time they need to take the squashed-migrations route at a very leisurely pace, which mitigates the "requires upgrade to intermediate versions" problem.
  1. Migrations run during testing, and for some time we've encouraged running tests with -W to notice deprecation warnings. So it doesn't seem all that optimistic to think that _some_ user or developer of most widely-used third-party apps might notice the deprecation warnings in their migrations sometime before the release that actually removes the field. (In other words, noticing a deprecation that's still hanging around in an old migration is no different from noticing any other kind of deprecation -- it's not any more likely to go unnoticed until the actual removal.)

So all in all, while it's far from ideal, I don't think documenting the current state of things is nearly as horrible for third-party apps as you make it out to be. And I'm not sure we have any other feasible option: fundamentally, keeping a living historical record of model changes that can be rolled forward and back (that is, migrations) is only possible if you can still access the components involved in those changes. So you have to keep those components around as long as you keep around the migrations that depend on them.

I gave some thought to how we might improve the situation. Squash-replaced migrations are never used (unless you're still in the midst of the block of replaced migrations), just imported long enough to figure out their dependencies and determine that they've been replaced. It might be possible to create some kind of lazy field wrapper that could be used in migration files to delay the actual import and instantiation of field classes until the moment when something actually needs to access their operations. But this would be complex to implement correctly, and I don't think it really helps: when you're confident your users no longer need the squashed migrations to be functional, you can just remove them. If you still have users in the midst of the squashed migrations, then it isn't safe to remove the referenced field, even if it's lazy-loaded.

comment:8 by Luke Plant, 9 years ago

I agree that removal of fields is not very common. But that is kind of irrelevant - because we are removing a field now, and there are likely thousands of projects that use IPAddressField - https://github.com/search?l=&q=IPAddressField+extension%3A.py&ref=advsearch&type=Code&utf8=%E2%9C%93

I don't think this is 'no different' to other deprecations, because you can't immediately go and fix the code that is causing the warning (which is itself an issue - warnings that you cannot do anything about are worse than nothing, because they drown out other warnings that are important. This is currently an issue for me as I try to get a test suite to run without warnings - removing the offending migrations is probably not going to be possible for a long time). You can't just remove the code - you have to plan the removal of the old migrations into a release process. And you have to do so while thinking about all the Django versions that you are still supporting.

Regarding your solution, I had the same idea above, but I think it does improve the situation drastically - it means that you now have an upgrade procedure for libraries that doesn't involve multiple releases:

If I want version 3.0 of my foobar library to support Django 1.9, I can just add the following to upgrade notes:

  • You will need to upgrade to foobar 3.0 and run migrations before you upgrade to Django 1.9, due to a removed field in Django 1.9

You simply have to ship 3.0 with squashed migrations that don't reference the removed field. If people fail to follow that instruction, they'll get a clear error which will stop the migration, and which can be fixed by downgrading to pre-1.9. The old migration files can be kept around until version 4.0 if you have to, as they don't bother anyone, which means you can keep them in the 'master' branch on GitHub or wherever. So people will still be able to upgrade to your latest version, provided they do so before upgrading Django.

This upgrade process fits into a normal upgrade/deploy procedure, which typically runs something like this when automated:

  • update sources
  • pip install -r requirements.txt
  • ./manage.py migrate

So there is no need to do any manual deploy etc.

The only extra burden is that you are forcing users to stagger the upgrades of dependencies, but it is quite likely that you would do that anyway (in order to check if some upgrade has broken something).

There is no installing of intermediate versions, and no testing of them, only to find bugs that don't exist in the latest version etc. It is this problem that is the real killer for me - if I have to install older versions, test and deploy etc., that is going to add hours of work. Multiply that across thousands of projects, and I think there is case for making this significantly less painful than it is at the moment. I appreciate that there is potentially some significant complexity involved of course.

comment:9 by Carl Meyer, 9 years ago

Regarding IPAddressField, I've already said I don't have a problem with keeping IPAddressField around (in some form) for longer than usual; I was moving on to the question of whether we need to do something for the general situation for third-party projects; frequency seems like a relevant factor there.

You're right that deprecations raised by migrations are different in that they are harder to get rid of - I was just pointing out that they aren't any less likely to be noticed in the first place.

I looked back through the thread, and your proposal (at least the one I found) is different from what I'd suggested. I don't think it's a good idea to try to detect squash migrations by filename and load those first, because that conflates the definition of a squash migration (one which contains the 'replaces' key) with an incidental property (the default file naming scheme used by the squashmigrations command). It's perfectly legitimate to rename a squash migration file however you like, or even write it yourself without using the squashmigrations command, so we shouldn't push a solution that would break in those cases. That's why I think we'd need to instead find a way to be able to import any migration file without having to immediately import and instantiate all its operations and their constituent fields.

I had suggested lazy wrappers for individual fields, but there may be an even simpler approach possible, where the entire operations list is wrapped in a function/method (and any imports of custom fields happen within that function/method) rather than being a class attribute. There may be problems I'm not thinking of, but I think that might not be very hard.

I think your analysis of how that would actually simplify the upgrade path is correct.

comment:10 by Markus Holtermann, 9 years ago

Cc: info+coding@… added

A few thoughts I collected over the last days while silently following the discussion here:

  • We should extend the lifetime of removed fields by a single version as stub fields. Shouldn't be too hard to get something "working" in migrations (aka not failing) but throwing warnings or blowing up during normal use.
  • Magic that happens during migrations and adds some legacy fields back, but not during "normal" runtime, makes migrations even worse to debug. It's hard enough to figure out where and when (in which migration) a model was defined.
  • The problem Django has with the IPAddressField is kind of a luxury. We could add some magic that will make migrations work even after the field was removed. This won't work for 3rd party apps though (as already said). Thus magic import features, as stated above, shouldn't be a solution.
  • Carl's idea to make operations a property that internally imports all used code, could work. I'm not sure what we are missing though.
  • Going with some @lazy decorator or implementation won't help in any way. The classes still need to be imported.
  • Referencing imports by strings would be possible, but it's quite invasive:
from django.db import migrations, get_migration_operation, get_migration_thing


def get_migration_operation(to_import, *args, **kwargs):
    try:
        klass = import_string(to_import)
        return klass(*args, **kwargs)
    except ImportError:
        return NoOp()


def get_migration_thing(to_import, *args, **kwargs):
    try:
        klass = import_string(to_import)
        return klass(*args, **kwargs)
    except ImportError:
        # raise warning / error


class Migration(migrations.Migration):

    dependencies = []

    operations = [
        migrations.get_migration_operation('django.db.migrations.operations.CreateModel', fields=[
            migrations.get_migration_thing('myapp.fields.MyField', 'x', 'y'),
        ]),
        migrations.get_migration_operation('django.db.migrations.operations.ANewOperationIn18', 'arg1', kwarg1='kwarg1'),
    ]

comment:11 by Markus Holtermann, 9 years ago

Severity: NormalRelease blocker
Summary: IPAddressField removal will break migrations and test suitesFields referenced in migrations cannot be (re)moved, even if migrations have been squashed
Triage Stage: UnreviewedAccepted

I mark this issue as accepted and release blocker, since we have to find a solution for 1.8

comment:12 by Markus Holtermann, 9 years ago

I moved the issue about extending the lifetime for the IPAddressField to #23891 as the discussion here moved to a more general way of handling imports in migration files.

comment:13 by Tim Graham, 9 years ago

In #23891, I've proposed handling the deprecation of IPAddressField using the system check framework. Perhaps we can recommend this solution for third-party fields as well.

comment:14 by Markus Holtermann, 9 years ago

While looking at Tim's patch for #23891 I came up with the following:

class Field(...):
    deprecated_for_migrations = False
    removed_for_migrations = False

    def check(...):
        # ...
        errors.extend(self._check_migration_availability(...))
        # ...

    def _check_migration_availability(self):
        if self.__class__.removed_for_migrations:
            return [
                checks.Error(
                    'The field %s has been removed except for support in historical migrations.' % self.__class__.__name__,
                    hint=None,
                    obj=self,
                    id='fields.EXXX',
                )
            ]
        elif self.__class__.deprecated_for_migrations:
            return [
                checks.Warning(
                    'The field %s has been deprecated. Support for it (except in historical migrations) will be removed according to the deprecation timeline' % self.__class__.__name__,
                    hint=None,
                    obj=self,
                    id='fields.WXXX',
                )
            ]

class IPAddressField(...):
    deprecated_for_migrations = True

This comes with one drawback that Tim correctly noted:

It seems nice to be able to have a unique code for each field so they can be silenced if desired, rather than one code for all deprecations, but this seems possible only if we use the model field class name to generate the code, in which case we'll have to depart from the usual numbering scheme.

comment:15 by Tim Graham, 9 years ago

Has patch: set
Needs tests: set
Owner: changed from nobody to Tim Graham
Status: newassigned

PR with work in progress. Tests to come.

comment:16 by Tim Graham, 9 years ago

Needs tests: unset

comment:17 by Markus Holtermann, 9 years ago

Triage Stage: AcceptedReady for checkin

LGTM

comment:18 by Tim Graham <timograham@…>, 9 years ago

Resolution: fixed
Status: assignedclosed

In c87ee41954737869e6026d6fb12394ec4f79d50a:

Fixed #23861 -- Added an API to deprecate model fields.

Thanks Markus Holterman and Berker Peksag for review.

comment:19 by Tim Graham <timograham@…>, 9 years ago

In 789baf9c3a432297ff775aee7f174494e4fd9962:

Fixed test failures introduced in refs #23861.

Note: See TracTickets for help on using tickets.
Back to Top