Opened 11 years ago

Last modified 8 years ago

#21127 closed New feature

on_delete should be a required parameter for ForeignKey — at Version 19

Reported by: Carl Meyer Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: hv@…, flavio.curella@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Carl Meyer)

(Update: consensus below was to make on_delete a required argument, rather than twiddling with its defaults.)

This wasn't done when we added the on_delete feature due to concerns about breaking backwards-compatibility with the previous always-cascade-deletes behavior. But if you set aside legacy considerations, it seems intuitively obvious (to me anyway) that SET_NULL is a superior default for a nullable FK than CASCADE. And even if others might have a different intuition, the consequences of your intuition being wrong are far from symmetrical: one way you get a model instance with a null FK hanging around that you didn't expect to still have, the other way you get data loss.

The trick, of course, is that changing default values is hard to do with a deprecation process. For a deprecation warning to catch the people it needs to catch, you have to warn anytime an explicit value is not specified. So you have to force everybody to stop relying on the default in order to silence the deprecation warnings, which reduces the value of having a new default, in the short run anyway.

I think it's still worth doing for the long run, though.

We could also consider doing it without a deprecation process, just a backwards-incompatible note in the release notes. Since the consequences are the opposite of data loss, that might be ok. Thoughts?

Change History (19)

comment:1 by Russell Keith-Magee, 11 years ago

Triage Stage: UnreviewedAccepted

I completely agree that SET_NULL is a more intuitive on_delete behavior for nullable fields.

Also agreed that the deprecation process is the biggest impediment.

The recently completed GSoC project to refactor validation.py may give us a way forward. 1.6 introduced a check command that provides a way to check your project for known compatibility problems. When the GSoC validation project lands, this check command will become the replacement for manage.py validate, so on the first syncdb/runserver after an update, you'll get a bunch of warnings for things like unittest discovery changes and the changed default on BooleanField.

Using this framework, we could add a validation check for foreign key on_delete values, we can flag to the user all the possible models that may need to be updated as a result of the change. And these messages can also be silenced, so if they are audited and turn out to not be a problem, you can silence them so they aren't a persistent blight on your console.

So - agreed that this is something we should do. The exact plan needs discussion.

comment:2 by Aymeric Augustin, 11 years ago

+1

comment:3 by Luke Plant, 11 years ago

I'm not sure about this change.

I've had to deal with instances with a FK was only nullable to cope with old (bad) data. In that scenario, having completely different default behaviour is far from ideal.

Also, I'm not entirely convinced by the argument about the difference in consequences between data loss and null FK. A null FK is also data loss/damage. To contrive an example:

Suppose you have an Employee table, with a "manager" field that is an FK to self. Partners in the company, having no boss/manager, have a NULL in this field, and that is used, say, for certain security uses. Using ON DELETE SET NULL, deleting any manager will promote every employee under that manager to being a partner, which would be bad.

Also, I think there might be an argument from explictness:

With the current default, if you have a nullable FK, which has ON DELETE SET NULL, and you are thinking about making it non-nullable, you can see straight away that you will have a change in behaviour regarding deletion, because the on_delete value has to be present and you'll have to change it. With the proposed default, it's easier to miss the fact that you've introduced a big change in behaviour by changing the nullability of a field.

Overall, I agree that if we were starting from scratch, the current default isn't ideal, and I would go for "SET NULL". But what I'm saying is that the alternative isn't absolutely win-win, and given the current behaviour is known, we also have to take into account the cost of changing this default behaviour - the change and confusion could cause loss/damage in itself.

comment:4 by Marc Tamlyn, 11 years ago

Taking your example, the current behaviour means deleting a manager (silently) deletes all the people who worked for them. This is much worse than promoting them IMO as you get data loss.

In either case you need to be careful about the child objects.
The old behaviour is easily reenabled as well.

comment:5 by Carl Meyer, 11 years ago

Certainly there is some cost to the change itself; that's what we have to weigh. I expect Django to be around for a long time - I think the short-term cost in this case is manageable enough that it's worth it to arrive at the behavior it seems we all agree we'd choose if starting from scratch.

Like mjtamlyn, I'm unimpressed by your specific example. Of course some data is going to be "lost" regardless when you delete things - that's the meaning of deletion. But when an FK can be set null, setting it null rather than deleting its entire instance is unequivocally the choice that minimizes the potential for additional unexpected data loss. Contrived examples where the null FK is also "bad" in some way don't make that any less true. At least you can easily fix the wrongly promoted managers (though it's arguable that they were even wrongly promoted: you deleted their manager, therefore they have no manager, therefore by the logic of the schema you chose to set up they are now managers); you can't recover (short of backups external to Django) the entire tree of deleted employees you'd get with the current default.

Your point about explicitness is interesting, but I can hardly see it as an argument for keeping the current default. If any of the options ought to require explicit opt-in, it's the one with the most silent data-loss potential: that is, CASCADE. If we took the explicitness argument seriously, it would lead us to make PROTECT the default in all cases and require any cascading behavior at all to be explicitly opt-in (which is what SQL does).

comment:6 by loic84, 11 years ago

@russellm, I don't recommend a static check for this, but rather a dynamic one which requires an explicit on_delete to silence (when null=True).

That way, if you intend to keep the old behavior, silencing the warning actually solves the long-term issue. If you intend to switch to the new behavior, you get to use it right away and save another auditing when the switch of default behavior is about to happen.

Also, by silencing the "check" by code like the GSoC project offers, you put yourself at risk if new FKs are introduced in a project after the initial auditing + silencing took place.

comment:7 by Michael Manfre, 11 years ago

on_delete is one of those settings that is important enough that there shouldn't be a default behavior. I rank it up there the same as max_length on CharField. I'm pretty sure that there are many projects out there where the developers didn't put much thought in to the on_delete behavior and they really should, otherwise they might find themselves trying to CASCADE delete most of the database when they delete a single record. For a large project with many referenced models, removing the default and requiring on_delete means a small time sink when they read the release notes or immediately after they update Django to the next version.

The argument that setting the value to NULL is any less of a data loss event is flawed. Data was unexpectedly changed, which is never good. When data is missing (CASCADE), or corrupted (SET NULL), you'll need to dig through logs and compare to backups to figure out the extent of the damage.

For some projects, sure, you can go in to the admin and see that a FK is null and just give it a new value, but at best that just sweeps the problem under the rug until it happens again. For other projects, setting a FK to null could result in secured content being exposed to the public, which is worse than needing to restore deleted records.

comment:8 by Marc Tamlyn, 11 years ago

Making it a required option is a good argument, but unfortunately it's more of an overhead that just updating your own project. It's also updating *every* foreign key in *every* external package you're using. A change like this would be an enormous hurdle to adoption of a new Django version.

comment:9 by Russell Keith-Magee, 11 years ago

@mjtamlyn Yes, it would be an overhead, but in this case, I think it's an overhead worth exploring. I'd argue that data deletion is an area where explicit is clearly better than implicit, and moving towards greater explicitness is something that warrants a little pain -- especially if we can work out a way to introduce the new behaviour gradually.

comment:10 by Marc Tamlyn, 11 years ago

Yes, I think a gradual update to no default on_delete is likely a good option. Possibly even a better option that the NULL default. I just wanted to point out it's a bit more than a "small time sink" for projects with many dependencies.

comment:11 by Carl Meyer, 11 years ago

I don't have any objection to migrating to 'no default' to avoid any possibility for implicit data loss. The downside is that it's yet another thing newbies have to comprehend and make a decision about in order to write their very first models, but I can certainly see the argument that it's worth making them learn one more thing right away, rather than surprising them with data loss later.

I'd earlier suggested a default of 'PROTECT' as an option for "no implicit data loss", but PROTECT means that the same code can run fine or raise an exception depending on the state of the database. That can be okay as an opt-in, but it's not a good default either.

Fortunately the migration path to "no default" is quite a bit simpler than the one to change a default. All we'd have to do is deprecate instantiation of ForeignKey without an on_delete argument, and then make it an actually-required argument once the deprecation period ends.

comment:12 by Michael Manfre, 11 years ago

@mjtamlyn, If any apps/projects require a lot of thought before setting the behavior, that means we've done a good thing because they've never thought about it before and were blindly using CASCADE.

I don't think there are any arguments that could make me think that it would be bad to force people, even newbies, to think about and understand what will happen to their data when they start deleting records.

comment:13 by Michael Manfre, 11 years ago

If on_delete becomes required, I'm willing to take on the task of updating all of the examples in the docs to include on_delete=models.CASCADE or a potentially more likely real world behavior.

comment:14 by Daniele Procida, 11 years ago

Also being discussed on the Django developers email list, https://groups.google.com/forum/#!topic/django-developers/FJMoGgtYIX4

comment:15 by Thomas Güttler, 11 years ago

Cc: hv@… added

comment:16 by Klaas van Schelven, 10 years ago

+1 on this issue (though it has been apparently asleep for 8 months now).

We just came very close to being badly burnt by the current on_delete=CASCADE behavior. I.e. caught a bug with a large potential for data loss a day before moving it into production. Specifically surprising was the fact that this is the default even for null=True fields.

My order of preference for the new implementation:

  • "always explicit" (both for null=True and null=False).
  • global default of on_delete=PROTECT.
  • on_delete=SET_NULL on nullable fields

IMHO the cost of migration is real, but greatly outweighed by the potential cost of data loss.

One more remark that may be helpful when contriving examples:

The amount of surprise about cascading deletions is directly related relative importance of the fk field ranked among the other fields of that model.

A good example of "very important fields" of models is the "parent" (or similar) field in a hierarchical structure. Example: what happens if I delete this folder: will the contents be orphaned or will they be deleted? The very fact that a "folder" is defined as "something that contains stuff" forces us to think about the consequences of deletion.

In the case of hierarchical structures, the programmer is more likely to think about the problem of "what happens on delete" herself already, and the likelihood of unexpected data loss (both of the kind reported by carljm and of the kind described by lukeplant) is in fact the smallest.

The most dangerous examples are the ones were the FK describes "just another optional flag of this object". E.g. after running my blog for 5 years I decide to experiment with an optional "color" attribute on my pages. In the status quo, not providing on_delete (or dealing with it manually) means that this little experiment opens me up to a huge potential for data loss.

Unfortunately, the mapping between null=True/False and "semantics of an FK are hierarchical or not" is not 1-to-1, so this observation does not provide us the "one true answer" on what would be the best default (if any). It's more of a tool that may be helpful in finding the correct examples to reason about.

However, I do think it's worthwhile to point out that in the case of a trivial, optional attribute (the case in which we most certainly do not expect a deletion, and may very likely forget to reason about it), the attribute will almost always be nullable.

In the light of the above scale it's interesting to note that lukeplant's example is somewhere in between "files & folders" and "optional color attribute". In fact, depending on the organization he describes, the relative importance of a manager may very well differ. (and hence the chance that either programmers or users remember that removing a partner from the organization means their subordinates must be reassigned to a new manager).

comment:17 by Flavio Curella, 9 years ago

I'm willing to work on this, if there's consensus.

My understanding of the thread is that the most favorable option is to change on_delete to be required, according to our deprecation policy. Should I go that way?

Last edited 9 years ago by Flavio Curella (previous) (diff)

comment:18 by Flavio Curella, 9 years ago

Cc: flavio.curella@… added

comment:19 by Carl Meyer, 9 years ago

Description: modified (diff)
Summary: on_delete=models.SET_NULL should be the default for nullable FKson_delete should be a required parameter for ForeignKey

That looks to me like the thread consensus (and I agree with it).

Note: See TracTickets for help on using tickets.
Back to Top