Code

Opened 7 months ago

Last modified 2 months ago

#21127 new New feature

on_delete=models.SET_NULL should be the default for nullable FKs

Reported by: carljm Owned by: nobody
Component: Database layer (models, ORM) Version: 1.5
Severity: Normal Keywords:
Cc: hv@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This wasn't done when we added the on_delete feature due to concerns about breaking backwards-compatibility with the previous always-cascade-deletes behavior. But if you set aside legacy considerations, it seems intuitively obvious (to me anyway) that SET_NULL is a superior default for a nullable FK than CASCADE. And even if others might have a different intuition, the consequences of your intuition being wrong are far from symmetrical: one way you get a model instance with a null FK hanging around that you didn't expect to still have, the other way you get data loss.

The trick, of course, is that changing default values is hard to do with a deprecation process. For a deprecation warning to catch the people it needs to catch, you have to warn anytime an explicit value is not specified. So you have to force everybody to stop relying on the default in order to silence the deprecation warnings, which reduces the value of having a new default, in the short run anyway.

I think it's still worth doing for the long run, though.

We could also consider doing it without a deprecation process, just a backwards-incompatible note in the release notes. Since the consequences are the opposite of data loss, that might be ok. Thoughts?

Attachments (0)

Change History (15)

comment:1 Changed 7 months ago by russellm

  • Triage Stage changed from Unreviewed to Accepted

I completely agree that SET_NULL is a more intuitive on_delete behavior for nullable fields.

Also agreed that the deprecation process is the biggest impediment.

The recently completed GSoC project to refactor validation.py may give us a way forward. 1.6 introduced a check command that provides a way to check your project for known compatibility problems. When the GSoC validation project lands, this check command will become the replacement for manage.py validate, so on the first syncdb/runserver after an update, you'll get a bunch of warnings for things like unittest discovery changes and the changed default on BooleanField.

Using this framework, we could add a validation check for foreign key on_delete values, we can flag to the user all the possible models that may need to be updated as a result of the change. And these messages can also be silenced, so if they are audited and turn out to not be a problem, you can silence them so they aren't a persistent blight on your console.

So - agreed that this is something we should do. The exact plan needs discussion.

comment:2 Changed 7 months ago by aaugustin

+1

comment:3 Changed 7 months ago by lukeplant

I'm not sure about this change.

I've had to deal with instances with a FK was only nullable to cope with old (bad) data. In that scenario, having completely different default behaviour is far from ideal.

Also, I'm not entirely convinced by the argument about the difference in consequences between data loss and null FK. A null FK is also data loss/damage. To contrive an example:

Suppose you have an Employee table, with a "manager" field that is an FK to self. Partners in the company, having no boss/manager, have a NULL in this field, and that is used, say, for certain security uses. Using ON DELETE SET NULL, deleting any manager will promote every employee under that manager to being a partner, which would be bad.

Also, I think there might be an argument from explictness:

With the current default, if you have a nullable FK, which has ON DELETE SET NULL, and you are thinking about making it non-nullable, you can see straight away that you will have a change in behaviour regarding deletion, because the on_delete value has to be present and you'll have to change it. With the proposed default, it's easier to miss the fact that you've introduced a big change in behaviour by changing the nullability of a field.

Overall, I agree that if we were starting from scratch, the current default isn't ideal, and I would go for "SET NULL". But what I'm saying is that the alternative isn't absolutely win-win, and given the current behaviour is known, we also have to take into account the cost of changing this default behaviour - the change and confusion could cause loss/damage in itself.

comment:4 Changed 7 months ago by mjtamlyn

Taking your example, the current behaviour means deleting a manager (silently) deletes all the people who worked for them. This is much worse than promoting them IMO as you get data loss.

In either case you need to be careful about the child objects.
The old behaviour is easily reenabled as well.

comment:5 Changed 7 months ago by carljm

Certainly there is some cost to the change itself; that's what we have to weigh. I expect Django to be around for a long time - I think the short-term cost in this case is manageable enough that it's worth it to arrive at the behavior it seems we all agree we'd choose if starting from scratch.

Like mjtamlyn, I'm unimpressed by your specific example. Of course some data is going to be "lost" regardless when you delete things - that's the meaning of deletion. But when an FK can be set null, setting it null rather than deleting its entire instance is unequivocally the choice that minimizes the potential for additional unexpected data loss. Contrived examples where the null FK is also "bad" in some way don't make that any less true. At least you can easily fix the wrongly promoted managers (though it's arguable that they were even wrongly promoted: you deleted their manager, therefore they have no manager, therefore by the logic of the schema you chose to set up they are now managers); you can't recover (short of backups external to Django) the entire tree of deleted employees you'd get with the current default.

Your point about explicitness is interesting, but I can hardly see it as an argument for keeping the current default. If any of the options ought to require explicit opt-in, it's the one with the most silent data-loss potential: that is, CASCADE. If we took the explicitness argument seriously, it would lead us to make PROTECT the default in all cases and require any cascading behavior at all to be explicitly opt-in (which is what SQL does).

comment:6 Changed 7 months ago by loic84

@russellm, I don't recommend a static check for this, but rather a dynamic one which requires an explicit on_delete to silence (when null=True).

That way, if you intend to keep the old behavior, silencing the warning actually solves the long-term issue. If you intend to switch to the new behavior, you get to use it right away and save another auditing when the switch of default behavior is about to happen.

Also, by silencing the "check" by code like the GSoC project offers, you put yourself at risk if new FKs are introduced in a project after the initial auditing + silencing took place.

comment:7 Changed 7 months ago by manfre

on_delete is one of those settings that is important enough that there shouldn't be a default behavior. I rank it up there the same as max_length on CharField. I'm pretty sure that there are many projects out there where the developers didn't put much thought in to the on_delete behavior and they really should, otherwise they might find themselves trying to CASCADE delete most of the database when they delete a single record. For a large project with many referenced models, removing the default and requiring on_delete means a small time sink when they read the release notes or immediately after they update Django to the next version.

The argument that setting the value to NULL is any less of a data loss event is flawed. Data was unexpectedly changed, which is never good. When data is missing (CASCADE), or corrupted (SET NULL), you'll need to dig through logs and compare to backups to figure out the extent of the damage.

For some projects, sure, you can go in to the admin and see that a FK is null and just give it a new value, but at best that just sweeps the problem under the rug until it happens again. For other projects, setting a FK to null could result in secured content being exposed to the public, which is worse than needing to restore deleted records.

comment:8 Changed 7 months ago by mjtamlyn

Making it a required option is a good argument, but unfortunately it's more of an overhead that just updating your own project. It's also updating *every* foreign key in *every* external package you're using. A change like this would be an enormous hurdle to adoption of a new Django version.

comment:9 Changed 7 months ago by russellm

@mjtamlyn Yes, it would be an overhead, but in this case, I think it's an overhead worth exploring. I'd argue that data deletion is an area where explicit is clearly better than implicit, and moving towards greater explicitness is something that warrants a little pain -- especially if we can work out a way to introduce the new behaviour gradually.

comment:10 Changed 7 months ago by mjtamlyn

Yes, I think a gradual update to no default on_delete is likely a good option. Possibly even a better option that the NULL default. I just wanted to point out it's a bit more than a "small time sink" for projects with many dependencies.

comment:11 Changed 7 months ago by carljm

I don't have any objection to migrating to 'no default' to avoid any possibility for implicit data loss. The downside is that it's yet another thing newbies have to comprehend and make a decision about in order to write their very first models, but I can certainly see the argument that it's worth making them learn one more thing right away, rather than surprising them with data loss later.

I'd earlier suggested a default of 'PROTECT' as an option for "no implicit data loss", but PROTECT means that the same code can run fine or raise an exception depending on the state of the database. That can be okay as an opt-in, but it's not a good default either.

Fortunately the migration path to "no default" is quite a bit simpler than the one to change a default. All we'd have to do is deprecate instantiation of ForeignKey without an on_delete argument, and then make it an actually-required argument once the deprecation period ends.

comment:12 Changed 7 months ago by manfre

@mjtamlyn, If any apps/projects require a lot of thought before setting the behavior, that means we've done a good thing because they've never thought about it before and were blindly using CASCADE.

I don't think there are any arguments that could make me think that it would be bad to force people, even newbies, to think about and understand what will happen to their data when they start deleting records.

comment:13 Changed 7 months ago by manfre

If on_delete becomes required, I'm willing to take on the task of updating all of the examples in the docs to include on_delete=models.CASCADE or a potentially more likely real world behavior.

comment:14 Changed 7 months ago by EvilDMP

Also being discussed on the Django developers email list, https://groups.google.com/forum/#!topic/django-developers/FJMoGgtYIX4

comment:15 Changed 2 months ago by guettli

  • Cc hv@… added

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as new
The owner will be changed from nobody to anonymous. Next status will be 'assigned'
as The resolution will be set. Next status will be 'closed'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.