Opened 10 years ago

Last modified 12 days ago

#21961 assigned New feature

Add support for database-level cascading options

Reported by: Christophe Pettus Owned by: Akash Kumar Sen
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: emorley@…, andre@…, Adam Wróbel, Nick Stefan, Hannes Ljungberg, Carlton Gibson, Florian Demmer, Peter Thomassen, Adrien Delessert, Adam Johnson, raydeal, John Speno, eduards-amped, David Wobrock, bcail, lo0, Roman Donchenko Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Per a discussion on -developers, this ticket is to document this proposed new feature.

The general idea is to allow developers to specify that the database itself, rather than Django, should implement foreign key constraints where possible. The database can be considerably more efficient, and often can avoid locking situations, in a way that the Django backend can't.

The proposal is to add a models.CASCADE_DB, models.SET_NULL_DB, models.PROTECT_DB constants. These specify that the appropriate functionality is to be implemented in the database if the database supports it; otherwise, they are implemented in the Django core as now.

Some potential design issues with proposed solutions:

  1. It is not an error to specify the _DB version with a database that doesn't support it. Instead, you get the non-_DB version.
  2. The _DB version does not fire signals; this will need to be documented.
  3. If a model A references a model B using CASCADE_DB, but model B references model C using regular CASCADE, a deletion of A won't cascade all the way to C. This needs to be documented. Another possibility is to make this an error condition and have model validation fail, but that seems excessive.
  4. The _DB version is disallowed on generic foreign keys, because that way madness lies.
  5. The implicit foreign key created from child to parent in model inheritance is never the _DB version. This is a shame, but there are two issues:

a) Since it's created implicitly, there's no place to put it.
b) Even if we extended the mechanism to allow manual specification of the key, deleting the parent wouldn't automatically delete the child.

Change History (61)

comment:1 by Marc Tamlyn, 10 years ago

Triage Stage: UnreviewedAccepted

If 3 can be introspected for (which should be possible) we can at least implement a check for it in the new checks framework.

comment:2 by Aymeric Augustin, 10 years ago

  1. is tricky. My instinct would be to fail with an exception when the code requires something that cannot be achieved. However, I understand that pluggable apps may want to take advantage of database-level cascades, while still supporting less capable databases. Short of introducing a third value (CASCADE_DB_PROVIDED_YOU_ARE_USING_A_REAL_DATABASE), your proposal is probably the best solution.

A similar argument can be made for 3.

5 might be just another argument against MTI...

comment:3 by Simon Charette, 10 years ago

  1. It is possible to specify the key:
class Parent(models.Model):
    pass

class Child(Parent):
    parent = models.OneToOneField(Parent, parent_link=True)

comment:4 by Anssi Kääriäinen, 10 years ago

I think the description has parent and child deletion mixed. Deleting the parent will delete the child (there is a key from child to parent). The problem is child deletion. In Django deleting the child cascades to the parent row, too. But there is no column from parent to child you can cascade along if you do cascades in the DB. The real problematic case:

class Related(models.Model):
    pass

class Parent(models.Model):
    pass

class Child(Parent):
    related = models.ForeignKey(Related, on_delete=DB_CASCADE)

When you delete Related instance, Child instances pointing to it will be deleted by db cascade, but the parent instances will be left alone. If you instead use standard CASCADE, then the parent instances will be deleted, too. This is surprising behaviour. It can be documented, but erroring out would be IMO better.

comment:5 by Shai Berger, 10 years ago

1+2 => When using CASCADE_DB, signals will fire only on backends which do not support the feature.

Similar issues, of course, for 1+3, 1+5.

I find this result quite disturbing.

Alternative: When using CASCADE_DB on a backend where the database cannot implement it, Django tries to emulate it; this is not equivalent to CASCADE.

comment:6 by Tim Graham, 9 years ago

Version: 1.7-alpha-1master

comment:7 by Alexander Schepanovski, 9 years ago

For me it actually ok that signals won't be called and some deletion propagation could be broken. It's kind of .raw() and .extra() if you mess with database you should handle all these things.

My use case is using a database with another non-django app. So I can't rely on Djangos cascading.

comment:8 by Anssi Kääriäinen, 9 years ago

I won't oppose an approach where checks framework will warn about dangerous behavior.

For issue 1+2 mentioned by Shai in comment:5: we should disable signals when cascading along CASCADE_DB converted to normal CASCADE due to not having support for CASCADE by the backend.

comment:9 by Carl Meyer, 9 years ago

I think we should consider introducing this feature as a totally separate kwarg (on_delete_db) rather than conflating it with the existing Python-level on_delete. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.

Right now the contract of on_delete is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).

So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.

For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db on a backend that can't support it should be an error.

comment:10 by Tim Graham, 9 years ago

Summary: ForeignKey.on_delete supports database-level cascading optionsAdd support for database-level cascading options

comment:11 by Gustavo Narea, 9 years ago

In case anyone is interested in a workaround for PostgreSQL, I created this simple plugin:
https://pypi.python.org/pypi/django-postgres-delete-cascade

That's meant as a temporary solution, until Django supports this out of the box.

comment:12 by Ed Morley, 8 years ago

Cc: emorley@… added

comment:13 by André Cruz, 7 years ago

Cc: andre@… added

comment:14 by Nick Stefan, 7 years ago

Owner: changed from nobody to Nick Stefan
Status: newassigned

comment:15 by Nick Stefan, 7 years ago

postgresql and mysql are fine. However, for SQLite to use ON DELETE, we would need a new OPTIONS flag. Something like:

settings.DATABASES.["default"]["OPTIONS"]["PRAGMA_foreign_keys"] = True

(which would then lead to the database wrapper running the correct SQL: https://sqlite.org/foreignkeys.html#fk_enable)

If I add the pragma flag, the test suite fails quite a few tests (50+). Therefore, SQLite PRAGMA foreign keys, seems to be a bigger and separate scope. It does have its own ticket here https://code.djangoproject.com/ticket/14204. So postgresql and mysql are it, for now.

in reply to:  9 ; comment:16 by Dylan Young, 6 years ago

A separate kwarg (on_delete_db) is a much better proposal:

1) It (together with adequate documentation) eliminates *ALL* of the ambiguities present in the current proposal.
2) When this kwarg is unspecified, it should default to the value of on_delete, ensuring as closely as possible consistency between raw database operations and Django operations.
3) Otherwise it leaves granular control up to the user: Don't need delete signals (postgres should be able to support post_delete, no?)? Use the configuration outlined by Carl. Need delete signals in your Django application, but still want sane deletion behaviour in other applications using your DB? Simply set on_delete. Need your other applications to have different deletion behaviour? Just reverse the configuration outlined by Carl... Or any number of options?
4) You could add some special flags that isolate the ORM-level deletion to specific DB-problematic cases (Generic Foreign Keys, Inheritance...), thus eliminating the overhead of python-space constraints in the majority of cases.
5) Maybe it also makes testing Django's ORM on_delete behaviour easier? By adding the ability to easily cross-reference with the Database's internal implementation... could be wrong on that, but with on_delete set to cascade, you would expect the final deletion call of the original object (after all the ORM work to delete related objects) to only delete a single instance right? If it deletes more than expected, the error in the ORMs deletion emulation is exposed.

An extra thought on Postgres:

It seems like it should be possible to support signals with DB-level constraints. Actually any DB that supports returning + transactions should be capable, no?

Replying to Carl Meyer:

I think we should consider introducing this feature as a totally separate kwarg (on_delete_db) rather than conflating it with the existing Python-level on_delete. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.

Right now the contract of on_delete is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).

So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.

For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db on a backend that can't support it should be an error.

in reply to:  16 comment:17 by Nick Stefan, 6 years ago

Hi Dylan.

I think your thoughts do make logical sense in response to Carl. However, that proposal was written a long time ago (it was circa 2014). I think most of that "ambiguity" discussed then comes from 2014 DBs not supporting ON DELETE CASCADE. That lack of support then required the need to have automatic fall backs.

In 2017, all of the supported databases should support ON DELETE CASCADE, so a single on_delete kwarg still makes sense to me.

I guess there is "ambiguity" regarding signals etc, but I don't think that should be the case. DO_NOTHING has been a value for a long time. DB_CASCADE is really just DO_NOTHING plus some more SQL, so it should be understood that there can't be signals.

Of course, that's just my opinion, and I admit that it might be unpopular based on how my PR has fared lol :).

I opened a PR back in the summer proving the concept, and added pre check tests to prevent hairy situations like generic foreign keys, concrete model inheritance, etc: https://github.com/django/django/pull/8661 .

--Nick

Replying to Dylan Young:

A separate kwarg (on_delete_db) is a much better proposal:

1) It (together with adequate documentation) eliminates *ALL* of the ambiguities present in the current proposal.
2) When this kwarg is unspecified, it should default to the value of on_delete, ensuring as closely as possible consistency between raw database operations and Django operations.
3) Otherwise it leaves granular control up to the user: Don't need delete signals (postgres should be able to support post_delete, no?)? Use the configuration outlined by Carl. Need delete signals in your Django application, but still want sane deletion behaviour in other applications using your DB? Simply set on_delete. Need your other applications to have different deletion behaviour? Just reverse the configuration outlined by Carl... Or any number of options?
4) You could add some special flags that isolate the ORM-level deletion to specific DB-problematic cases (Generic Foreign Keys, Inheritance...), thus eliminating the overhead of python-space constraints in the majority of cases.
5) Maybe it also makes testing Django's ORM on_delete behaviour easier? By adding the ability to easily cross-reference with the Database's internal implementation... could be wrong on that, but with on_delete set to cascade, you would expect the final deletion call of the original object (after all the ORM work to delete related objects) to only delete a single instance right? If it deletes more than expected, the error in the ORMs deletion emulation is exposed.

An extra thought on Postgres:

It seems like it should be possible to support signals with DB-level constraints. Actually any DB that supports returning + transactions should be capable, no?

Replying to Carl Meyer:

I think we should consider introducing this feature as a totally separate kwarg (on_delete_db) rather than conflating it with the existing Python-level on_delete. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.

Right now the contract of on_delete is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).

So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.

For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db on a backend that can't support it should be an error.

comment:18 by Dylan Young, 6 years ago

I see what you're saying.

I still think there is value to the in-app deletion policy (without the need to "roll your own") alongside a DB-enforced deletion policy. For example, a very sensible scheme would be to have both models.PROTECT_DB and models.CASCADE enabled, whereas (correct me if I'm mistaken) your scheme *only* allows allow DO_NOTHING in combination with any _DB setting, which has the nasty side-effect of suppressing signals.

This could still be done in your scheme of course (and can be done now), but it involves either a custom migration... or a bit of hacking of the migration system, and this creates a disparity between model definition and model implementation in the DB, not to mention the extra coding effort.

A context manager for bypassing the ORM deletion policy would also be a nicety (this already exists I think, but I can't recall 100%).

Alternatively a syntax that allows ORing the *_DB flags with the standard ORM on_delete flags could be used, but I think a separate kwarg is cleaner.

I'm curious too if there's any significant performance gain to be had from simulating the delete (without actually deleting) in the ORM in order to correctly fire the signals, but letting the DB deal with the actual deletion. This would probably mean that the signal ordering would be slightly different (all pre-delete signals would fire, followed by all post-delete signals).

In any case, this is actually a huge problem for Django where a small bulk deletion (say 100 rows) can take hours, and I would be thankful for any resolution to the problem.

comment:19 by Nick Stefan, 6 years ago

@Dylan

or example, a very sensible scheme would be to have both models.PROTECT_DB and models.CASCADE enabled

I do understand this desire. The existing foreign key constraints that get added for models.CASCADE, only help you when you later go to edit a row with one of those not valid foreign keys. It wont enforce / protect you when models.CASCADE doesn't get all of the rows it should have. I hadn't considered that there could be a desire for DB_DELETE that is completely outside performance desires.

Is that a fair characterization?

comment:20 by Ivan Anishchuk, 5 years ago

Just an idea, having two kwargs (on_delete/on_delete_db) is not entirely different from having one kwarg and two sets of constants (SET_NULL/SET_NULL_DB) -- we can always combine those constants like this: on_delete=SET_NULL|SET_NULL_DB and have the described behavior without adding a separate kwarg. Assuming it's appropriate in this case (although it is the way Q objects are used, for example) and that having such combinations is desirable at all (I could think of some cases but I'm not sure I can remember any such in any of my recent projects).

comment:21 by Adam Wróbel, 5 years ago

Cc: Adam Wróbel added

comment:22 by Nick Stefan, 5 years ago

Cc: Nick Stefan added
Has patch: set

comment:23 by Mariusz Felisiak, 4 years ago

Patch needs improvement: set

in reply to:  23 comment:24 by Nick Stefan, 4 years ago

Replying to felixxm:

I noticed the patch is still set to "needs improvement". We've been going through code review process in github; do I leave that flag here as true after the code review follow ups? Or do I change that flag once code review fixes go through? Thanks!

comment:25 by Nick Stefan, 4 years ago

Patch needs improvement: unset

Code review seems to be complete, so unsetting "needs improvement".

comment:26 by Mariusz Felisiak, 4 years ago

Needs tests: set

comment:27 by Nick Stefan, 4 years ago

Needs tests: unset

comment:28 by Nick Stefan, 4 years ago

Owner: Nick Stefan removed
Status: assignednew

I unfortunately do not have bandwidth to work on this any longer. Hopefully someone else with more free time can take up the torch...

comment:29 by Mariusz Felisiak, 4 years ago

Patch needs improvement: set

comment:30 by Hannes Ljungberg, 4 years ago

Cc: Hannes Ljungberg added

comment:31 by Carlton Gibson, 3 years ago

Cc: Carlton Gibson added

comment:32 by Thibault, 3 years ago

Hello,

I'd like to contribute on this issue. I've read the whole ticket but i'm not sure to understand what's the current state of this patch. Looking at Nick's comment (https://github.com/django/django/pull/8661#issuecomment-624244307), some technical issues need to be answered or debated.

Thanks for your answer.

comment:33 by Thibault, 3 years ago

Owner: set to Thibault
Status: newassigned

comment:34 by Florian Demmer, 3 years ago

Cc: Florian Demmer added

comment:35 by Steven Mapes, 2 years ago

I hit an issue today which lead me back to this ticket. In MySQL 5.7 and 8 the default behaviour for FK constraints is RESTRICT which means that if you create your model with an on_delete=models.CASCADE, the SQL migration is created within the ON DELETE argument which means the table is created using the default constraint of RESTRICT.

This means that the database will not act as intended if the models are not deleted in the right order things break, but more over DELETE statements directly on the database will break.

This means Django should not really be considered compatible with MySQL 5.7 or 8+ and probably older versions.

Here's the MySQL docs - https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html#foreign-key-referential-actions
"RESTRICT: Rejects the delete or update operation for the parent table. Specifying RESTRICT (or NO ACTION) is the same as omitting the ON DELETE or ON UPDATE clause."

This means if you work on a project that requires actions to be taken at a database level you have to manually drop and recreate the foreign keys

comment:36 by Peter Thomassen, 2 years ago

Cc: Peter Thomassen added

comment:37 by Adrien Delessert, 2 years ago

Cc: Adrien Delessert added

comment:38 by Adam Johnson, 2 years ago

Cc: Adam Johnson added

comment:39 by raydeal, 2 years ago

Cc: raydeal added

comment:40 by John Speno, 2 years ago

Cc: John Speno added

comment:41 by eduards-amped, 2 years ago

Cc: eduards-amped added

comment:42 by David Wobrock, 20 months ago

Cc: David Wobrock added

comment:43 by bcail, 17 months ago

Cc: bcail added

comment:44 by Mariusz Felisiak, 12 months ago

Owner: Thibault removed
Status: assignednew

comment:45 by Akash Kumar Sen, 12 months ago

Owner: set to Akash Kumar Sen
Status: newassigned

comment:46 by Akash Kumar Sen, 12 months ago

Patch needs improvement: unset

comment:47 by Mariusz Felisiak, 12 months ago

Needs documentation: set
Patch needs improvement: set

comment:48 by Akash Kumar Sen, 12 months ago

in reply to:  48 comment:49 by Carlton Gibson, 12 months ago

Needs documentation: unset
Patch needs improvement: unset

Replying to Akash Kumar Sen:

Updated the patch ​https://github.com/django/django/pull/16851

Hi Akash — I'm assuming this means that you want another review here so, I've unchecked the Needs documentation and Patch needs improvement flags, so it will appear again on the list of patches needing review for you.

comment:50 by Mariusz Felisiak, 10 months ago

Needs documentation: set
Patch needs improvement: set

comment:51 by Akash Kumar Sen, 9 months ago

Needs documentation: unset
Patch needs improvement: unset
  • Fixed the typos in the documentation and resolved the suggestions given by Mariusz
  • Will work on integrating the DB_SET_DEFAULT next, as suggested by Nick here
  • But I am a not sure about the serializer fix, some review will be helpful here, or some suggestion over how to reproduce the issue.

comment:52 by lo0, 9 months ago

Cc: lo0 added

comment:53 by Mariusz Felisiak, 9 months ago

Needs tests: set
Patch needs improvement: set

comment:54 by Akash Kumar Sen, 8 months ago

Needs tests: unset
Patch needs improvement: unset
  • Added tests for migrations.

There are one I could not resolve after a lot of try :

Apparently MariaDB uses the InnoDB storage engine by default - I think that explains why the table create is failing there.
It makes me think maybe Django should not allow on_delete=DB_SET_DEFAULT for MySQL 🤔

As mentioned by Sarah Boyce in the GitHub comment, I also think we should disable on_delete=DB_SET_DEFAULT for MySQL, because according to https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html :

SET DEFAULT: This action is recognized by the MySQL parser, but both InnoDB and NDB reject table definitions containing ON DELETE SET DEFAULT or ON UPDATE SET DEFAULT clauses.

Last edited 8 months ago by Akash Kumar Sen (previous) (diff)

comment:55 by Lily Foote, 6 months ago

I think this is ready for review from a fellow now.

comment:56 by Mariusz Felisiak, 4 months ago

Patch needs improvement: set

comment:57 by Roman Donchenko, 3 months ago

Cc: Roman Donchenko added

comment:58 by Akash Kumar Sen, 2 months ago

Patch needs improvement: unset

comment:59 by Sarah Boyce, 3 weeks ago

Patch needs improvement: set

comment:60 by Sarah Boyce, 13 days ago

Patch needs improvement: unset

comment:61 by Sarah Boyce, 12 days ago

Patch needs improvement: set
Note: See TracTickets for help on using tickets.
Back to Top