Opened 4 years ago

Last modified 4 days ago

#21961 assigned New feature

Add support for database-level cascading options

Reported by: Christophe Pettus Owned by: Nick Stefan
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: emorley@…, andre@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Per a discussion on -developers, this ticket is to document this proposed new feature.

The general idea is to allow developers to specify that the database itself, rather than Django, should implement foreign key constraints where possible. The database can be considerably more efficient, and often can avoid locking situations, in a way that the Django backend can't.

The proposal is to add a models.CASCADE_DB, models.SET_NULL_DB, models.PROTECT_DB constants. These specify that the appropriate functionality is to be implemented in the database if the database supports it; otherwise, they are implemented in the Django core as now.

Some potential design issues with proposed solutions:

  1. It is not an error to specify the _DB version with a database that doesn't support it. Instead, you get the non-_DB version.
  2. The _DB version does not fire signals; this will need to be documented.
  3. If a model A references a model B using CASCADE_DB, but model B references model C using regular CASCADE, a deletion of A won't cascade all the way to C. This needs to be documented. Another possibility is to make this an error condition and have model validation fail, but that seems excessive.
  4. The _DB version is disallowed on generic foreign keys, because that way madness lies.
  5. The implicit foreign key created from child to parent in model inheritance is never the _DB version. This is a shame, but there are two issues:

a) Since it's created implicitly, there's no place to put it.
b) Even if we extended the mechanism to allow manual specification of the key, deleting the parent wouldn't automatically delete the child.

Change History (19)

comment:1 Changed 4 years ago by Marc Tamlyn

Triage Stage: UnreviewedAccepted

If 3 can be introspected for (which should be possible) we can at least implement a check for it in the new checks framework.

comment:2 Changed 4 years ago by Aymeric Augustin

  1. is tricky. My instinct would be to fail with an exception when the code requires something that cannot be achieved. However, I understand that pluggable apps may want to take advantage of database-level cascades, while still supporting less capable databases. Short of introducing a third value (CASCADE_DB_PROVIDED_YOU_ARE_USING_A_REAL_DATABASE), your proposal is probably the best solution.

A similar argument can be made for 3.

5 might be just another argument against MTI...

comment:3 Changed 4 years ago by Simon Charette

  1. It is possible to specify the key:
class Parent(models.Model):
    pass

class Child(Parent):
    parent = models.OneToOneField(Parent, parent_link=True)

comment:4 Changed 4 years ago by Anssi Kääriäinen

I think the description has parent and child deletion mixed. Deleting the parent will delete the child (there is a key from child to parent). The problem is child deletion. In Django deleting the child cascades to the parent row, too. But there is no column from parent to child you can cascade along if you do cascades in the DB. The real problematic case:

class Related(models.Model):
    pass

class Parent(models.Model):
    pass

class Child(Parent):
    related = models.ForeignKey(Related, on_delete=DB_CASCADE)

When you delete Related instance, Child instances pointing to it will be deleted by db cascade, but the parent instances will be left alone. If you instead use standard CASCADE, then the parent instances will be deleted, too. This is surprising behaviour. It can be documented, but erroring out would be IMO better.

comment:5 Changed 4 years ago by Shai Berger

1+2 => When using CASCADE_DB, signals will fire only on backends which do not support the feature.

Similar issues, of course, for 1+3, 1+5.

I find this result quite disturbing.

Alternative: When using CASCADE_DB on a backend where the database cannot implement it, Django tries to emulate it; this is not equivalent to CASCADE.

comment:6 Changed 3 years ago by Tim Graham

Version: 1.7-alpha-1master

comment:7 Changed 3 years ago by Alexander Schepanovski

For me it actually ok that signals won't be called and some deletion propagation could be broken. It's kind of .raw() and .extra() if you mess with database you should handle all these things.

My use case is using a database with another non-django app. So I can't rely on Djangos cascading.

comment:8 Changed 3 years ago by Anssi Kääriäinen

I won't oppose an approach where checks framework will warn about dangerous behavior.

For issue 1+2 mentioned by Shai in comment:5: we should disable signals when cascading along CASCADE_DB converted to normal CASCADE due to not having support for CASCADE by the backend.

comment:9 Changed 2 years ago by Carl Meyer

I think we should consider introducing this feature as a totally separate kwarg (on_delete_db) rather than conflating it with the existing Python-level on_delete. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.

Right now the contract of on_delete is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).

So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.

For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db on a backend that can't support it should be an error.

comment:10 Changed 2 years ago by Tim Graham

Summary: ForeignKey.on_delete supports database-level cascading optionsAdd support for database-level cascading options

comment:11 Changed 2 years ago by Gustavo Narea

In case anyone is interested in a workaround for PostgreSQL, I created this simple plugin:
https://pypi.python.org/pypi/django-postgres-delete-cascade

That's meant as a temporary solution, until Django supports this out of the box.

comment:12 Changed 17 months ago by Ed Morley

Cc: emorley@… added

comment:13 Changed 11 months ago by André Cruz

Cc: andre@… added

comment:14 Changed 6 months ago by Nick Stefan

Owner: changed from nobody to Nick Stefan
Status: newassigned

comment:15 Changed 6 months ago by Nick Stefan

postgresql and mysql are fine. However, for SQLite to use ON DELETE, we would need a new OPTIONS flag. Something like:

settings.DATABASES.["default"]["OPTIONS"]["PRAGMA_foreign_keys"] = True

(which would then lead to the database wrapper running the correct SQL: https://sqlite.org/foreignkeys.html#fk_enable)

If I add the pragma flag, the test suite fails quite a few tests (50+). Therefore, SQLite PRAGMA foreign keys, seems to be a bigger and separate scope. It does have its own ticket here https://code.djangoproject.com/ticket/14204. So postgresql and mysql are it, for now.

comment:16 in reply to:  9 ; Changed 9 days ago by Dylan Young

A separate kwarg (on_delete_db) is a much better proposal:

1) It (together with adequate documentation) eliminates *ALL* of the ambiguities present in the current proposal.
2) When this kwarg is unspecified, it should default to the value of on_delete, ensuring as closely as possible consistency between raw database operations and Django operations.
3) Otherwise it leaves granular control up to the user: Don't need delete signals (postgres should be able to support post_delete, no?)? Use the configuration outlined by Carl. Need delete signals in your Django application, but still want sane deletion behaviour in other applications using your DB? Simply set on_delete. Need your other applications to have different deletion behaviour? Just reverse the configuration outlined by Carl... Or any number of options?
4) You could add some special flags that isolate the ORM-level deletion to specific DB-problematic cases (Generic Foreign Keys, Inheritance...), thus eliminating the overhead of python-space constraints in the majority of cases.
5) Maybe it also makes testing Django's ORM on_delete behaviour easier? By adding the ability to easily cross-reference with the Database's internal implementation... could be wrong on that, but with on_delete set to cascade, you would expect the final deletion call of the original object (after all the ORM work to delete related objects) to only delete a single instance right? If it deletes more than expected, the error in the ORMs deletion emulation is exposed.

An extra thought on Postgres:

It seems like it should be possible to support signals with DB-level constraints. Actually any DB that supports returning + transactions should be capable, no?

Replying to Carl Meyer:

I think we should consider introducing this feature as a totally separate kwarg (on_delete_db) rather than conflating it with the existing Python-level on_delete. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.

Right now the contract of on_delete is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).

So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.

For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db on a backend that can't support it should be an error.

comment:17 in reply to:  16 Changed 9 days ago by Nick Stefan

Hi Dylan.

I think your thoughts do make logical sense in response to Carl. However, that proposal was written a long time ago (it was circa 2014). I think most of that "ambiguity" discussed then comes from 2014 DBs not supporting ON DELETE CASCADE. That lack of support then required the need to have automatic fall backs.

In 2017, all of the supported databases should support ON DELETE CASCADE, so a single on_delete kwarg still makes sense to me.

I guess there is "ambiguity" regarding signals etc, but I don't think that should be the case. DO_NOTHING has been a value for a long time. DB_CASCADE is really just DO_NOTHING plus some more SQL, so it should be understood that there can't be signals.

Of course, that's just my opinion, and I admit that it might be unpopular based on how my PR has fared lol :).

I opened a PR back in the summer proving the concept, and added pre check tests to prevent hairy situations like generic foreign keys, concrete model inheritance, etc: https://github.com/django/django/pull/8661 .

--Nick

Replying to Dylan Young:

A separate kwarg (on_delete_db) is a much better proposal:

1) It (together with adequate documentation) eliminates *ALL* of the ambiguities present in the current proposal.
2) When this kwarg is unspecified, it should default to the value of on_delete, ensuring as closely as possible consistency between raw database operations and Django operations.
3) Otherwise it leaves granular control up to the user: Don't need delete signals (postgres should be able to support post_delete, no?)? Use the configuration outlined by Carl. Need delete signals in your Django application, but still want sane deletion behaviour in other applications using your DB? Simply set on_delete. Need your other applications to have different deletion behaviour? Just reverse the configuration outlined by Carl... Or any number of options?
4) You could add some special flags that isolate the ORM-level deletion to specific DB-problematic cases (Generic Foreign Keys, Inheritance...), thus eliminating the overhead of python-space constraints in the majority of cases.
5) Maybe it also makes testing Django's ORM on_delete behaviour easier? By adding the ability to easily cross-reference with the Database's internal implementation... could be wrong on that, but with on_delete set to cascade, you would expect the final deletion call of the original object (after all the ORM work to delete related objects) to only delete a single instance right? If it deletes more than expected, the error in the ORMs deletion emulation is exposed.

An extra thought on Postgres:

It seems like it should be possible to support signals with DB-level constraints. Actually any DB that supports returning + transactions should be capable, no?

Replying to Carl Meyer:

I think we should consider introducing this feature as a totally separate kwarg (on_delete_db) rather than conflating it with the existing Python-level on_delete. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.

Right now the contract of on_delete is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).

So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.

For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db on a backend that can't support it should be an error.

comment:18 Changed 4 days ago by Dylan Young

I see what you're saying.

I still think there is value to the in-app deletion policy (without the need to "roll your own") alongside a DB-enforced deletion policy. For example, a very sensible scheme would be to have both models.PROTECT_DB and models.CASCADE enabled, whereas (correct me if I'm mistaken) your scheme *only* allows allow DO_NOTHING in combination with any _DB setting, which has the nasty side-effect of suppressing signals.

This could still be done in your scheme of course (and can be done now), but it involves either a custom migration... or a bit of hacking of the migration system, and this creates a disparity between model definition and model implementation in the DB, not to mention the extra coding effort.

A context manager for bypassing the ORM deletion policy would also be a nicety (this already exists I think, but I can't recall 100%).

Alternatively a syntax that allows ORing the *_DB flags with the standard ORM on_delete flags could be used, but I think a separate kwarg is cleaner.

I'm curious too if there's any significant performance gain to be had from simulating the delete (without actually deleting) in the ORM in order to correctly fire the signals, but letting the DB deal with the actual deletion. This would probably mean that the signal ordering would be slightly different (all pre-delete signals would fire, followed by all post-delete signals).

In any case, this is actually a huge problem for Django where a small bulk deletion (say 100 rows) can take hours, and I would be thankful for any resolution to the problem.

comment:19 Changed 4 days ago by Nick Stefan

@Dylan

or example, a very sensible scheme would be to have both models.PROTECT_DB and models.CASCADE enabled

I do understand this desire. The existing foreign key constraints that get added for models.CASCADE, only help you when you later go to edit a row with one of those not valid foreign keys. It wont enforce / protect you when models.CASCADE doesn't get all of the rows it should have. I hadn't considered that there could be a desire for DB_DELETE that is completely outside performance desires.

Is that a fair characterization?

Note: See TracTickets for help on using tickets.
Back to Top