Opened 11 years ago
Last modified 4 weeks ago
#21961 assigned New feature
Add support for database-level cascading options
Reported by: | Christophe Pettus | Owned by: | Akash Kumar Sen |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | emorley@…, andre@…, Adam Wróbel, Nick Stefan, Hannes Ljungberg, Carlton Gibson, Florian Demmer, Peter Thomassen, Adrien Delessert, Adam Johnson, raydeal, John Speno, eduards-amped, David Wobrock, bcail, lo0, Roman Donchenko, Ryan Hiebert, Mariusz Felisiak | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Per a discussion on -developers, this ticket is to document this proposed new feature.
The general idea is to allow developers to specify that the database itself, rather than Django, should implement foreign key constraints where possible. The database can be considerably more efficient, and often can avoid locking situations, in a way that the Django backend can't.
The proposal is to add a models.CASCADE_DB, models.SET_NULL_DB, models.PROTECT_DB constants. These specify that the appropriate functionality is to be implemented in the database if the database supports it; otherwise, they are implemented in the Django core as now.
Some potential design issues with proposed solutions:
- It is not an error to specify the _DB version with a database that doesn't support it. Instead, you get the non-_DB version.
- The _DB version does not fire signals; this will need to be documented.
- If a model A references a model B using CASCADE_DB, but model B references model C using regular CASCADE, a deletion of A won't cascade all the way to C. This needs to be documented. Another possibility is to make this an error condition and have model validation fail, but that seems excessive.
- The _DB version is disallowed on generic foreign keys, because that way madness lies.
- The implicit foreign key created from child to parent in model inheritance is never the _DB version. This is a shame, but there are two issues:
a) Since it's created implicitly, there's no place to put it.
b) Even if we extended the mechanism to allow manual specification of the key, deleting the parent wouldn't automatically delete the child.
Change History (63)
comment:1 by , 11 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 11 years ago
- is tricky. My instinct would be to fail with an exception when the code requires something that cannot be achieved. However, I understand that pluggable apps may want to take advantage of database-level cascades, while still supporting less capable databases. Short of introducing a third value (CASCADE_DB_PROVIDED_YOU_ARE_USING_A_REAL_DATABASE), your proposal is probably the best solution.
A similar argument can be made for 3.
5 might be just another argument against MTI...
comment:3 by , 11 years ago
- It is possible to specify the key:
class Parent(models.Model): pass class Child(Parent): parent = models.OneToOneField(Parent, parent_link=True)
comment:4 by , 11 years ago
I think the description has parent and child deletion mixed. Deleting the parent will delete the child (there is a key from child to parent). The problem is child deletion. In Django deleting the child cascades to the parent row, too. But there is no column from parent to child you can cascade along if you do cascades in the DB. The real problematic case:
class Related(models.Model): pass class Parent(models.Model): pass class Child(Parent): related = models.ForeignKey(Related, on_delete=DB_CASCADE)
When you delete Related instance, Child instances pointing to it will be deleted by db cascade, but the parent instances will be left alone. If you instead use standard CASCADE, then the parent instances will be deleted, too. This is surprising behaviour. It can be documented, but erroring out would be IMO better.
comment:5 by , 11 years ago
1+2 => When using CASCADE_DB, signals will fire only on backends which do not support the feature.
Similar issues, of course, for 1+3, 1+5.
I find this result quite disturbing.
Alternative: When using CASCADE_DB on a backend where the database cannot implement it, Django tries to emulate it; this is not equivalent to CASCADE.
comment:6 by , 10 years ago
Version: | 1.7-alpha-1 → master |
---|
comment:7 by , 10 years ago
For me it actually ok that signals won't be called and some deletion propagation could be broken. It's kind of .raw()
and .extra()
if you mess with database you should handle all these things.
My use case is using a database with another non-django app. So I can't rely on Djangos cascading.
comment:8 by , 10 years ago
I won't oppose an approach where checks framework will warn about dangerous behavior.
For issue 1+2 mentioned by Shai in comment:5: we should disable signals when cascading along CASCADE_DB converted to normal CASCADE due to not having support for CASCADE by the backend.
follow-up: 16 comment:9 by , 9 years ago
I think we should consider introducing this feature as a totally separate kwarg (on_delete_db
) rather than conflating it with the existing Python-level on_delete
. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.
Right now the contract of on_delete
is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have an on_delete
kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).
So under my proposal, if you want database-level cascade, you would specify on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE
or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.
For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of on_delete_db
on a backend that can't support it should be an error.
comment:10 by , 9 years ago
Summary: | ForeignKey.on_delete supports database-level cascading options → Add support for database-level cascading options |
---|
comment:11 by , 9 years ago
In case anyone is interested in a workaround for PostgreSQL, I created this simple plugin:
https://pypi.python.org/pypi/django-postgres-delete-cascade
That's meant as a temporary solution, until Django supports this out of the box.
comment:12 by , 8 years ago
Cc: | added |
---|
comment:13 by , 8 years ago
Cc: | added |
---|
comment:14 by , 7 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:15 by , 7 years ago
postgresql and mysql are fine. However, for SQLite to use ON DELETE, we would need a new OPTIONS flag. Something like:
settings.DATABASES.["default"]["OPTIONS"]["PRAGMA_foreign_keys"] = True
(which would then lead to the database wrapper running the correct SQL: https://sqlite.org/foreignkeys.html#fk_enable)
If I add the pragma flag, the test suite fails quite a few tests (50+). Therefore, SQLite PRAGMA foreign keys, seems to be a bigger and separate scope. It does have its own ticket here https://code.djangoproject.com/ticket/14204. So postgresql and mysql are it, for now.
follow-up: 17 comment:16 by , 7 years ago
A separate kwarg (on_delete_db
) is a much better proposal:
1) It (together with adequate documentation) eliminates *ALL* of the ambiguities present in the current proposal.
2) When this kwarg is unspecified, it should default to the value of on_delete
, ensuring as closely as possible consistency between raw database operations and Django operations.
3) Otherwise it leaves granular control up to the user: Don't need delete signals (postgres should be able to support post_delete, no?)? Use the configuration outlined by Carl. Need delete signals in your Django application, but still want sane deletion behaviour in other applications using your DB? Simply set on_delete
. Need your other applications to have different deletion behaviour? Just reverse the configuration outlined by Carl... Or any number of options?
4) You could add some special flags that isolate the ORM-level deletion to specific DB-problematic cases (Generic Foreign Keys, Inheritance...), thus eliminating the overhead of python-space constraints in the majority of cases.
5) Maybe it also makes testing Django's ORM on_delete behaviour easier? By adding the ability to easily cross-reference with the Database's internal implementation... could be wrong on that, but with on_delete
set to cascade, you would expect the final deletion call of the original object (after all the ORM work to delete related objects) to only delete a single instance right? If it deletes more than expected, the error in the ORMs deletion emulation is exposed.
An extra thought on Postgres:
It seems like it should be possible to support signals with DB-level constraints. Actually any DB that supports returning
+ transactions should be capable, no?
Replying to Carl Meyer:
I think we should consider introducing this feature as a totally separate kwarg (
on_delete_db
) rather than conflating it with the existing Python-levelon_delete
. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.
Right now the contract of
on_delete
is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have anon_delete
kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).
So under my proposal, if you want database-level cascade, you would specify
on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE
or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.
For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of
on_delete_db
on a backend that can't support it should be an error.
comment:17 by , 7 years ago
Hi Dylan.
I think your thoughts do make logical sense in response to Carl. However, that proposal was written a long time ago (it was circa 2014). I think most of that "ambiguity" discussed then comes from 2014 DBs not supporting ON DELETE CASCADE. That lack of support then required the need to have automatic fall backs.
In 2017, all of the supported databases should support ON DELETE CASCADE, so a single on_delete kwarg still makes sense to me.
I guess there is "ambiguity" regarding signals etc, but I don't think that should be the case. DO_NOTHING has been a value for a long time. DB_CASCADE is really just DO_NOTHING plus some more SQL, so it should be understood that there can't be signals.
Of course, that's just my opinion, and I admit that it might be unpopular based on how my PR has fared lol :).
I opened a PR back in the summer proving the concept, and added pre check tests to prevent hairy situations like generic foreign keys, concrete model inheritance, etc: https://github.com/django/django/pull/8661 .
--Nick
Replying to Dylan Young:
A separate kwarg (
on_delete_db
) is a much better proposal:
1) It (together with adequate documentation) eliminates *ALL* of the ambiguities present in the current proposal.
2) When this kwarg is unspecified, it should default to the value ofon_delete
, ensuring as closely as possible consistency between raw database operations and Django operations.
3) Otherwise it leaves granular control up to the user: Don't need delete signals (postgres should be able to support post_delete, no?)? Use the configuration outlined by Carl. Need delete signals in your Django application, but still want sane deletion behaviour in other applications using your DB? Simply seton_delete
. Need your other applications to have different deletion behaviour? Just reverse the configuration outlined by Carl... Or any number of options?
4) You could add some special flags that isolate the ORM-level deletion to specific DB-problematic cases (Generic Foreign Keys, Inheritance...), thus eliminating the overhead of python-space constraints in the majority of cases.
5) Maybe it also makes testing Django's ORM on_delete behaviour easier? By adding the ability to easily cross-reference with the Database's internal implementation... could be wrong on that, but withon_delete
set to cascade, you would expect the final deletion call of the original object (after all the ORM work to delete related objects) to only delete a single instance right? If it deletes more than expected, the error in the ORMs deletion emulation is exposed.
An extra thought on Postgres:
It seems like it should be possible to support signals with DB-level constraints. Actually any DB that supports
returning
+ transactions should be capable, no?
Replying to Carl Meyer:
I think we should consider introducing this feature as a totally separate kwarg (
on_delete_db
) rather than conflating it with the existing Python-levelon_delete
. The various edge cases and implicit fallbacks in the existing proposal worry me, and I think it would be better to be more explicit and clear about what is happening on the database level vs the Python level.
Right now the contract of
on_delete
is very simple: it takes a Python callable which will be called when a delete cascades. There is nothing at all special about the built-in callables, they are just conveniences for common cases. The current API proposal complicates that contract dramatically: now you have anon_delete
kwarg which sometimes accepts Python callables but can also accept magical constants which do something entirely different (and also have an implicit fallback relationship with one of the built-in callables).
So under my proposal, if you want database-level cascade, you would specify
on_delete=models.DO_NOTHING, on_delete_db=models.DB_CASCADE
or similar. Sure this is a bit more verbose, but I think that's worth it for the gains in clarity and simplicity.
For similar reasons, I feel pretty strongly that we should NOT automatically fallback from db-cascade to non-db-cascade depending on the backend capabilities. It introduces too many differences in behavior depending on db backend. Trying to claim to provide cross-db portability when we can't really do so consistently is worse than simply not claiming to provide it at all. DB-level cascade (like many other "advanced" db-level features) is something you should only take advantage of when you know your code will be running on a database that supports it, full stop. Use of
on_delete_db
on a backend that can't support it should be an error.
comment:18 by , 7 years ago
I see what you're saying.
I still think there is value to the in-app deletion policy (without the need to "roll your own") alongside a DB-enforced deletion policy. For example, a very sensible scheme would be to have both models.PROTECT_DB and models.CASCADE enabled, whereas (correct me if I'm mistaken) your scheme *only* allows allow DO_NOTHING in combination with any _DB
setting, which has the nasty side-effect of suppressing signals.
This could still be done in your scheme of course (and can be done now), but it involves either a custom migration... or a bit of hacking of the migration system, and this creates a disparity between model definition and model implementation in the DB, not to mention the extra coding effort.
A context manager for bypassing the ORM deletion policy would also be a nicety (this already exists I think, but I can't recall 100%).
Alternatively a syntax that allows ORing the *_DB
flags with the standard ORM on_delete
flags could be used, but I think a separate kwarg is cleaner.
I'm curious too if there's any significant performance gain to be had from simulating the delete (without actually deleting) in the ORM in order to correctly fire the signals, but letting the DB deal with the actual deletion. This would probably mean that the signal ordering would be slightly different (all pre-delete signals would fire, followed by all post-delete signals).
In any case, this is actually a huge problem for Django where a small bulk deletion (say 100 rows) can take hours, and I would be thankful for any resolution to the problem.
comment:19 by , 7 years ago
@Dylan
or example, a very sensible scheme would be to have both models.PROTECT_DB and models.CASCADE enabled
I do understand this desire. The existing foreign key constraints that get added for models.CASCADE, only help you when you later go to edit a row with one of those not valid foreign keys. It wont enforce / protect you when models.CASCADE doesn't get all of the rows it should have. I hadn't considered that there could be a desire for DB_DELETE that is completely outside performance desires.
Is that a fair characterization?
comment:20 by , 6 years ago
Just an idea, having two kwargs (on_delete/on_delete_db) is not entirely different from having one kwarg and two sets of constants (SET_NULL/SET_NULL_DB) -- we can always combine those constants like this: on_delete=SET_NULL|SET_NULL_DB and have the described behavior without adding a separate kwarg. Assuming it's appropriate in this case (although it is the way Q objects are used, for example) and that having such combinations is desirable at all (I could think of some cases but I'm not sure I can remember any such in any of my recent projects).
comment:21 by , 6 years ago
Cc: | added |
---|
follow-up: 24 comment:23 by , 5 years ago
Patch needs improvement: | set |
---|
comment:24 by , 5 years ago
Replying to felixxm:
I noticed the patch is still set to "needs improvement". We've been going through code review process in github; do I leave that flag here as true after the code review follow ups? Or do I change that flag once code review fixes go through? Thanks!
comment:25 by , 5 years ago
Patch needs improvement: | unset |
---|
Code review seems to be complete, so unsetting "needs improvement".
comment:26 by , 5 years ago
Needs tests: | set |
---|
comment:27 by , 5 years ago
Needs tests: | unset |
---|
comment:28 by , 5 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
I unfortunately do not have bandwidth to work on this any longer. Hopefully someone else with more free time can take up the torch...
comment:29 by , 5 years ago
Patch needs improvement: | set |
---|
comment:30 by , 4 years ago
Cc: | added |
---|
comment:31 by , 4 years ago
Cc: | added |
---|
comment:32 by , 3 years ago
Hello,
I'd like to contribute on this issue. I've read the whole ticket but i'm not sure to understand what's the current state of this patch. Looking at Nick's comment (https://github.com/django/django/pull/8661#issuecomment-624244307), some technical issues need to be answered or debated.
Thanks for your answer.
comment:33 by , 3 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:34 by , 3 years ago
Cc: | added |
---|
comment:35 by , 3 years ago
I hit an issue today which lead me back to this ticket. In MySQL 5.7 and 8 the default behaviour for FK constraints is RESTRICT which means that if you create your model with an on_delete=models.CASCADE, the SQL migration is created within the ON DELETE argument which means the table is created using the default constraint of RESTRICT.
This means that the database will not act as intended if the models are not deleted in the right order things break, but more over DELETE statements directly on the database will break.
This means Django should not really be considered compatible with MySQL 5.7 or 8+ and probably older versions.
Here's the MySQL docs - https://dev.mysql.com/doc/refman/8.0/en/create-table-foreign-keys.html#foreign-key-referential-actions
"RESTRICT: Rejects the delete or update operation for the parent table. Specifying RESTRICT (or NO ACTION) is the same as omitting the ON DELETE or ON UPDATE clause."
This means if you work on a project that requires actions to be taken at a database level you have to manually drop and recreate the foreign keys
comment:36 by , 3 years ago
Cc: | added |
---|
comment:37 by , 3 years ago
Cc: | added |
---|
comment:38 by , 3 years ago
Cc: | added |
---|
comment:39 by , 3 years ago
Cc: | added |
---|
comment:40 by , 3 years ago
Cc: | added |
---|
comment:41 by , 3 years ago
Cc: | added |
---|
comment:42 by , 2 years ago
Cc: | added |
---|
comment:43 by , 2 years ago
Cc: | added |
---|
comment:44 by , 20 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:45 by , 19 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:46 by , 19 months ago
Patch needs improvement: | unset |
---|
Created a new patch : https://github.com/django/django/pull/16851
comment:47 by , 19 months ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
follow-up: 49 comment:48 by , 19 months ago
Updated the patch https://github.com/django/django/pull/16851
comment:49 by , 19 months ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
Replying to Akash Kumar Sen:
Updated the patch https://github.com/django/django/pull/16851
Hi Akash — I'm assuming this means that you want another review here so, I've unchecked the Needs documentation and Patch needs improvement flags, so it will appear again on the list of patches needing review for you.
comment:50 by , 18 months ago
Needs documentation: | set |
---|---|
Patch needs improvement: | set |
comment:51 by , 17 months ago
Needs documentation: | unset |
---|---|
Patch needs improvement: | unset |
- Fixed the typos in the documentation and resolved the suggestions given by Mariusz
- Will work on integrating the
DB_SET_DEFAULT
next, as suggested by Nick here - But I am a not sure about the serializer fix, some review will be helpful here, or some suggestion over how to reproduce the issue.
comment:52 by , 16 months ago
Cc: | added |
---|
comment:53 by , 16 months ago
Needs tests: | set |
---|---|
Patch needs improvement: | set |
comment:54 by , 16 months ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
- Added tests for migrations.
There are two issues I could not resolve after a lot of try :
- MySQL specific error in the test
delete.tests.DatabaseLevelOnDeleteTests.test_foreign_key_db_default
- I am not sure how to test the serializer.
comment:56 by , 12 months ago
Patch needs improvement: | set |
---|
comment:57 by , 10 months ago
Cc: | added |
---|
comment:58 by , 10 months ago
Patch needs improvement: | unset |
---|
comment:59 by , 8 months ago
Patch needs improvement: | set |
---|
comment:60 by , 8 months ago
Patch needs improvement: | unset |
---|
comment:61 by , 8 months ago
Patch needs improvement: | set |
---|
comment:62 by , 7 months ago
Cc: | added |
---|
comment:63 by , 4 weeks ago
Cc: | added |
---|
If 3 can be introspected for (which should be possible) we can at least implement a check for it in the new checks framework.