Opened 3 years ago
Closed 3 years ago
#33928 closed Cleanup/optimization (fixed)
Performance issues with `on_delete=models.SET_NULL` on large tables
| Reported by: | Renan GEHAN | Owned by: | Simon Charette | 
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.0 | 
| Severity: | Normal | Keywords: | |
| Cc: | David Wobrock | Triage Stage: | Ready for checkin | 
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
Hello,
I have the following models configuration:
- Parentmodel
- Childmodel, with a- parent_idforeign key to a- Parentmodel, set with- on_delete=models.SET_NULL
Each Parent can have a lot of children, in my case roughly 30k.
I'm starting to encounter performance issues that make my jobs timeout, because the SQL queries simply timeout.
I've enabled query logging, and noticed something weird (that is certainly that way on purpose, but I don't understand why).
# Select the parent
SELECT * FROM "parent" WHERE "parent"."id" = 'parent123';
# Select all children
SELECT * FROM "children" WHERE "children"."parent_id" IN ('parent123');
# Update all children `parent_id` column to `NULL`
UPDATE "children" SET "parent_id" = NULL WHERE "children"."id" IN ('child1', 'child2', 'child3', ..., 'child30000');
# Finally delete the parent
DELETE FROM "parent" WHERE "parent"."id" IN ('parent123');
I would have expected the update condition to simply be WHERE "children"."parent_id" = 'parent123', but for some reason it isn't.
In the meantime, I'll switch to on_delete=models.CASCADE, which in my case does the trick, but I was curious about the reason why this happens in the first place.
Thanks in advance
Change History (8)
comment:1 by , 3 years ago
| Triage Stage: | Unreviewed → Accepted | 
|---|---|
| Type: | Uncategorized → Cleanup/optimization | 
comment:3 by , 3 years ago
| Owner: | changed from to | 
|---|---|
| Status: | new → assigned | 
comment:4 by , 3 years ago
| Cc: | added | 
|---|
comment:5 by , 3 years ago
| Patch needs improvement: | set | 
|---|
comment:6 by , 3 years ago
| Patch needs improvement: | unset | 
|---|---|
| Triage Stage: | Accepted → Ready for checkin | 
You are right that is an opportunity for an optimization when
SET,SET_DEFAULT, orSET_NULLis used but I wonder if it's worth doing givendb_on_deletesupport (see #21961) make things even better for this use case.This is likely the case because the collector is able to take the fast delete route and avoid object fetching entirely.
If you want to give a shot at a patch you'll want to have a look at
Collector.collectand have it skip collection entirely when dealing withSETand friends likely by adding a branch that turns them into fast updates. One branch that worries me is the post-deletion assignment of values to in-memory instances but I can't understand why this is even necessary given all the instances that are collected for field updates are never make their way out of the collector so I would expect it to be entirely unnecessary at least alldeleteanddelete_regresstests pass if I entirely remove itdjango/db/models/deletion.py
# update collected instancesfor instances_for_fieldvalues in self.field_updates.values():for (field, value), instances in instances_for_fieldvalues.items():for obj in instances:setattr(obj, field.attname, value)You'll want to make sure to avoid breaking the admin's collector subclass used to display deletion confirmation pages but from a quick look it doesn't seem to care about field updates.
Tentatively accepting but I think we should revisit when #21961 lands.