Opened 2 years ago
Closed 2 years ago
#33928 closed Cleanup/optimization (fixed)
Performance issues with `on_delete=models.SET_NULL` on large tables
Reported by: | Renan GEHAN | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.0 |
Severity: | Normal | Keywords: | |
Cc: | David Wobrock | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello,
I have the following models configuration:
* a `Parent` model
* a `Child` model, with a `parent_id` foreign key to the `Parent` model, set with `on_delete=models.SET_NULL`

Each `Parent` can have a lot of children, in my case roughly 30k.
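A minimal sketch of the setup described above (field types and `db_table` names are assumptions inferred from the query log below):

```python
from django.db import models


class Parent(models.Model):
    # The query log shows string primary keys such as 'parent123'.
    id = models.CharField(primary_key=True, max_length=64)

    class Meta:
        db_table = "parent"


class Child(models.Model):
    id = models.CharField(primary_key=True, max_length=64)
    # The FK must be nullable for on_delete=models.SET_NULL to be usable.
    parent = models.ForeignKey(Parent, null=True, on_delete=models.SET_NULL)

    class Meta:
        db_table = "children"
```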
I'm starting to encounter performance issues that make my jobs time out, because the SQL queries themselves time out.
I've enabled query logging and noticed something odd (which is certainly intentional, but I don't understand why):
```sql
# Select the parent
SELECT * FROM "parent" WHERE "parent"."id" = 'parent123';

# Select all children
SELECT * FROM "children" WHERE "children"."parent_id" IN ('parent123');

# Update all children `parent_id` column to `NULL`
UPDATE "children" SET "parent_id" = NULL WHERE "children"."id" IN ('child1', 'child2', 'child3', ..., 'child30000');

# Finally delete the parent
DELETE FROM "parent" WHERE "parent"."id" IN ('parent123');
```
I would have expected the update condition to simply be `WHERE "children"."parent_id" = 'parent123'`, but for some reason it isn't.
In the meantime, I'll switch to `on_delete=models.CASCADE`, which in my case does the trick, but I was curious about the reason why this happens in the first place.
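For illustration, a minimal reproduction of the behaviour described above, assuming the sketched models and the IDs from the query log:

```python
# Deleting a parent that has ~30k children.
parent = Parent.objects.get(pk="parent123")

# With on_delete=models.SET_NULL this produces the pattern shown in the log:
# SELECT the children, UPDATE them by primary key with a huge IN (...) clause,
# then DELETE the parent row.
parent.delete()

# The on_delete=models.CASCADE workaround avoids the per-child fetch, since
# the children can be removed with a single DELETE keyed on parent_id.
```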
Thanks in advance
Change History (8)
comment:1 by , 2 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 2 years ago
Cc: | added |
---|
comment:5 by , 2 years ago
Patch needs improvement: | set |
---|
comment:6 by , 2 years ago
Patch needs improvement: | unset |
---|---|
Triage Stage: | Accepted → Ready for checkin |
You are right that this is an opportunity for an optimization when `SET`, `SET_DEFAULT`, or `SET_NULL` is used, but I wonder if it's worth doing given that `db_on_delete` support (see #21961) would make things even better for this use case.

Switching to `on_delete=models.CASCADE` likely does the trick because the collector is able to take the fast-delete route and avoid object fetching entirely.
If you want to give a shot at a patch, you'll want to have a look at `Collector.collect` and have it skip collection entirely when dealing with `SET` and friends, likely by adding a branch that turns them into fast updates.

One branch that worries me is the post-deletion assignment of values to in-memory instances, but I can't understand why it is even necessary: the instances collected for field updates never make their way out of the collector, so I would expect it to be entirely unnecessary. At least, all `delete` and `delete_regress` tests pass if I entirely remove it.

django/db/models/deletion.py:
```python
# update collected instances
for instances_for_fieldvalues in self.field_updates.values():
    for (field, value), instances in instances_for_fieldvalues.items():
        for obj in instances:
            setattr(obj, field.attname, value)
```

You'll want to make sure to avoid breaking the admin's collector subclass used to display deletion confirmation pages, but from a quick look it doesn't seem to care about field updates.
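As a hedged sketch of the "fast update" idea (not the actual patch, and not `Collector` internals), the optimization essentially replaces the per-instance bookkeeping above with one queryset-level update keyed on the foreign key:

```python
def fast_set_null(parent_pks):
    """Hypothetical helper: what the SET_NULL fast path boils down to."""
    # Emits a single statement:
    #   UPDATE "children" SET "parent_id" = NULL
    #   WHERE "children"."parent_id" IN (<parent_pks>)
    Child.objects.filter(parent_id__in=parent_pks).update(parent=None)
```

This only illustrates the query shape; the actual change would live inside the collector so that existing signal and admin deletion-confirmation behaviour is preserved.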
Tentatively accepting, but I think we should revisit when #21961 lands. In the meantime, feel free to give implementing this optimization a shot.