Opened 8 years ago

Last modified 11 months ago

#9519 new New feature

Add QuerySet.bulk_delete() that issues only a single SQL query

Reported by: Joey Wilhelm Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: database, queryset, delete
Cc: Tobias McNulty, kmike84@…, dev@…, kkumler, denilsonsa@…, Zach Borboa Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no



from django.db import models

class MyModel(models.Model):
    my_id = models.IntegerField(primary_key=True)
    col1 = models.IntegerField()
    col2 = models.CharField(max_length=1)
    col3 = models.TextField()
    class Meta:
        unique_together = ('my_id', 'col1', 'col2') # "Fake" a multi-part primary key

# This works for creating all 3, as force_insert is used in create()
MyModel.objects.create(my_id=1, col1=5, col2='a', col3='foo')
MyModel.objects.create(my_id=1, col1=5, col2='b', col3='bar')
MyModel.objects.create(my_id=1, col1=10, col2='a', col3='baz')

MyModel.objects.filter(my_id=1, col1=5).delete()

This deletes all of the objects created above, since deletion is done only based on This is fine, except when you are using multi-part primary keys and emulating that pkey as suggested in #373.

In my opinion, the delete operation should be executed with respect to the filter(), extra(), etc which were previously performed on the QuerySet.

This is, in a way, a part of ticket #373, but I believe it can be viewed as a completely separate issue. In some cases, you do want to delete based on columns other than the primary key. Why build a list of all of those primary keys when you've already specified all of the qualifiers in a filter()?

Change History (30)

comment:1 Changed 8 years ago by Jacob

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset
Triage Stage: UnreviewedDesign decision needed

comment:2 Changed 7 years ago by Tobias McNulty

Cc: Tobias McNulty added

This is an issue for me as well. Imagine the cycle:

1) SELECT 100 of the objects you want to delete (this can take awhile depending on the WHERE clause)

2) DELETE those objects by ID

Now imagine it repeated 10000 times. I'm trying to delete millions of rows from a table, so the current way .delete() works is completely unusable.

IMHO .delete should just use the FROM table, JOINs (if any), and WHERE clause from the queryset itself.

comment:3 Changed 7 years ago by Tobias McNulty

Owner: changed from nobody to Tobias McNulty
Status: newassigned

comment:4 Changed 6 years ago by Tobias McNulty

Summary: QuerySet.delete() should operate with respect to previous query operationsQuerySet.delete() should issue only a single SQL query

comment:5 Changed 6 years ago by Craig de Stigter

+1 for this. The way it is currently implemented, all the objects that are to be deleted are pulled into memory, which is a nightmare for >10000 objects.

This could be implemented as QuerySet.delete_raw() or QuerySet.delete(raw=True) without breaking the existing API.

comment:6 Changed 6 years ago by erikrose

Me too.

comment:7 Changed 5 years ago by Luke Plant

Severity: Normal
Type: New feature

comment:8 Changed 5 years ago by Nathan Vander Wilt

Easy pickings: unset
UI/UX: unset


The docs led me to believe Django would bulk delete using DELETE...WHERE instead of having to stream every column from a bazillion rows.

comment:9 Changed 5 years ago by Carl Meyer

Summary: QuerySet.delete() should issue only a single SQL queryAdd QuerySet.bulk_delete() that issues only a single SQL query
Triage Stage: Design decision neededAccepted
Version: 1.0SVN

.delete() has to be implemented the way it is in order to support cascade-deletion. If Django didn't do cascade-deletion in the ORM it would be much more difficult to write cross-database-compatible ORM code. So the implementation of .delete() is not going to change (it does do the cascade more efficiently now, since #7539 was fixed, than it did when this ticket was filed).

The issue reported originally here (that deletion doesn't work with duplicate PKs) is a non-issue. Django's ORM assumes PKs are unique, and the database will enforce that unless your database is broken or you've manually modified the tables. That's not going to change, regardless of what hacky workarounds were suggested on #373.

All that said, I'd be open to a new .bulk_delete() method (parallel to the .bulk_create() method that was recently added) that would do deletion as requested here. It would have a number of caveats, primarily that it wouldn't do FK cascades, so you'd have to have your database set up correctly to handle that natively (ON DELETE CASCADE or whatnot) or you might get integrity errors.

natevw, if you think the documentation for the current .delete() method could better explain its limitations, that's a separate issue - please file a new ticket for it, ideally with a patch to the docs.

comment:10 Changed 5 years ago by Carl Meyer

After some discussion with Alex in IRC: the deletion code can reliably tell whether cascade is needed or not - if you have no FKs pointing to the model, or all of them are set to on_delete=DO_NOTHING, you don't need cascade. If you do, there's no point in giving you an easy way to shoot yourself in the foot with an IntegrityError. So however we spell the "bulk delete" option, it should detect whether a bulk delete is actually possible (i.e. no cascade needed); if it's not possible, it should either raise an error or fall back to the current deletion approach.

Between those two, I lean towards raising an error. The conditions that determine whether bulk delete is possible are deterministic and in the programmer's full control: it depends only on the models, not on the state of the database or anything else. So requesting a bulk delete on a model that needs to cascade deletes is simply an error, and it should fail fast so you can just fix your code to not request the bulk delete. That way you can know that if you have a working call to bulk delete, it is actually doing a bulk delete (else it would be failing).

We can't just quietly "do the right thing" under the covers because there is one unavoidable semantic difference: the pre_delete and post_delete signals. We currently send them for all deleted instances, and "bulk delete" will have no way of doing that. That will just have to be documented as a limitation of the bulk delete.

comment:11 Changed 5 years ago by Tobias McNulty

Owner: changed from Tobias McNulty to nobody
Status: assignednew

comment:12 Changed 5 years ago by Mikhail Korobov

Cc: kmike84@… added

comment:13 Changed 5 years ago by Mikhail Korobov

Probably incomplete implementation/workaround that works for me:

from django.db import transaction
from django.db.models.sql.subqueries import DeleteQuery

def truncate_queryset(qs):
    Deletes all records matched by queryset using

        DELETE from table WHERE <condition>

    query without fetching PK values for all items in original queryset.

    delete_query = qs.query.clone(DeleteQuery)

    # transaction management code is copied from QuerySet.update
    if not transaction.is_managed(using=qs.db):
        forced_managed = True
        forced_managed = False
        if forced_managed:
        if forced_managed:

comment:14 Changed 4 years ago by Brillgen Developers

Cc: dev@… added

comment:15 Changed 4 years ago by Anssi Kääriäinen

Note that a regular .delete() can avoid the fetching if there are no cascades, and there are no signals. We currently have no API to ask if somebody is listening for a signal, but we could easily add one - pre_delete.has_listeners(SomeModel) for example. This way we could automatically do the right thing in .delete() without need for a new operation. And, we could optimize deletion of cascades, too.

If there are signals but we want to skip sending those, we could just add a .delete(signals=False) option for that. I am -1 on ignoring cascades. This will lead to integrity errors. Or worse, there are DBs which do not enforce foreign keys.

comment:16 Changed 4 years ago by kkumler

Cc: kkumler added

comment:17 Changed 4 years ago by Anssi Kääriäinen

The idea in comment:15 has been implemented - is there still need for separate bulk_delete()?

comment:18 in reply to:  17 Changed 4 years ago by Preston Holmes

Replying to akaariai:

The idea in comment:15 has been implemented - is there still need for separate bulk_delete()?

I agree about ignoring cascades being a non-starter - if you need that level of manipulation, drop to SQL.

The question is whether there are times you have a non-cascading model, you have signals, but you want to just delete a batch of objects. This is analogous to update method not firing signals.

whether this is bulk_delete or delete(signals=False) might be a minor point, I'd lean slightly toward a separate method as being cleaner to document as a special case, and to not introduce the concept of signals=False as being something that people would request be a general pattern for other methods.

If we wanted to offer some disabling signals more generally, perhaps we could look into a context manager for that.

comment:19 Changed 4 years ago by Anssi Kääriäinen

API idea: How about qs.values().delete() doing a no-signals delete? We could also implement the cascades by fetching the PK values to be deleted. This should make .values().delete() faster and less memory hungry, yet cascades are implemented.

To me this API seems correct - values() is doing "fetch the raw values only, not objects" so values().delete() is doing "do the deletes by fetching raw values only, not objects".

Currently .values().delete() doesn't work at all (except accidentally they do for fast-path deletes...).

The disabling signals idea is dangerous if one can do global "signals disabled" call. For example "with disable_signals(): do_something()" will break model.__init__ in do_something() for models having ImageFields or GenericForeignKeys. To be safe the only way to disable the signals should be by listing the signal target functions explicitly. In addition, the disabling should be thread-safe and really fast (at least for the no-signals-disabled case). Model.__init__ is limited somewhat by signal.send speed already. So, my initial feeling is to not do this.

I would love it if there was some way to tell the DB to set foreign key constraints to ON DELETE CASCADE temporarily. But, there doesn't seem to be any way to do this in SQL.

comment:20 Changed 4 years ago by Preston Holmes

Yes - the global signal killer context manager is a bad idea - I was thinking of the narrow context of limiting test interactions - where it could be useful - but would be a bad choice for public API.

My only hangup on .values().delete() is that I think of values returning data - not objects, so should be decoupled (conceptually) from the DB - and so delete seems to leak back to the DB. That is a statement of my conceptual consistency issue - not that it wouldn't be a clean way to implement it on the internals side.

Basically trying to muck with signals enable/disable in any public API is sticky business. I still think a bulk_delete with lots of warning in the docs is the simplest way forward - if there is a good way forward at all.

comment:21 Changed 4 years ago by Anssi Kääriäinen

API proposal for bulk_delete():

  1. Does not send signals.
  2. Does "fast-path" deletion like in normal deletion where possible.
  3. [maybe] If cascades are needed, then cascading is implemented by fetching just the needed values from the DB, not as objects but as raw values.

The problem is that no. 3 above seems to be somewhat messy to implement. So, maybe just error out if cascades are needed instead?

comment:22 Changed 4 years ago by Carl Meyer

If I'm not mistaken, the need for cascade is entirely determined by the model schema; for a given call to bulk_delete, the need for cascade won't change at runtime. In other words, you're not going to have code calling bulk_delete that usually works but occasionally fails. So it seems fine to me to omit (3) and just raise an error if bulk_delete is used on a model requiring cascade. Implementing (3) sounds to me like a lot of additional deletion code and bug surface area, for little benefit.

Last edited 4 years ago by Carl Meyer (previous) (diff)

comment:23 Changed 4 years ago by Anssi Kääriäinen

We can always add cascades to bulk_delete if needed, so the suggested API is:

  • no signals
  • error on cascade
  • due to the above items we can do fast-path deletion always

Sadly this means no bulk_delete for inheritance cases.

comment:24 Changed 4 years ago by Simon Charette

This is sad indeed, was hoping to use this to delete inherited instance while avoiding base instance deletion.

comment:25 in reply to:  24 Changed 4 years ago by Carl Meyer

I consider inheritance to be a different case from cascade (both in concept, because you have a conceptually single object that spans rows in multiple tables, and in that it's implemented as a special case, the only case where you might 'cascade' in the reverse direction across an FK/OneToOne).

I could see perhaps implementing something like (3) for inheritance only, not for other cascades. Though I'm just as happy not allowing bulk_delete in inheritance cases.

In any case, I don't think the behavior charettes was hoping for is preferable. bulk_delete should either delete both parent and child rows in inheritance cases, or it should refuse to do anything at all. Deleting child rows only violates the inheritance abstraction. It should either just be done in raw SQL, or if it's really got convincing use-cases it should get a more explicit API, not occur as a surprising side-effect of bulk_delete.

comment:26 Changed 4 years ago by Anssi Kääriäinen

Do we still want this? The deletion code is now intelligent enough to do fast-path deletion where possible. We could improve it by having "no_signals" arg, and making it only fetch values() for cascade columns if there is no need for sending the signals (latter is likely tricky to do).

We should not have a delete which does not respect cascades along foreign keys. Some DBs do not enforce the foreign keys so this would be a real gun-pointed-at-foot.

I'm inclined to close this as wontfix...

comment:27 Changed 3 years ago by Tobias McNulty

I like the new logic in 1.5, but I still think you should be able to apply the DO_NOTHING logic on a per-query basis, rather than having to use whatever policy you set in the ForeignKey declaration across the project as a whole. For example, if I know I'm going to be deleting millions of rows, I'll want to enable this, but for normal use at just about every other point in my project, the default behavior is preferable. Could we just add something like a force_fast option to the delete() method that basically did the same thing as having DO_NOTHING on every foreign key in the model being deleted?

For others who stumble upon this ticket and wonder how the new code in 1.5 works, here's a link to the docs:

comment:28 Changed 2 years ago by Denilson Figueiredo de Sá

Cc: denilsonsa@… added

comment:29 Changed 22 months ago by Zach Borboa

Cc: Zach Borboa added

comment:30 Changed 11 months ago by Craig de Stigter

For those googling, kmike's workaround above is quite out of date. Since django 1.5+ you can use:


Private API, be careful, yadda yadda

Last edited 11 months ago by Craig de Stigter (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top