Opened 7 weeks ago

Closed 7 weeks ago

Last modified 7 weeks ago

#30533 closed Bug (wontfix)

Delete cascade on large tables can cause process termination on PostgreSQL.

Reported by: Thomas Riccardi Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: queryset delete cascade batch bulk_batch_size postgresql crash
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Scenario

  • Model A has a foreign key to model B with on_delete=CASCADE
  • many B objects
  • execute B.objects.delete()

Expected result:

  • deletion works

Actual result:

  • when there are many B objects to delete (in our case it was 14M rows), the database process gets killed before completion:
    2019-05-28 19:17:42.237 CEST
    [1]: [2-1] db=,user= LOG: server process (PID 12911) was terminated by signal 9: Killed
    2019-05-28 19:17:42.238 CEST
    [1]: [3-1] db=,user= DETAIL: Failed process was running: DELETE FROM "a" WHERE "a"."b_id" IN (17271, 17272, 17273, 17274, 17275, <truncated enormous list>
    
  • with a smaller database it worked (2M rows)

Analysis

It seems the related objects A get deleted in one query with an unbound IN (...) list of B objects.
In fact this pattern already lead to an issue with the sqlite backend (#link0: sqlite supports only 999 parameters per query)
This was fixed in django 1.8 by adding batch query #link1, #link2 with a size specified per backend:

Workaround

As a temporary workaround we monkey patched the connection instance own bulk_batch_size and limit to 64k.

import types

def monkey_patch_connection_bulk_batch_size(connection):
    def limited_bulk_batch_size(self, fields, objs):
        """
        PostgreSQL can crash with too many parameters in a query
        e.g. 'DELETE FROM x WHERE x.y IN (...large list...)'
        => limit to 64k
        """
        return 2**16
    connection.ops.bulk_batch_size = types.MethodType(limited_bulk_batch_size, connection.ops)

It worked great in our case: we used it in a migration.
workaround limitations:

  • no idea where to monkey patch for global usage
  • no idea how to choose the bulk size value
  • didn't handle the more complex computation using fields and objs parameters

(Related remark: this is for deleting the related objects, then for the main objects deletion django already uses batch deletion, but with much smaller batch size: GET_ITERATOR_CHUNK_SIZE = 100; with some latency to the database it's a really small value (0.2ms per query))

Context

  • postgresql 9.6 (GCP Cloud SQL)
  • django 2.0.13 (but related code has not changed in current master it seems)

---
link 0 https://code.djangoproject.com/ticket/16426
link 1 https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/models/deletion.py#L224-L225
link 2 https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/models/deletion.py#L167
link 3 https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/backends/sqlite3/operations.py#L27-L40
link 4 https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/backends/oracle/operations.py#L592-L596
link 5 https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/backends/base/operations.py#L65-L71

Change History (4)

comment:1 Changed 7 weeks ago by felixxm

Version: 2.0master

Thanks for the report. Do you know any source that describes the max number of parameters for PostgreSQL? (e.g. 64k is specified in db docs on Oracle).

comment:2 in reply to:  1 Changed 7 weeks ago by Thomas Riccardi

Replying to felixxm:

Thanks for the report. Do you know any source that describes the max number of parameters for PostgreSQL? (e.g. 64k is specified in db docs on Oracle).

I think there is no hardcoded limits in postgresql; what I think probably happened is a high memory allocation that resulted in an OOMKill on the connection process on the db side: with more RAM the query could probably work.

The queries pattern in this scenario is:

  • get all B objects to delete: SELECT
  • delete all A related objects: the unbound DELETE IN that crashes
  • delete all B objects: DELETE IN with batch of 100

It would make sense to have the same size for the 2 DELETE IN, but 100 is really small for DELETEs (the result size is constant, not proportional to the number of deleted objects): maybe add another constant for DELETE: GET_ITERATOR_CHUNK_SIZE should be for get (where the result size is proportional to the number of objects), with a much higher value; this could be part of the fix for this ticket?

comment:3 Changed 7 weeks ago by felixxm

Resolution: wontfix
Status: newclosed
Summary: Delete cascade can break postgresql database: too large 'DELETE FROM a WHERE a.b_id IN (...)'Delete cascade can break postgresql database: too large 'DELETE FROM a WHERE a.b_id IN (...)'.

If we don't have any restrictions in PostgreSQL I wouldn't classify this as a bug in Django. In SQLite and Oracle we limit the number of queries parameters or even the number of parameters in IN (...) list, because we have to. These are databases limitations, which is not the case on PostgreSQL. You have at least two ways to handle this:

comment:4 Changed 7 weeks ago by felixxm

Summary: Delete cascade can break postgresql database: too large 'DELETE FROM a WHERE a.b_id IN (...)'.Delete cascade on large tables can cause process termination on PostgreSQL.
Note: See TracTickets for help on using tickets.
Back to Top