Opened 6 weeks ago

Last modified 8 days ago

#36526 assigned Cleanup/optimization

bulk_update uses more memory than expected — at Version 1

Reported by: Anže Pečar
Owned by:
Component: Database layer (models, ORM)
Version: 5.2
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Anže Pečar)

I recently tried to update a large number of objects with:

things = list(Thing.objects.all()) # A large number of objects e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)

The first line above fits into the available memory (~2GB in my case), but the second line caused a SIGTERM, even though I had an additional 2GB of available memory. This was a bit surprising as I wasn't expecting bulk_update to use this much memory since all the objects to update were already loaded.
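A plausible explanation (based on the reported behavior, not a confirmed reading of the Django source) is that bulk_update materializes the per-row update expressions for every batch before executing any query, so peak memory scales with the total number of objects rather than with batch_size. The workaround below yields one batch at a time. A schematic, Django-free sketch of the difference, using hypothetical helper names:

```python
from itertools import islice


def eager_batches(rows, size):
    # Sketch of the reported behavior: every batch's update expressions
    # are built up front, so all of them are alive in memory at once.
    it = iter(rows)
    batches = []
    while chunk := list(islice(it, size)):
        batches.append([f"CASE WHEN pk={pk} THEN {val!r}" for pk, val in chunk])
    return batches  # peak memory ~ expressions for ALL rows


def lazy_batches(rows, size):
    # Sketch of the workaround: only one batch's expressions exist
    # at any time, so peak memory ~ expressions for `size` rows.
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield [f"CASE WHEN pk={pk} THEN {val!r}" for pk, val in chunk]
```

Both produce the same batches; only the lifetime of the intermediate expression objects differs.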

My solution was:

from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)

In the first example, bulk_update used 2.8 GB of memory; in the second, it used only 62 MB.
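Note that itertools.batched was only added in Python 3.12. On older interpreters, a small stand-in (the name and signature here mirror the stdlib version, but this helper is my own sketch) works the same way for the workaround above:

```python
from itertools import islice


def batched(iterable, n):
    # Fallback for itertools.batched (Python 3.12+):
    # yield successive tuples of up to n items from iterable.
    if n < 1:
        raise ValueError("n must be at least one")
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
```

For example, `batched(range(7), 3)` yields `(0, 1, 2)`, `(3, 4, 5)`, `(6,)`.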

A GitHub repository reproduces the problem, with memray results.

This might be related to https://code.djangoproject.com/ticket/31202, but I decided to open a new ticket: I wouldn't mind bulk_update taking longer to complete, but the SIGTERM surprised me.

Change History (1)

comment:1 by Anže Pečar, 6 weeks ago

Description: modified (diff)