Opened 6 weeks ago

Last modified 8 days ago

#36526 assigned Cleanup/optimization

bulk_update uses more memory than expected — at Initial Version

Reported by: Anže Pečar
Owned by:
Component: Database layer (models, ORM)
Version: 5.2
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

I recently tried to update a large number of objects with:

things = list(Thing.objects.all())  # a large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)

The first line above fits into the available memory (~2GB in my case), but the second line caused a SIGTERM, even though I had an additional 2GB of memory available. This was surprising: I wasn't expecting bulk_update to use this much memory, since all the objects to update were already loaded.

My solution was:

from itertools import batched  # stdlib as of Python 3.12

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)

With the first example, bulk_update used 2.8GB of memory; with the second, it used only 62MB.
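Note that itertools.batched is only available on Python 3.12+. On older versions, an equivalent helper (a minimal sketch, not part of Django) can stand in:

from itertools import islice

def batched(iterable, n):
    # Yield successive tuples of at most n items, mirroring
    # itertools.batched from Python 3.12+.
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch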

A GitHub repository that reproduces the problem, including memray results, is available.
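To reproduce the measurement without the repository, memray's Python API can wrap the call directly. This is a sketch: it assumes memray is installed, reuses the Thing model from above, and the output filename is arbitrary.

from memray import Tracker

things = list(Thing.objects.all())
with Tracker("bulk_update.bin"):  # records allocations made inside this block
    Thing.objects.bulk_update(things, ["description"], batch_size=300)
# Inspect the capture afterwards with: memray flamegraph bulk_update.bin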

Looking at the source code of bulk_update, the issue seems to be that Django builds the entire updates list before it starts executing the queries. I'd be happy to contribute a patch that makes the updates list lazy, unless there are concerns about adding more computation between consecutive UPDATE queries and thereby keeping the transaction open longer.
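To illustrate the idea, here is a simplified sketch of the proposed shape, not Django's actual code: iter_updates is a hypothetical helper that approximates what bulk_update builds (one Case expression per field per batch), using plain attribute names instead of Field instances.

from itertools import batched

from django.db.models import Case, Value, When

def iter_updates(objs, field_names, batch_size):
    # Yield (pks, update_kwargs) one batch at a time, so that only one
    # batch's Case/When tree is held in memory, instead of materializing
    # a list entry for every batch before the first UPDATE runs.
    for batch in batched(objs, batch_size):
        update_kwargs = {}
        for name in field_names:
            whens = [When(pk=obj.pk, then=Value(getattr(obj, name))) for obj in batch]
            update_kwargs[name] = Case(*whens)
        yield [obj.pk for obj in batch], update_kwargs

The consuming loop inside the transaction would then look something like:

for pks, update_kwargs in iter_updates(things, ["description"], 300):
    Thing.objects.filter(pk__in=pks).update(**update_kwargs)

The trade-off mentioned above shows up here: the Case/When construction now happens between consecutive UPDATE queries, while the transaction is open.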

This might be related to https://code.djangoproject.com/ticket/31202, but I decided to open a new ticket: I wouldn't mind waiting longer for bulk_update to complete, whereas the SIGTERM surprised me.

Change History (0)
