Opened 6 weeks ago
Last modified 8 days ago
#36526 assigned Cleanup/optimization
bulk_update uses more memory than expected — at Version 1
Reported by: | Anže Pečar | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.2 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Accepted
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I recently tried to update a large number of objects with:
things = list(Thing.objects.all())  # A large number of objects, e.g. > 1_000_000
Thing.objects.bulk_update(things, ["description"], batch_size=300)
The first line above fits into the available memory (~2GB in my case), but the second line caused a SIGTERM, even though I had an additional 2GB of available memory. This was a bit surprising as I wasn't expecting bulk_update to use this much memory since all the objects to update were already loaded.
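For context, here is a rough, simplified sketch of what bulk_update does internally (abbreviated from the shape of django/db/models/query.py; not the exact source). The point is that the per-object Case/When expression trees for every batch are built up front, before any query executes, which would explain the memory growth scaling with the total number of objects rather than with batch_size:

```python
# Simplified sketch, NOT Django's actual code: bulk_update first builds
# Case/When expressions for EVERY object in EVERY batch, then executes
# the queries inside a single transaction.
from django.db import transaction
from django.db.models import Case, Value, When

def sketch_bulk_update(queryset, objs, fields, batch_size):
    batches = [objs[i:i + batch_size] for i in range(0, len(objs), batch_size)]
    updates = []
    for batch in batches:
        update_kwargs = {}
        for field in fields:
            whens = [
                When(pk=obj.pk, then=Value(getattr(obj, field)))
                for obj in batch
            ]
            update_kwargs[field] = Case(*whens)
        # Every expression tree built above stays alive until the
        # execution loop below has finished all batches.
        updates.append(([obj.pk for obj in batch], update_kwargs))
    with transaction.atomic(savepoint=False):
        for pks, kwargs in updates:
            queryset.filter(pk__in=pks).update(**kwargs)
```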
My solution was:
from itertools import batched  # Python 3.12+

for batch in batched(things, 300):
    Thing.objects.bulk_update(batch, ["description"], batch_size=300)
The first bulk_update example used 2.8GB of memory, but the second used only 62MB.
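The workaround keeps memory bounded because each outer batch maps to a single bulk_update call, so at most 300 Case/When expressions are alive at a time. Note that itertools.batched only exists on Python 3.12+; for older versions, a minimal equivalent could look like this:

```python
from itertools import islice

def batched(iterable, n):
    # Minimal stand-in for itertools.batched (Python 3.12+):
    # yields successive tuples of up to n items.
    it = iter(iterable)
    while batch := tuple(islice(it, n)):
        yield batch
```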
A GitHub repository that reproduces the problem, including memray results, is linked from this ticket.
This might be related to https://code.djangoproject.com/ticket/31202, but I decided to open a new issue: I wouldn't mind waiting longer for bulk_update to complete, but the SIGTERM surprised me.