Opened 5 years ago

Closed 4 years ago

Last modified 3 years ago

#26400 closed Cleanup/optimization (wontfix)

QuerySet bulk_create method to handle generators to prevent loading all objects in memory at once

Reported by: Alexander Sterchov
Owned by: nobody
Component: Database layer (models, ORM)
Version: master
Severity: Normal
Keywords: bulk_create
Cc:
Triage Stage: Someday/Maybe
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

In my case, I need to create a huge number of objects using the bulk_create method with the batch_size parameter.

The problem is that I don't have enough memory to store all the objects in a list. Even if I pass a generator, it is converted to a list anyway (https://github.com/django/django/blob/1.9.4/django/db/models/query.py#L438).
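The effect can be illustrated without Django. In this minimal sketch, `materialize` is a hypothetical stand-in for the list conversion at the linked line, and `make_objects` is a hypothetical generator standing in for lazily built model instances; neither name comes from Django.

```python
from itertools import islice

produced = 0  # Counts how many items the generator has built so far.


def make_objects(n):
    """Hypothetical generator standing in for lazily built model instances."""
    global produced
    for i in range(n):
        produced += 1
        yield {"id": i}


def materialize(objs):
    """Hypothetical stand-in for the list(objs) conversion in bulk_create."""
    return list(objs)


# Passing the generator directly: every item is built up front.
all_objs = materialize(make_objects(1000))
built_when_passed_whole = produced

# Slicing first: only one batch worth of items is built.
produced = 0
first_batch = materialize(islice(make_objects(1000), 100))
built_when_sliced = produced
```

Under these assumptions, the generator is fully consumed in the first case, while slicing it beforehand limits how many objects exist at once.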

I want to implement a feature that handles generators properly without loading all objects into memory, but the bulk_create method returns a list of the created objects, which is unacceptable for large amounts of data.

How should I implement this: as a new method, or as a parameter on the existing method?

Change History (7)

comment:1 Changed 5 years ago by Simon Charette

I'm not sure this is worth including in Django. Is there a reason you can't split your bulk_create calls into batches that fit into memory?

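from itertools import islice, chain

# Adapted from https://code.activestate.com/recipes/303279-getting-items-in-batches/
def split(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        try:
            # The next() builtin works on Python 3; batchiter.next() is Python 2 only.
            first = next(batchiter)
        except StopIteration:
            # Source exhausted. Returning avoids the RuntimeError that an
            # escaping StopIteration causes inside generators on Python 3.7+.
            return
        yield chain([first], batchiter)

for batch in split(large_generator, 10000):  # Adjust size to fit in memory
    MyModel.objects.bulk_create(batch)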

You could even make this a manager method if required.
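As a rough sketch of what such a manager method might look like: `BatchedCreateMixin` and `bulk_create_in_batches` are hypothetical names, not part of Django. The mixin only assumes that the class it is combined with provides `bulk_create()`, as `django.db.models.Manager` does.

```python
from itertools import islice


class BatchedCreateMixin:
    """Hypothetical mixin, meant to be combined with django.db.models.Manager.

    It only assumes the class it is mixed into provides bulk_create().
    """

    def bulk_create_in_batches(self, objs, size=10000):
        source = iter(objs)
        while True:
            # Hold at most `size` objects in memory at a time.
            batch = list(islice(source, size))
            if not batch:
                break
            self.bulk_create(batch)


# Assumed wiring in a Django project (not executed here):
#
#     class MyModelManager(BatchedCreateMixin, models.Manager):
#         pass
#
#     class MyModel(models.Model):
#         objects = MyModelManager()
#
#     MyModel.objects.bulk_create_in_batches(large_generator, size=10000)
```

Each batch is passed to a separate `bulk_create()` call, so the inserts are not atomic as a whole unless the caller wraps the loop in a transaction.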

comment:2 Changed 5 years ago by Alexander Sterchov

This isn't the first time I've needed this. Sure, I can write that split helper one more time, as I did in other projects, but I don't see any reason why the feature couldn't be part of Django.

Last edited 5 years ago by Alexander Sterchov

comment:3 Changed 5 years ago by Tim Graham

My first inclination was the same as Simon's but if you want to show what the changes to bulk_create() (or a new method) would look like, we can run it by the DevelopersMailingList to get some other opinions.

comment:4 in reply to:  2 Changed 5 years ago by Simon Charette

Replying to likeon:

This isn't the first time I've needed this. Sure, I can write that split helper one more time, as I did in other projects, but I don't see any reason why the feature couldn't be part of Django.

We'd have to alter both the signature and the return type of bulk_create in order to pass a flag enabling this feature and make sure not to return a list of the created objects. At this point I think this should be handled by another method/function.
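To make the difference in signature and return type concrete, here is one possible shape for such a separate function. `bulk_create_from_iterable` is a hypothetical name, and `manager` is assumed to be a model manager such as `MyModel.objects`. Instead of a list of created objects, the sketch returns only a count.

```python
from itertools import islice


def bulk_create_from_iterable(manager, objs, batch_size=10000):
    """Hypothetical helper: insert objs in batches and return how many were sent.

    `manager` is assumed to expose bulk_create(), like MyModel.objects.
    Unlike bulk_create(), no created objects are kept or returned.
    """
    source = iter(objs)
    total = 0
    while True:
        batch = list(islice(source, batch_size))
        if not batch:
            return total
        # The list returned by bulk_create() is discarded on purpose.
        manager.bulk_create(batch)
        total += len(batch)
```

Keeping this outside `bulk_create()` leaves the existing method's contract unchanged, which is the trade-off described above.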

I personally don't believe this use case is common enough to warrant an inclusion in Django but as Tim pointed out you could try leveraging support from the community on the developer mailing list.

comment:5 Changed 5 years ago by Tim Graham

Triage Stage: Unreviewed → Someday/Maybe

comment:6 Changed 4 years ago by Tim Graham

Resolution: wontfix
Status: new → closed

Closing in the absence of follow-up or discussion.

comment:7 Changed 3 years ago by Tim Graham

#28231 is a follow-up ticket requesting similar behavior. The current consensus seems to be to document the behavior rather than to change it.
