Opened 19 months ago

Closed 11 months ago

Last modified 11 months ago

#28231 closed Cleanup/optimization (fixed)

Document that QuerySet.bulk_create() casts objs to a list

Reported by: Nir Izraeli Owned by: Botond Béres
Component: Documentation Version: 1.11
Severity: Normal Keywords:
Cc: nirizr@…, Botond Béres, Tim Martin Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Nir Izraeli)

When batch_size provided in bulk_create, a user might assume (As I myself have) that objs iterable will not be iterated more than batch_size times at once, and that no more than roughly batch_size Model objects reside in memory at any given time.

When using bulk_create for relatively big sets of objects provided by a generator object, it would be prefered to avoid iterating over the entire generator object. Moreover, if not iterating over the generator object is deemed unnecessary or out-of-scope for django it would be prefered to make a comment on said behavior in documentation.

I suggest two possible solutions:

  1. Document this behavior (bulk_create converts passed objs iterator to a list), or
  2. Avoid doing so when batch_size is given (or default is other than None. i.e. sqlite).

I did not research the possibility of avoiding the list conversion, but if that solution is accepted by the community I volunteer to investigate further and claim this ticket.

Change History (15)

comment:1 Changed 19 months ago by Nir Izraeli

Cc: nirizr@… added

CCing myself to receive updates.

comment:2 Changed 19 months ago by Nir Izraeli

Description: modified (diff)
Summary: bulk_create: avoid iterating `objs` more than necessary when `bulk_size` is providedbulk_create: avoid iterating `objs` more than necessary when `batch_size` is provided

comment:3 Changed 19 months ago by Tim Graham

Feel free to offer a patch. Looking at the code of QuerySet.bulk_create(), I don't understand how your proposal would work.

comment:4 Changed 19 months ago by Nir Izraeli

I'm thinking of creating a chunks generator method, similar to ones implemented in, for example, the following Stack Overflow Answers:

  1. https://stackoverflow.com/a/24527424/1146713
  2. https://stackoverflow.com/a/8290514/1146713

And then iterate over it in bulk_create and essentially run the existing bulk_create code for each batch_size-d chunks sequentially.

This means I'll be removing the need for _batched_insert and move the partition by the obj.pk is None check to after batches are made.

What do you think?

Also, if I am to create a patch should I post it here first or create a PR for it?

Last edited 14 months ago by Tim Graham (previous) (diff)

comment:5 Changed 19 months ago by Simon Charette

Also, if I am to create a patch should I post it here first or create a PR for it?

Create a PR for it and link it back to this ticket, it will allow you to run CI against and gather feedback more easily.

comment:6 Changed 19 months ago by Nir Izraeli

PR created and passing tests: https://github.com/django/django/pull/8540

comment:7 Changed 19 months ago by Nir Izraeli

Has patch: set
Owner: changed from nobody to Nir Izraeli
Status: newassigned

Claimed this issue and marked it as having a patch (assume a PR is considered a "patch" for tracking/reviewing purposes)

comment:8 Changed 19 months ago by Tim Graham

Summary: bulk_create: avoid iterating `objs` more than necessary when `batch_size` is providedAvoid iterating `objs` more than necessary in QuerySet.bulk_create() when `batch_size` is provided
Triage Stage: UnreviewedAccepted

comment:9 Changed 19 months ago by Tim Graham

Component: Database layer (models, ORM)Documentation
Has patch: unset
Summary: Avoid iterating `objs` more than necessary in QuerySet.bulk_create() when `batch_size` is providedDocument that QuerySet.bulk_create() casts objs to a list

I guess this is a duplicate of #26400 (closed as wontfix). François Freitag's comment on the PR: "I would rather document the current behavior and the reason why the iterator has to be consumed. I suggest going to the django-developers mailing list to see if others are interested."

Tentatively reclassifying as a documentation in light of this.

comment:10 Changed 14 months ago by Botond Béres

Owner: changed from Nir Izraeli to Botond Béres

comment:11 Changed 14 months ago by Botond Béres

Opened PR 9286 to document this behaviour, as explained in the discussion from PR 8540

comment:12 Changed 14 months ago by Botond Béres

Cc: Botond Béres added
Has patch: set

comment:13 Changed 14 months ago by Tim Martin

Cc: Tim Martin added
Triage Stage: AcceptedReady for checkin

Looks good to me.

comment:14 Changed 11 months ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 52aa26e:

Fixed #28231 -- Doc'd that QuerySet.bulk_create() casts objs to a list.

comment:15 Changed 11 months ago by Tim Graham <timograham@…>

In 881f66b:

[2.0.x] Fixed #28231 -- Doc'd that QuerySet.bulk_create() casts objs to a list.

Backport of 52aa26e6979ba81b00f1593d5ee8c5c73aaa6391 from master

Note: See TracTickets for help on using tickets.
Back to Top