#28231 closed Cleanup/optimization (fixed)
Document that QuerySet.bulk_create() casts objs to a list
Reported by: | Nir Izraeli | Owned by: | Botond Béres |
---|---|---|---|
Component: | Documentation | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | nirizr@…, Botond Béres, Tim Martin | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When batch_size
provided in bulk_create
, a user might assume (As I myself have) that objs
iterable will not be iterated more than batch_size
times at once, and that no more than roughly batch_size
Model objects reside in memory at any given time.
When using bulk_create
for relatively big sets of objects provided by a generator object, it would be prefered to avoid iterating over the entire generator object. Moreover, if not iterating over the generator object is deemed unnecessary or out-of-scope for django it would be prefered to make a comment on said behavior in documentation.
I suggest two possible solutions:
- Document this behavior (
bulk_create
converts passedobjs
iterator to a list), or - Avoid doing so when
batch_size
is given (or default is other thanNone
. i.e. sqlite).
I did not research the possibility of avoiding the list conversion, but if that solution is accepted by the community I volunteer to investigate further and claim this ticket.
Change History (15)
comment:1 by , 7 years ago
Cc: | added |
---|
comment:2 by , 7 years ago
Description: | modified (diff) |
---|---|
Summary: | bulk_create: avoid iterating `objs` more than necessary when `bulk_size` is provided → bulk_create: avoid iterating `objs` more than necessary when `batch_size` is provided |
comment:3 by , 7 years ago
Feel free to offer a patch. Looking at the code of QuerySet.bulk_create()
, I don't understand how your proposal would work.
comment:4 by , 7 years ago
I'm thinking of creating a chunks
generator method, similar to ones implemented in, for example, the following Stack Overflow Answers:
And then iterate over it in bulk_create
and essentially run the existing bulk_create
code for each batch_size
-d chunks sequentially.
This means I'll be removing the need for _batched_insert
and move the partition by the obj.pk is None
check to after batches are made.
What do you think?
Also, if I am to create a patch should I post it here first or create a PR for it?
comment:5 by , 7 years ago
Also, if I am to create a patch should I post it here first or create a PR for it?
Create a PR for it and link it back to this ticket, it will allow you to run CI against and gather feedback more easily.
comment:6 by , 7 years ago
PR created and passing tests: https://github.com/django/django/pull/8540
comment:7 by , 7 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Claimed this issue and marked it as having a patch (assume a PR is considered a "patch" for tracking/reviewing purposes)
comment:8 by , 7 years ago
Summary: | bulk_create: avoid iterating `objs` more than necessary when `batch_size` is provided → Avoid iterating `objs` more than necessary in QuerySet.bulk_create() when `batch_size` is provided |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:9 by , 7 years ago
Component: | Database layer (models, ORM) → Documentation |
---|---|
Has patch: | unset |
Summary: | Avoid iterating `objs` more than necessary in QuerySet.bulk_create() when `batch_size` is provided → Document that QuerySet.bulk_create() casts objs to a list |
I guess this is a duplicate of #26400 (closed as wontfix). François Freitag's comment on the PR: "I would rather document the current behavior and the reason why the iterator has to be consumed. I suggest going to the django-developers mailing list to see if others are interested."
Tentatively reclassifying as a documentation in light of this.
comment:10 by , 7 years ago
Owner: | changed from | to
---|
comment:11 by , 7 years ago
comment:12 by , 7 years ago
Cc: | added |
---|---|
Has patch: | set |
CCing myself to receive updates.