Opened 6 years ago
Closed 5 years ago
#27639 closed New feature (fixed)
Add a chunk size argument to QuerySet.iterator()
Reported by: | François Freitag | Owned by: | François Freitag |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | cursors database |
Cc: | florian@…, josh.smeaton@…, charettes@…, me@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Django currently fetches results from the database in batches of GET_ITERATOR_CHUNK_SIZE
(currently 100). When .iterator()
is used, usually for a large query, specifying the batch size would allow more control over the number of back-and-forth communications between Django and the database.
PEP249 define the size
argument for the .fetchmany()
method.
- MySQL-python provides the size argument for
.fetchmany()
. - SQLite provides the size argument for
.fetchmany()
. - cx_Oracle provides arraysize attribute on the cursor and the numRows argument for
.fetchmany()
. - Psycopg2 (used for PostgreSQL) provides the itersize attribute (used by server-side cursors, see discussion on This PR -- Added server side cursors support for QuerySet iteration in PostgreSQL.
Note:
In #26530, Anssi Kääriäinen proposed the name cursor_size
for this argument.
Change History (19)
comment:1 Changed 6 years ago by
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 Changed 6 years ago by
Cc: | me@… added |
---|
comment:3 Changed 6 years ago by
Owner: | changed from nobody to François Freitag |
---|---|
Status: | new → assigned |
comment:4 Changed 6 years ago by
Has patch: | set |
---|
comment:5 Changed 6 years ago by
Triage Stage: | Accepted → Ready for checkin |
---|
comment:6 Changed 6 years ago by
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Some test failures on Oracle must be fixed.
comment:7 Changed 6 years ago by
Patch needs improvement: | unset |
---|
comment:8 Changed 5 years ago by
Patch needs improvement: | set |
---|
comment:9 Changed 5 years ago by
Patch needs improvement: | unset |
---|
comment:10 Changed 5 years ago by
Patch needs improvement: | set |
---|
comment:11 Changed 5 years ago by
Patch needs improvement: | unset |
---|
comment:12 Changed 5 years ago by
Needs tests: | set |
---|
comment:13 Changed 5 years ago by
Needs tests: | unset |
---|
comment:14 Changed 5 years ago by
Triage Stage: | Accepted → Ready for checkin |
---|
comment:15 Changed 5 years ago by
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Since it was marked as RFC, the PR received some updates to allow chunk_size=None
to disable server-side cursors (use case in #28062). A few review comments remain.
comment:16 Changed 5 years ago by
After reworking this PR to allow chunk_size=None
, I'm not convinced with chunk_size=None
:
- It is introduced to workaround an issue with PostgreSQL, I do not believe iterator should change for that reason.
- It's a PostgreSQL-only parameter, because server-side cursors cannot be disabled on Oracle, and it'll be ignored on databases that don't support server-side cursors.
- Its meaning is ambiguous. I would expect chunk_size=None to signify "do not use chunked fetch and fetch all the results at once", i.e. use
fetchall
. Django usesfetchmany
. - In most use cases I can think of, server-side cursors need to be either globally enabled/disabled, not on a per-query basis. If a end-user really want to do so, it's possible to setup an other connection and use
using()
to disable server-side cursors for that query. I have a hard time coming up with a use-case where third party libraries want to use iterator, but not with a server-side cursor.
comment:17 Changed 5 years ago by
Until a use case arises, omitting support for chunk_size=None
is fine with me.
comment:18 Changed 5 years ago by
Patch needs improvement: | unset |
---|
PR #7836