Opened 8 years ago
Closed 7 years ago
#27639 closed New feature (fixed)
Add a chunk size argument to QuerySet.iterator()
Reported by: | François Freitag | Owned by: | François Freitag |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | cursors database |
Cc: | florian@…, josh.smeaton@…, charettes@…, me@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Django currently fetches results from the database in batches of GET_ITERATOR_CHUNK_SIZE
(currently 100). When .iterator()
is used, usually for a large query, specifying the batch size would allow more control over the number of back-and-forth communications between Django and the database.
PEP249 define the size
argument for the .fetchmany()
method.
- MySQL-python provides the size argument for
.fetchmany()
. - SQLite provides the size argument for
.fetchmany()
. - cx_Oracle provides arraysize attribute on the cursor and the numRows argument for
.fetchmany()
. - Psycopg2 (used for PostgreSQL) provides the itersize attribute (used by server-side cursors, see discussion on This PR -- Added server side cursors support for QuerySet iteration in PostgreSQL.
Note:
In #26530, Anssi Kääriäinen proposed the name cursor_size
for this argument.
Change History (19)
comment:1 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 8 years ago
Cc: | added |
---|
comment:3 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 8 years ago
Has patch: | set |
---|
comment:5 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:6 by , 8 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Some test failures on Oracle must be fixed.
comment:7 by , 8 years ago
Patch needs improvement: | unset |
---|
comment:8 by , 8 years ago
Patch needs improvement: | set |
---|
comment:9 by , 8 years ago
Patch needs improvement: | unset |
---|
comment:10 by , 8 years ago
Patch needs improvement: | set |
---|
comment:11 by , 8 years ago
Patch needs improvement: | unset |
---|
comment:12 by , 8 years ago
Needs tests: | set |
---|
comment:13 by , 8 years ago
Needs tests: | unset |
---|
comment:14 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
comment:15 by , 8 years ago
Patch needs improvement: | set |
---|---|
Triage Stage: | Ready for checkin → Accepted |
Since it was marked as RFC, the PR received some updates to allow chunk_size=None
to disable server-side cursors (use case in #28062). A few review comments remain.
comment:16 by , 8 years ago
After reworking this PR to allow chunk_size=None
, I'm not convinced with chunk_size=None
:
- It is introduced to workaround an issue with PostgreSQL, I do not believe iterator should change for that reason.
- It's a PostgreSQL-only parameter, because server-side cursors cannot be disabled on Oracle, and it'll be ignored on databases that don't support server-side cursors.
- Its meaning is ambiguous. I would expect chunk_size=None to signify "do not use chunked fetch and fetch all the results at once", i.e. use
fetchall
. Django usesfetchmany
. - In most use cases I can think of, server-side cursors need to be either globally enabled/disabled, not on a per-query basis. If a end-user really want to do so, it's possible to setup an other connection and use
using()
to disable server-side cursors for that query. I have a hard time coming up with a use-case where third party libraries want to use iterator, but not with a server-side cursor.
comment:17 by , 8 years ago
Until a use case arises, omitting support for chunk_size=None
is fine with me.
comment:18 by , 8 years ago
Patch needs improvement: | unset |
---|
PR #7836