Opened 9 years ago
Closed 8 years ago
#27639 closed New feature (fixed)
Add a chunk size argument to QuerySet.iterator()
| Reported by: | François Freitag | Owned by: | François Freitag |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | cursors database |
| Cc: | florian@…, josh.smeaton@…, charettes@…, me@… | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Django currently fetches results from the database in batches of GET_ITERATOR_CHUNK_SIZE (currently 100). When .iterator() is used, usually for a large query, specifying the batch size would allow more control over the number of back-and-forth communications between Django and the database.
PEP249 define the size argument for the .fetchmany() method.
- MySQL-python provides the size argument for
.fetchmany(). - SQLite provides the size argument for
.fetchmany(). - cx_Oracle provides arraysize attribute on the cursor and the numRows argument for
.fetchmany(). - Psycopg2 (used for PostgreSQL) provides the itersize attribute (used by server-side cursors, see discussion on This PR -- Added server side cursors support for QuerySet iteration in PostgreSQL.
Note:
In #26530, Anssi Kääriäinen proposed the name cursor_size for this argument.
Change History (19)
comment:1 by , 9 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:2 by , 9 years ago
| Cc: | added |
|---|
comment:3 by , 9 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:4 by , 9 years ago
| Has patch: | set |
|---|
comment:5 by , 9 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
comment:6 by , 9 years ago
| Patch needs improvement: | set |
|---|---|
| Triage Stage: | Ready for checkin → Accepted |
Some test failures on Oracle must be fixed.
comment:7 by , 9 years ago
| Patch needs improvement: | unset |
|---|
comment:8 by , 9 years ago
| Patch needs improvement: | set |
|---|
comment:9 by , 9 years ago
| Patch needs improvement: | unset |
|---|
comment:10 by , 9 years ago
| Patch needs improvement: | set |
|---|
comment:11 by , 9 years ago
| Patch needs improvement: | unset |
|---|
comment:12 by , 9 years ago
| Needs tests: | set |
|---|
comment:13 by , 9 years ago
| Needs tests: | unset |
|---|
comment:14 by , 9 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
comment:15 by , 9 years ago
| Patch needs improvement: | set |
|---|---|
| Triage Stage: | Ready for checkin → Accepted |
Since it was marked as RFC, the PR received some updates to allow chunk_size=None to disable server-side cursors (use case in #28062). A few review comments remain.
comment:16 by , 9 years ago
After reworking this PR to allow chunk_size=None, I'm not convinced with chunk_size=None:
- It is introduced to workaround an issue with PostgreSQL, I do not believe iterator should change for that reason.
- It's a PostgreSQL-only parameter, because server-side cursors cannot be disabled on Oracle, and it'll be ignored on databases that don't support server-side cursors.
- Its meaning is ambiguous. I would expect chunk_size=None to signify "do not use chunked fetch and fetch all the results at once", i.e. use
fetchall. Django usesfetchmany. - In most use cases I can think of, server-side cursors need to be either globally enabled/disabled, not on a per-query basis. If a end-user really want to do so, it's possible to setup an other connection and use
using()to disable server-side cursors for that query. I have a hard time coming up with a use-case where third party libraries want to use iterator, but not with a server-side cursor.
comment:17 by , 9 years ago
Until a use case arises, omitting support for chunk_size=None is fine with me.
comment:18 by , 8 years ago
| Patch needs improvement: | unset |
|---|
PR #7836