Opened 6 years ago

Closed 5 years ago

#27639 closed New feature (fixed)

Add a chunk size argument to QuerySet.iterator()

Reported by: François Freitag
Owned by: François Freitag
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords: cursors database
Cc: florian@…, josh.smeaton@…, charettes@…, me@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Django currently fetches results from the database in batches of GET_ITERATOR_CHUNK_SIZE (currently 100). When .iterator() is used, usually for a large query, specifying the batch size would allow more control over the number of back-and-forth communications between Django and the database.

PEP 249 defines the size argument for the .fetchmany() method.
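
For illustration, here is a minimal sketch of batched fetching with the PEP 249 size argument. It uses the standard-library sqlite3 driver rather than Django's internals, and the iter_rows helper, the item table, and the row count are assumptions made up for the example; it is not the code from the patch.

```python
import sqlite3


def iter_rows(cursor, chunk_size=100):
    """Yield rows one at a time, fetching them from the cursor in batches.

    iter_rows is a hypothetical helper, not part of Django or PEP 249.
    """
    while True:
        # PEP 249: fetchmany(size) returns at most `size` rows,
        # and an empty sequence once the result set is exhausted.
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        yield from rows


# Hypothetical table and data, in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(250)])

cursor = conn.execute("SELECT id FROM item ORDER BY id")
ids = [row[0] for row in iter_rows(cursor, chunk_size=100)]
```

A larger chunk_size means fewer fetchmany() calls for the same result set, at the cost of holding more rows in memory per batch.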

Note:
In #26530, Anssi Kääriäinen proposed the name cursor_size for this argument.

Change History (19)

comment:1 Changed 6 years ago by Tim Graham

Triage Stage: Unreviewed → Accepted

comment:2 Changed 6 years ago by Adam Johnson

Cc: me@… added

comment:3 Changed 6 years ago by François Freitag

Owner: changed from nobody to François Freitag
Status: new → assigned

comment:4 Changed 6 years ago by François Freitag

Has patch: set

comment:5 Changed 6 years ago by Tim Graham

Triage Stage: Accepted → Ready for checkin

comment:6 Changed 6 years ago by Tim Graham

Patch needs improvement: set
Triage Stage: Ready for checkin → Accepted

Some test failures on Oracle must be fixed.

comment:7 Changed 6 years ago by François Freitag

Patch needs improvement: unset

comment:8 Changed 5 years ago by Tim Graham

Patch needs improvement: set

comment:9 Changed 5 years ago by François Freitag

Patch needs improvement: unset

comment:10 Changed 5 years ago by François Freitag

Patch needs improvement: set

comment:11 Changed 5 years ago by François Freitag

Patch needs improvement: unset

comment:12 Changed 5 years ago by Mariusz Felisiak

Needs tests: set

comment:13 Changed 5 years ago by François Freitag

Needs tests: unset

comment:14 Changed 5 years ago by Mariusz Felisiak

Triage Stage: Accepted → Ready for checkin

comment:15 Changed 5 years ago by Tim Graham

Patch needs improvement: set
Triage Stage: Ready for checkin → Accepted

Since it was marked as RFC, the PR received some updates to allow chunk_size=None to disable server-side cursors (use case in #28062). A few review comments remain.

comment:16 Changed 5 years ago by François Freitag

After reworking this PR to allow chunk_size=None, I'm not convinced that chunk_size=None is a good idea:

  • It is introduced to work around an issue with PostgreSQL; I do not believe iterator() should change for that reason.
  • It's a PostgreSQL-only parameter, because server-side cursors cannot be disabled on Oracle, and it'll be ignored on databases that don't support server-side cursors.
  • Its meaning is ambiguous. I would expect chunk_size=None to signify "do not use chunked fetch and fetch all the results at once", i.e. use fetchall. Django uses fetchmany.
  • In most use cases I can think of, server-side cursors need to be enabled or disabled globally, not on a per-query basis. If an end user really wants to do so, it's possible to set up another connection and use using() to disable server-side cursors for that query. I have a hard time coming up with a use case where third-party libraries want to use iterator(), but not with a server-side cursor.
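
As a rough sketch of the second-connection approach mentioned in the last point: the settings fragment below defines an extra database alias that points at the same database but sets Django's DISABLE_SERVER_SIDE_CURSORS option. The alias name, database name, and MyModel are placeholders invented for the example, and a real configuration would also need credentials and host settings.

```python
# Hypothetical settings fragment: alias names and connection details are
# placeholders, not taken from this ticket.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
    },
    # Same database, but server-side cursors are turned off for this alias.
    "no_server_side_cursors": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "DISABLE_SERVER_SIDE_CURSORS": True,
    },
}

# In application code (requires a configured Django project; MyModel is
# a hypothetical model):
#     for obj in MyModel.objects.using("no_server_side_cursors").iterator():
#         ...
```

Queries routed through the second alias open their own connection, so they do not share a transaction with queries on the default alias.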

comment:17 Changed 5 years ago by Tim Graham

Until a use case arises, omitting support for chunk_size=None is fine with me.

comment:18 Changed 5 years ago by François Freitag

Patch needs improvement: unset

comment:19 Changed 5 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assigned → closed

In edee5a8d:

Fixed #27639 -- Added chunk_size parameter to QuerySet.iterator().
