Opened 7 years ago

Closed 7 years ago

#27639 closed New feature (fixed)

Add a chunk size argument to QuerySet.iterator()

Reported by: François Freitag
Owned by: François Freitag
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords: cursors database
Cc: florian@…, josh.smeaton@…, charettes@…, me@…
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Django currently fetches results from the database in batches of GET_ITERATOR_CHUNK_SIZE (currently 100). When .iterator() is used, usually for a large query, specifying the batch size would allow more control over the number of back-and-forth communications between Django and the database.

PEP 249 defines the size argument for the .fetchmany() method.

Note:
In #26530, Anssi Kääriäinen proposed the name cursor_size for this argument.
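
For illustration, the batched fetching described above can be sketched with a plain PEP 249 cursor. This is a minimal sketch, not Django's implementation: the helper name iter_rows and the item table are hypothetical, and sqlite3 is used only because it ships with Python (it has no network round trips, so it shows the mechanics rather than the performance effect).

```python
import sqlite3


def iter_rows(cursor, chunk_size=100):
    """Yield rows one at a time, fetching them in batches of chunk_size.

    Hypothetical helper: it mirrors the idea of a configurable batch size,
    using the size argument that PEP 249 defines for fetchmany().
    """
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            # An empty batch means the result set is exhausted.
            break
        yield from rows


# Build a small in-memory table to iterate over.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO item (id) VALUES (?)", [(i,) for i in range(1, 251)])

# A larger chunk_size means fewer fetchmany() calls for the same result set.
cursor = conn.execute("SELECT id FROM item ORDER BY id")
ids = [row[0] for row in iter_rows(cursor, chunk_size=50)]
```

With a client-server database, each fetchmany() call may correspond to a round trip, which is why the batch size is worth exposing.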

Change History (19)

comment:1 by Tim Graham, 7 years ago

Triage Stage: Unreviewed → Accepted

comment:2 by Adam Johnson, 7 years ago

Cc: me@… added

comment:3 by François Freitag, 7 years ago

Owner: changed from nobody to François Freitag
Status: new → assigned

comment:4 by François Freitag, 7 years ago

Has patch: set

comment:5 by Tim Graham, 7 years ago

Triage Stage: Accepted → Ready for checkin

comment:6 by Tim Graham, 7 years ago

Patch needs improvement: set
Triage Stage: Ready for checkin → Accepted

Some test failures on Oracle must be fixed.

comment:7 by François Freitag, 7 years ago

Patch needs improvement: unset

comment:8 by Tim Graham, 7 years ago

Patch needs improvement: set

comment:9 by François Freitag, 7 years ago

Patch needs improvement: unset

comment:10 by François Freitag, 7 years ago

Patch needs improvement: set

comment:11 by François Freitag, 7 years ago

Patch needs improvement: unset

comment:12 by Mariusz Felisiak, 7 years ago

Needs tests: set

comment:13 by François Freitag, 7 years ago

Needs tests: unset

comment:14 by Mariusz Felisiak, 7 years ago

Triage Stage: Accepted → Ready for checkin

comment:15 by Tim Graham, 7 years ago

Patch needs improvement: set
Triage Stage: Ready for checkin → Accepted

Since it was marked as RFC, the PR received some updates to allow chunk_size=None to disable server-side cursors (use case in #28062). A few review comments remain.

comment:16 by François Freitag, 7 years ago

After reworking this PR to allow chunk_size=None, I'm not convinced by chunk_size=None:

  • It is introduced to work around an issue with PostgreSQL; I do not believe iterator should change for that reason.
  • It's a PostgreSQL-only parameter, because server-side cursors cannot be disabled on Oracle, and it'll be ignored on databases that don't support server-side cursors.
  • Its meaning is ambiguous. I would expect chunk_size=None to signify "do not use chunked fetch and fetch all the results at once", i.e. use fetchall. Django uses fetchmany.
  • In most use cases I can think of, server-side cursors need to be enabled or disabled globally, not on a per-query basis. If an end user really wants to do so, it's possible to set up another connection and use using() to disable server-side cursors for that query. I have a hard time coming up with a use case where third-party libraries want to use iterator, but not with a server-side cursor.
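
The second-connection approach from the last point could look roughly like the settings fragment below. This is a sketch under assumptions: the alias no_cursors, the database name app, and the Entry model are hypothetical, and it relies on the per-database DISABLE_SERVER_SIDE_CURSORS setting.

```python
# Hypothetical settings.py fragment: two aliases pointing at the same
# PostgreSQL database, one of them with server-side cursors disabled.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",  # assumed database name
    },
    "no_cursors": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",  # same database as "default"
        "DISABLE_SERVER_SIDE_CURSORS": True,
    },
}

# In application code (needs a configured Django project; Entry is a
# hypothetical model), a single query can then opt out via using():
#
#     for entry in Entry.objects.using("no_cursors").iterator():
#         ...
```

Queries on the default alias keep using server-side cursors with iterator(); only queries routed through the second alias opt out.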

comment:17 by Tim Graham, 7 years ago

Until a use case arises, omitting support for chunk_size=None is fine with me.

comment:18 by François Freitag, 7 years ago

Patch needs improvement: unset

comment:19 by Tim Graham <timograham@…>, 7 years ago

Resolution: fixed
Status: assigned → closed

In edee5a8d:

Fixed #27639 -- Added chunk_size parameter to QuerySet.iterator().
