Opened 4 years ago

Last modified 4 months ago

#16614 assigned New feature

Support server-side cursors for queryset iteration in database backends

Reported by: toofishes Owned by: auvipy
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: memory cursors database
Cc: dpmcgee@…, nikolai@…, trbs@…, benth, charettes, macek@…, riccardo@… Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is related to concerns raised in #5423 as well as documentation issues noted in #13869.

Attached is a very rough first cut of a possible patch that adds server-side iteration for all backends that need it, as well as turning it on by default in the results_iter() codepaths, which is only used in the iterator() methods of the various QuerySet classes defined in django.db.models.query.

Observations:

Thoughts and feedback are welcome- I can imagine only enabling this by default for PostgreSQL only, as MySQL's implementation leaves something to be desired. I could also see never doing this by default and allowing it to be configured in DATABASE settings.

This was mildy tested with a random dumpdata operation on a random project using PostgreSQL. The max memory used by the dumpdata after applying this patch and the one from FS#5423, piping to /dev/null, went from 50MB to 26MB.

Attachments (1)

support-server-side-cursors.patch (6.7 KB) - added by toofishes 4 years ago.

Download all attachments as: .zip

Change History (15)

Changed 4 years ago by toofishes

comment:1 Changed 4 years ago by toofishes

  • Cc dpmcgee@… added
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 4 years ago by nikolai@…

  • Cc nikolai@… added

comment:3 Changed 4 years ago by aaugustin

  • Needs documentation set
  • Needs tests set
  • Triage Stage changed from Unreviewed to Accepted

Based on slides 55-63 of http://thebuild.com/presentations/unbreaking-django.pdf I think it's a good idea.

comment:4 Changed 4 years ago by akaariai

I did a very similar implementation based on discussions in https://groups.google.com/group/django-developers/browse_thread/thread/f19040e2e3229d7a

The main difference is that my version adds a queryset method .chunked(), which will use named cursor for PostgreSQL and will not fetch all the results in one go for SQLite3. SQLite3 has a problem if not fetching all the results in one go as updates done while iterating the results will be seen in the results.

The idea for the .chunked() method is that it will be documented as having backend-specific limitations which the .iterator() approach does not have. The abovementioned SQLite3 limitation is one, PostgreSQL has at least two:

  • As long as the cursor is open (you have a reference to the iterator in the code), server side resources will be tied. This might be important for cases where you open a lot of cursors and then iterate them in a template.
  • The named cursor is not iterable at all outside of a transaction (or you need to use WITH HOLD cursors, which will tie the resources even longer). This means named cursor will not be usable in autocommit mode.

I did not yet include anything for MySQL. There was some discussions about MySQL and it seems it could have some deadlocking problems.

The above limitations are not backwards compatible with current behavior of .iterator(). So, there might be some reason not to expose this as a default / settings behavior, but as a different queryset method you can use when you really need it. The conditions when you need this are exceptional, in normal HTML generation you will almost never want to use named cursors.

The patch is at: http://www.google.com/url?sa=D&q=https://github.com/akaariai/django/commit/8990e20df50ce110fe6ddbbdfed7a98987bb5835&usg=AFQjCNGtHz9sSHT0tOIWKXAZAypfvPIHyw

comment:5 Changed 3 years ago by anonymous

Has there been any movement on this issue? Do the core developers have any plans to merge this patch in the near future?

comment:6 Changed 3 years ago by lukeplant

akaariai is now a core developer, and could do this if he is still interested. The idea as described in comment 4 sounds solid to me. It would also be fine to have this implemented for some backends and not others IMO.

The link for the patch no longer works. Hopefully Anssi has a record of it somewhere.

comment:7 Changed 3 years ago by akaariai

I renamed the repo to django-old when Django was moved to github. Here is a working link: https://github.com/akaariai/django-old/commit/8990e20df50ce110fe6ddbbdfed7a98987bb5835

I can take care of final polish & commit, but I am not too interested in doing a lot of work on this right now. To get this committed making sure the patch works on current git HEAD, and writing some docs & tests is the way to get this into core.

comment:8 Changed 2 years ago by trbs

  • Cc trbs@… added

comment:9 Changed 2 years ago by benth

  • Cc benth added

comment:11 Changed 2 years ago by charettes

  • Cc charettes added

comment:12 Changed 19 months ago by Tuttle

  • Cc macek@… added

Is it really in stage Accepted? :)

I think the missing server-side cursor support keeps the Django ORM down...

comment:13 Changed 19 months ago by riquito

  • Cc riccardo@… added

comment:14 Changed 4 months ago by auvipy

  • Owner changed from nobody to auvipy
  • Status changed from new to assigned
  • Version changed from 1.3 to master
Note: See TracTickets for help on using tickets.
Back to Top