Support server-side cursors for queryset iteration in database backends
|Reported by:||toofishes||Owned by:||auvipy|
|Component:||Database layer (models, ORM)||Version:||master|
|Severity:||Normal||Keywords:||memory cursors database|
|Cc:||dpmcgee@…, nikolai@…, trbs@…, benth, charettes, macek@…, riccardo@…||Triage Stage:||Accepted|
|Has patch:||yes||Needs documentation:||yes|
|Needs tests:||yes||Patch needs improvement:||no|
Attached is a very rough first cut of a possible patch that adds server-side iteration for all backends that need it, as well as turning it on by default in the results_iter() codepaths, which is only used in the iterator() methods of the various QuerySet classes defined in django.db.models.query.
- SQLite appears to do iteration right out of the box when using fetchmany(), so no changes are needed.
- Oracle (via cx_Oracle) also does it right, fetching by default in batches of 50 when fetchmany() is used- see http://cx-oracle.sourceforge.net/html/cursor.html#Cursor.arraysize
- PostgreSQL gets functionality for usage of named cursors (http://initd.org/psycopg/docs/connection.html#connection.cursor), which leave the result set on the server side. The default fetch size of sets fetched this way is 2000, but we always use GET_ITERATOR_CHUNK_SIZE, so adjusting this is not necessary. A named cursor can only be used once.
- MySQL gets functionality for usage of SSCursor objects (http://mysql-python.sourceforge.net/MySQLdb-1.2.2/public/MySQLdb.cursors.SSCursor-class.html). This uses mysql_fetch_row() under the covers (http://dev.mysql.com/doc/refman/5.5/en/mysql-fetch-row.html). There are two cons here- it appears the library does not support batch fetches because it uses fetch_(single)_row under the covers, so this could result in significantly more chatter. MySQL also has no real concept of a cursor, so while you are using this server-side "cursor", you are not allowed to perform any other simultaneous queries.
Thoughts and feedback are welcome- I can imagine only enabling this by default for PostgreSQL only, as MySQL's implementation leaves something to be desired. I could also see never doing this by default and allowing it to be configured in DATABASE settings.
This was mildy tested with a random dumpdata operation on a random project using PostgreSQL. The max memory used by the dumpdata after applying this patch and the one from FS#5423, piping to /dev/null, went from 50MB to 26MB.
Change History (15)
Changed 4 years ago by toofishes
comment:1 Changed 4 years ago by toofishes
- Cc dpmcgee@… added
- Needs documentation unset
- Needs tests unset
- Patch needs improvement unset
comment:3 Changed 4 years ago by aaugustin
- Needs documentation set
- Needs tests set
- Triage Stage changed from Unreviewed to Accepted