Remove chunked reads from iter(qs)
|Reported by:||Anssi Kääriäinen||Owned by:||nobody|
|Component:||Database layer (models, ORM)||Version:||1.4|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
The queryset iterator protocol converts rows to model objects lazily as the queryset is iterated. This has two advantages:
- If one iterates just part of the queryset, there is no need to do model conversion for all objects.
- Likewise, if one iterates just part of the queryset, some backends (Oracle, for example) allow fetching only part of the rows from the DB.
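To make the two advantages concrete, here is a minimal sketch of chunked, cached iteration. It is not Django's actual implementation: `ChunkedResults`, `CHUNK_SIZE`, and the `converted` counter are hypothetical names invented for illustration, and a plain Python iterator stands in for the database cursor.

```python
from itertools import islice

CHUNK_SIZE = 100  # assumed chunk size for this sketch only


class ChunkedResults:
    """Toy stand-in for a queryset: converts rows chunk by chunk and caches them."""

    def __init__(self, rows, convert):
        self._rows = iter(rows)    # stands in for a DB cursor
        self._convert = convert    # stands in for row -> model object conversion
        self._cache = []           # objects converted so far
        self._exhausted = False
        self.converted = 0         # illustration only: number of conversions done

    def _fill(self):
        # Pull at most one more chunk of rows and convert it.
        chunk = list(islice(self._rows, CHUNK_SIZE))
        if not chunk:
            self._exhausted = True
            return
        for row in chunk:
            self._cache.append(self._convert(row))
            self.converted += 1

    def __iter__(self):
        pos = 0
        while True:
            # Serve already-converted objects from the cache first.
            while pos < len(self._cache):
                yield self._cache[pos]
                pos += 1
            if self._exhausted:
                return
            self._fill()


def first_match(results, predicate):
    """Iterate only until the first object satisfying predicate."""
    for obj in results:
        if predicate(obj):
            return obj
    return None
```

In this sketch, breaking out of the loop early means only the chunks fetched so far have been converted, and a later loop over the same object reuses the cache before fetching more.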
However, there are some costs, too:
- Complexity in the __iter__ -> _result_iter -> (_results_cache, _iter) -> _iterator implementation.
- The lazy fetching costs around 5-10% performance in the case of "for obj in qs.all()" (1000 objs, 2 fields). For values_list('id') the cost is around 30%.
- The current implementation silently discards some exceptions when doing list(qs). This can be annoying especially when debugging django-core code.
My take is that we are currently optimizing for the wrong case: the one where one wants to consume a queryset only partially, but can't use the .iterator() method. That case would look something like:
    for obj in qs:
        if somecond:
            break
    # Now, another loop for the same queryset!
    for obj in qs:
        if someothercond:
            break
If there is no second loop, it is possible to use .iterator() instead. If either of the above loops consumes a major portion of the qs, then there is no benefit in doing partial object conversion.
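As a rough illustration of why the second loop matters, the sketch below uses a hypothetical `ToyQuerySet` (not Django's QuerySet) whose `iterator()` converts rows one at a time without caching and re-runs the "query" on every call. The `queries` and `converted` counters and the `fetch_rows` callable are assumptions made up for this example.

```python
class ToyQuerySet:
    """Hypothetical stand-in: iterator() converts lazily and caches nothing."""

    def __init__(self, fetch_rows):
        self._fetch_rows = fetch_rows  # callable standing in for running the query
        self.queries = 0               # illustration only: times the query ran
        self.converted = 0             # illustration only: rows converted

    def iterator(self):
        self.queries += 1
        for row in self._fetch_rows():
            self.converted += 1
            yield {"id": row}


qs = ToyQuerySet(lambda: range(1000))

# First partial loop: stops early, so only a few rows are converted.
for obj in qs.iterator():
    if obj["id"] == 5:
        break

# Second partial loop over the same queryset: nothing was cached,
# so the query runs again from the start.
for obj in qs.iterator():
    if obj["id"] == 7:
        break
```

With a single loop the uncached iterator does the minimum work; it is only the repeated partial loop that pays for a second query, which is the narrow case the cached, chunked path serves.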
The question is whether there are common patterns where the current implementation is worth the code complexity and the performance loss in the common cases.
I will leave this as DDN, as this change is obviously something that needs to be considered carefully.
There is a branch implementing the removal of chunked reads at: https://github.com/akaariai/django/compare/non_chunked_reads