Remove special handling of deferred models in queryset iterators
|Reported by:||akaariai||Owned by:||nobody|
|Component:||Database layer (models, ORM)||Version:||1.3|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Suor gave me a great idea in ticket #17025: use attname cache for __init__. This turned out to work really well for deferred models: we don't need to special case them any more in queryset iterator loops (we do of course need to get the right deferred model class still).
Because the model __init__ did need to get the data in the same order as the model._meta.fields for *args based initialization, we couldn't use the fast init for deferred models. But if the model's options know the selected columns in the queryset, we suddenly can use the *args based __init__ which is a lot faster. A new model._meta.get_init_attnames() method was added for this.
One might wonder if this is backwards compatible. Well, the __init__ behaviour surely changed for deferred models, but I don't think this matters: the deferred classes are only for internal use of querysets, and thus the user should never call __init__ of deferred models directly.
The speedup from djangobench for raw_deferred is 2.5x. The speedup for query_all and query_all_multifield is about 1.05. There isn't a djangobench test for .defer, but according to my test .only('id') is about 2x faster, and .only('id)[0:100] is about 1.6x faster.
The attached patch passes all test. There is some cleanup needed in get_cached_row, but there is a separate patch for optimizing those parts of the queryset, so I didn't bother to clean it up. However there is something strange going on with select_related querysets and its handling of load_fields. It seems that sometimes the field is passed as field.attname and sometimes as field.name to deferred_model_factory, and I don't believe that to be correct. The patch just checks for both, but that is the wrong cure. I believe the load_fields should always be field.name, not field.attname. Another tickets problem IMHO.
Suor also suggested the idea of using __dict__.update in model.__init__. That would speed up model initialization somewhat, but I believe that is backwards incompatible as __setattr__ would not be called. Is this corret? We are talking about 10-15% speed increase in the case the model has a lot of columns.
Change History (5)
Changed 2 years ago by akaariai
comment:1 Changed 2 years ago by aaugustin
- Needs documentation unset
- Needs tests unset
- Patch needs improvement unset
- Triage Stage changed from Unreviewed to Design decision needed
comment:3 Changed 11 months ago by Alex
- Triage Stage changed from Design decision needed to Accepted