#35317 closed New feature (wontfix)
Add the possibility to do prefetches for only a subset of instances
Reported by: | Laurent Lyaudet | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | |
Cc: | Laurent Lyaudet | Triage Stage: | Unreviewed |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Hello,
I have use cases with serializers returning a lot of distinct informations.
For example, my serialiser can be called with many=True and a query of 2000 Order instances.
And for a small part of these 2000 instances, I need very expensive prefetches.
Currently, I deal with that by overloading the _fetch_all() method and make by hand the hyper-specifics and expensive prefetches,
after everything else.
What I propose is to add a filter_callback kwarg to Prefetch object.
filter_callback would be either None (default) or a function taking only one instance of the previous level of prefetch
and return a boolean to know if the instance needs that prefetch.
A simple example is when you will filter on some status field of Order,
but in that case, you can just modify the query somehow.
And unfortunately, the order id would still be injected in the prefetch query.
But for more realistic examples, you do not want to recode the business logic
in the SQL query to know what needs the prefetch or not.
Example :
orders = ( Order.objects.filter(...) .exclude(...) .annotate(...) .prefetch_related( "items", "address", Prefetch( "packing_task__zonings", queryset=Zonings.objects.filter(...), filter_callback=lambda packing_task: packing_task.order.needs_to_consider_packing_problematic_zonings(), to="problematic_zonings", ) ) )
Best regards,
Laurent Lyaudet
Change History (7)
comment:1 by , 9 months ago
Description: | modified (diff) |
---|
comment:2 by , 9 months ago
comment:4 by , 9 months ago
I'm not convinced we should add a feature for that given prefetch_related_objects
is already a documented public API that allows for this feature to be built on top of
orders = ( Order.objects.filter(...) .exclude(...) .annotate(...) .prefetch_related( "items", "address", ) ) predicate = lambda order: order.needs_to_consider_packing_problematic_zonings() problematic_zoning_orders = filter(predicate, orders) prefetch_related_objects( problematic_zoning_orders, Prefetch( "packing_task__zonings", queryset=Zonings.objects.filter(...), to="problematic_zonings" ), )
I'd a bit more trickier when dealing with predicates for nested prefetches but it's still doable.
Just like any other non-trivial feature addition like this one I suggest trying to gather consensus on a general need for this feature on the forum and / or developer mailing list. From providing support over this channels in the past years I don't remember anyone else running into a similar issue that couldn't be solved with prefetch_related_objects
directly.
follow-up: 6 comment:5 by , 9 months ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I agree with Simon. The ORM is already packaged with features, so we generally don't add new options for things that are already possible. Especially for niche use cases.
You can follow triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList, if you don't agree.
comment:6 by , 9 months ago
TLDR : Use https://pypi.org/project/django-monkey-patches/
Replying to Simon Charette:
I'm not convinced we should add a feature for that given
prefetch_related_objects
is already a documented public API that allows for this feature to be built on top of
I'd be a bit more trickier when dealing with predicates for nested prefetches but it's still doable.
What do you mean by a little bit trickier ?...
I'm using querysets with Django Rest Framework serializers.
I find it convenient to add a static method enhance_queryset()
in each model serializer that adds the needed prefetches.
Moreover, when I have nested model serializers,
I call recursively the methods enhance_queryset()
of the nested model serializers
in the method enhance_queryset()
of the main model serializer.
If you use prefetch_related_objects()
instead, either you do not inject querysets in your serializers,
or you have to deal with all of this in the init() method of your serializer.
Moreover, you have to gather the relevant objects by hand with nested loops to put them in prefetch_related_objects()
...
whilst it is already dealed with with nested prefetches using my patch...
I think my solution is superior, but I will not lose too much time arguing.
Just like any other non-trivial feature addition like this one I suggest trying to gather consensus on a general need for this feature on the forum and / or developer mailing list. From providing support over this channels in the past years I don't remember anyone else running into a similar issue that couldn't be solved with
prefetch_related_objects
directly.
Sorry, if I do that it will split the discussion in many places;
and frankly, I already know how are recieved most of the feature requests on forums/mailing lists/whatever.
Bureaucracy is not for me.
Replying to Mariusz Felisiak:
I agree with Simon. The ORM is already packaged with features, so we generally don't add new options for things that are already possible. Especially for niche use cases.
You can follow triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList, if you don't agree.
Idem.
I don't see any way to upvote a feature request here. It would be a nice way to know what dev using Django wants.
comment:7 by , 9 months ago
I did a second PR https://github.com/django/django/pull/18003 adding a post_prefetch_callback argument to Prefetch.
(Note that the PR has a failed test on Jenkins for TimeOutNotRaised
and I don't see any link with my code.
"Normal tests" are fine.)
Here is a redacted production code example that I could develop with the corresponding patch in my package django-monkey-patches.
I doubt you can find an efficient way and requiring around the same amount of code
to do the same thing without these features.
class OModelSerializer(...): @staticmethod def enhance_queryset( query_set, prefix: str = "", c_id: Optional[CId] = None, ): query_set = SomeMixin1.enhance_queryset(query_set, prefix) def ventilate_ps_by_c_id( lookup, done_queries, ): prefetch_to = lookup.prefetch_to prefetch_before = get_previous_prefetch_to(prefetch_to) for obj in prefetch_before: if hasattr(obj, "needed_ps"): if not hasattr(obj, "needed_p_by"): obj.needed_p_by = {} for obj2 in obj.needed_ps: obj.needed_p_by[obj2.c_id] = obj2 return query_set.prefetch_related( f"{prefix}p2", Prefetch( f"{prefix}s", post_prefetch_callback=create_post_prefetch_callback_add_backward_multiple( retrieve_forward_cache_callback=lambda o: [o.s] if o.s_id else [], backward_cache_name="current_o_ancestors", ), ), Prefetch( f"{prefix}c2", post_prefetch_callback=create_post_prefetch_callback_add_backward_multiple( retrieve_forward_cache_callback=lambda o:[o.c2] if o.c2_id else [], backward_cache_name="current_order_ancestors", ), ), Prefetch( f"{prefix}s__ps", queryset=( P.objects.filter(c_id=c_id) if c_id else P.objects.all() ), to_attr="needed_ps", filter_callback=lambda p: hasattr(p, "_prefetched_objects_cache") and p._prefetched_objects_cache.get("current_order_ancestors") and any( map( lambda o: o.c2_id is not None, p._prefetched_objects_cache.get( "current_o_ancestors" ).values(), ) ), post_prefetch_callback=ventilate_ps_by_c_id, ), Prefetch( f"{prefix}c2__u", queryset=C2U.objects.filter(p2_id__isnull=False) .distinct("c2_id") .prefetch_related( "p2", Prefetch( f"p2__ps", queryset=( P.objects.filter(c_id=c_id) if c_id else P.objects.all() ), to_attr="needed_ps", post_prefetch_callback=ventilate_ps_by_c_id, ), ), to_attr="needed_u", filter_callback=lambda c: ( hasattr(c2, "_prefetched_objects_cache") and c2._prefetched_objects_cache.get("current_o_ancestors") and any( map( lambda o: o.c2_id is not None and o.s_id is None, c._prefetched_objects_cache.get( "current_o_ancestors" ).values(), ) ) ), ), ) class DModelSerializer(...): ... @staticmethod def enhance_queryset( query_set, prefix: str = "", c_id: Optional[CId] = None, ): query_set = SomeMixin2.enhance_queryset( query_set, f"{prefix}l__", ) query_set = OModelSerializer.enhance_queryset( query_set, f"{prefix}l__", c_id=c_id, ) query_set = query_set.prefetch_related( f"{prefix}l__s2", f"{prefix}l__p3", f"{prefix}l2__s3__u", Prefetch( f"{prefix}l__r1", queryset=R1.objects.filter( d_id=F("o__s2__d_id"), r2__s4=SOME_CONSTANT, ).select_related("r2"), to_attr="pertinent_r3", ), ) return query_set
I have been able to make a very fast proof of concept right now in Django shell.
Here is the redacted transcript. I will prepare a PR :)