Opened 9 months ago

Closed 9 months ago

Last modified 9 months ago

#35317 closed New feature (wontfix)

Add the possibility to do prefetches for only a subset of instances

Reported by: Laurent Lyaudet Owned by: nobody
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords:
Cc: Laurent Lyaudet Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Laurent Lyaudet)

Hello,

I have use cases with serializers returning a lot of distinct informations.
For example, my serialiser can be called with many=True and a query of 2000 Order instances.
And for a small part of these 2000 instances, I need very expensive prefetches.
Currently, I deal with that by overloading the _fetch_all() method and make by hand the hyper-specifics and expensive prefetches,
after everything else.
What I propose is to add a filter_callback kwarg to Prefetch object.
filter_callback would be either None (default) or a function taking only one instance of the previous level of prefetch
and return a boolean to know if the instance needs that prefetch.
A simple example is when you will filter on some status field of Order,
but in that case, you can just modify the query somehow.
And unfortunately, the order id would still be injected in the prefetch query.
But for more realistic examples, you do not want to recode the business logic
in the SQL query to know what needs the prefetch or not.

Example :

orders = (
    Order.objects.filter(...)
    .exclude(...)
    .annotate(...)
    .prefetch_related(
        "items",
        "address",
        Prefetch(
            "packing_task__zonings",
            queryset=Zonings.objects.filter(...), 
            filter_callback=lambda packing_task: packing_task.order.needs_to_consider_packing_problematic_zonings(),
            to="problematic_zonings",
        )
    )
)

Best regards,

Laurent Lyaudet

Change History (7)

comment:1 by Laurent Lyaudet, 9 months ago

Description: modified (diff)

comment:2 by Laurent Lyaudet, 9 months ago

I have been able to make a very fast proof of concept right now in Django shell.
Here is the redacted transcript. I will prepare a PR :)

import random

from django.db import connection
from django.db.models import query


old_prefetch_one_level = query.prefetch_one_level
def new_prefetch_one_level(instances, prefetcher, lookup, level):
    if hasattr(lookup, "filter_callback") and lookup.filter_callback is not None:
        instances = [instance for instance in instances if lookup.filter_callback(instance)]
    if len(instances) == 0:
        return [], ()
    return old_prefetch_one_level(instances, prefetcher, lookup, level)


query.prefetch_one_level = new_prefetch_one_level


from DeleevCore.orders.models import Order, OrderProduct

from django.db.models import Prefetch


some_random_prefetch = Prefetch("items", queryset=OrderProduct.objects.all())
some_random_prefetch.filter_callback = lambda x: bool(random.randint(0, 1))

orders = list(Order.objects.all().prefetch_related(some_random_prefetch)[:10])

print(connection.queries)
[{'sql': "SELECT t.oid, typarray FROM pg_type t JOIN pg_namespace ns ON typnamespace = ns.oid WHERE typname = 'hstore'",
  'time': '0.001'},
 {'sql': "SELECT typarray FROM pg_type WHERE typname = 'citext'",
  'time': '0.000'},
 {'sql': 'SELECT @redacted@ FROM "orders_order" ORDER BY "orders_order"."shipping_time" ASC  LIMIT 10',
  'time': '0.019'},
 {'sql': 'SELECT @redacted@ FROM "orders_orderproduct" WHERE "orders_orderproduct"."order_id" IN (114407, 84181, 84183, 25369)',
  'time': '0.006'}]

Look only 4 ids in last query instead of 10 :)


comment:3 by Laurent Lyaudet, 9 months ago

Has patch: set
Last edited 9 months ago by Laurent Lyaudet (previous) (diff)

comment:4 by Simon Charette, 9 months ago

I'm not convinced we should add a feature for that given prefetch_related_objects is already a documented public API that allows for this feature to be built on top of

orders = (
    Order.objects.filter(...)
    .exclude(...)
    .annotate(...)
    .prefetch_related(
        "items",
        "address",
    )
)
predicate = lambda packing_task: order.needs_to_consider_packing_problematic_zonings()
problematic_zoning_orders = filter(predicate, orders)
prefetch_related_objects(
    problematic_zoning_orders,
    Prefetch(
        "packing_task__zonings",
        queryset=Zonings.objects.filter(...),
        to="problematic_zonings"
    ),
)

I'd a bit more trickier when dealing with predicates for nested prefetches but it's still doable.

Just like any other non-trivial feature addition like this one I suggest trying to gather consensus on a general need for this feature on the forum and / or developer mailing list. From providing support over this channels in the past years I don't remember anyone else running into a similar issue that couldn't be solved with prefetch_related_objects directly.

Version 1, edited 9 months ago by Simon Charette (previous) (next) (diff)

comment:5 by Mariusz Felisiak, 9 months ago

Resolution: wontfix
Status: newclosed

I agree with Simon. The ORM is already packaged with features, so we generally don't add new options for things that are already possible. Especially for niche use cases.

You can follow triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList, if you don't agree.

in reply to:  5 comment:6 by Laurent Lyaudet, 9 months ago

TLDR : Use https://pypi.org/project/django-monkey-patches/

Replying to Simon Charette:

I'm not convinced we should add a feature for that given prefetch_related_objects is already a documented public API that allows for this feature to be built on top of
I'd be a bit more trickier when dealing with predicates for nested prefetches but it's still doable.

What do you mean by a little bit trickier ?...
I'm using querysets with Django Rest Framework serializers.
I find it convenient to add a static method enhance_queryset() in each model serializer that adds the needed prefetches.
Moreover, when I have nested model serializers,
I call recursively the methods enhance_queryset() of the nested model serializers
in the method enhance_queryset() of the main model serializer.
If you use prefetch_related_objects() instead, either you do not inject querysets in your serializers,
or you have to deal with all of this in the init() method of your serializer.
Moreover, you have to gather the relevant objects by hand with nested loops to put them in prefetch_related_objects()...
whilst it is already dealed with with nested prefetches using my patch...
I think my solution is superior, but I will not lose too much time arguing.

Just like any other non-trivial feature addition like this one I suggest trying to gather consensus on a general need for this feature on the forum and / or developer mailing list. From providing support over this channels in the past years I don't remember anyone else running into a similar issue that couldn't be solved with prefetch_related_objects directly.

Sorry, if I do that it will split the discussion in many places;
and frankly, I already know how are recieved most of the feature requests on forums/mailing lists/whatever.
Bureaucracy is not for me.

Replying to Mariusz Felisiak:

I agree with Simon. The ORM is already packaged with features, so we generally don't add new options for things that are already possible. Especially for niche use cases.

You can follow triaging guidelines with regards to wontfix tickets and take this to DevelopersMailingList, if you don't agree.

Idem.

I don't see any way to upvote a feature request here. It would be a nice way to know what dev using Django wants.

comment:7 by Laurent Lyaudet, 9 months ago

I did a second PR https://github.com/django/django/pull/18003 adding a post_prefetch_callback argument to Prefetch.
(Note that the PR has a failed test on Jenkins for TimeOutNotRaised
and I don't see any link with my code.
"Normal tests" are fine.)
Here is a redacted production code example that I could develop with the corresponding patch in my package django-monkey-patches.
I doubt you can find an efficient way and requiring around the same amount of code
to do the same thing without these features.

class OModelSerializer(...):
    @staticmethod
    def enhance_queryset(
        query_set,
        prefix: str = "",
        c_id: Optional[CId] = None,
    ):
        query_set = SomeMixin1.enhance_queryset(query_set, prefix)

        def ventilate_ps_by_c_id(
            lookup,
            done_queries,
        ):
            prefetch_to = lookup.prefetch_to
            prefetch_before = get_previous_prefetch_to(prefetch_to)
            for obj in prefetch_before:
                if hasattr(obj, "needed_ps"):
                    if not hasattr(obj, "needed_p_by"):
                        obj.needed_p_by = {}
                    for obj2 in obj.needed_ps:
                        obj.needed_p_by[obj2.c_id] = obj2

        return query_set.prefetch_related(
            f"{prefix}p2",
            Prefetch(
                f"{prefix}s",
                post_prefetch_callback=create_post_prefetch_callback_add_backward_multiple(
                    retrieve_forward_cache_callback=lambda o: [o.s] if o.s_id else [],
                    backward_cache_name="current_o_ancestors",
                ),
            ),
            Prefetch(
                f"{prefix}c2",
                post_prefetch_callback=create_post_prefetch_callback_add_backward_multiple(
                    retrieve_forward_cache_callback=lambda o:[o.c2] if o.c2_id else [],
                    backward_cache_name="current_order_ancestors",
                ),
            ),
            Prefetch(
                f"{prefix}s__ps",
                queryset=(
                    P.objects.filter(c_id=c_id)
                    if c_id
                    else P.objects.all()
                ),
                to_attr="needed_ps",
                filter_callback=lambda p: hasattr(p, "_prefetched_objects_cache")
                and p._prefetched_objects_cache.get("current_order_ancestors")
                and any(
                    map(
                        lambda o: o.c2_id is not None,
                        p._prefetched_objects_cache.get(
                            "current_o_ancestors"
                        ).values(),
                    )
                ),
                post_prefetch_callback=ventilate_ps_by_c_id,
            ),
            Prefetch(
                f"{prefix}c2__u",
                queryset=C2U.objects.filter(p2_id__isnull=False)
                .distinct("c2_id")
                .prefetch_related(
                    "p2",
                    Prefetch(
                        f"p2__ps",
                        queryset=(
                            P.objects.filter(c_id=c_id)
                            if c_id
                            else P.objects.all()
                        ),
                        to_attr="needed_ps",
                        post_prefetch_callback=ventilate_ps_by_c_id,
                    ),
                ),
                to_attr="needed_u",
                filter_callback=lambda c: (
                    hasattr(c2, "_prefetched_objects_cache")
                    and c2._prefetched_objects_cache.get("current_o_ancestors")
                    and any(
                        map(
                            lambda o: o.c2_id is not None
                            and o.s_id is None,
                            c._prefetched_objects_cache.get(
                                "current_o_ancestors"
                            ).values(),
                        )
                    )
                ),
            ),
        )


class DModelSerializer(...):
    ...

    @staticmethod
    def enhance_queryset(
        query_set,
        prefix: str = "",
        c_id: Optional[CId] = None,
    ):
        query_set = SomeMixin2.enhance_queryset(
            query_set,
            f"{prefix}l__",
        )
        query_set = OModelSerializer.enhance_queryset(
            query_set,
            f"{prefix}l__",
            c_id=c_id,
        )
        query_set = query_set.prefetch_related(
            f"{prefix}l__s2",
            f"{prefix}l__p3",
            f"{prefix}l2__s3__u",
            Prefetch(
                f"{prefix}l__r1",
                queryset=R1.objects.filter(
                    d_id=F("o__s2__d_id"),
                    r2__s4=SOME_CONSTANT,
                ).select_related("r2"),
                to_attr="pertinent_r3",
            ),
        )

        return query_set
Last edited 9 months ago by Laurent Lyaudet (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top