Opened 11 years ago

Closed 4 years ago

Last modified 4 years ago

#20577 closed Cleanup/optimization (fixed)

Make prefetch_related faster by lazily creating related querysets

Reported by: Anssi Kääriäinen Owned by: Alex Aktsipetrov
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: karyon Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


In one project of mine I will need to prefetch the following things for each "machine": computerdata__operatingsystem, identifiers. computerdata is one-to-one to machine, operatingsystem is manytomany, and identifiers are many-to-one. The data is distributed in a way that any one machine have on average one operating system, and a couple of identifiers.

Fetching data results in this profile:

        1    0.000    0.000    6.835    6.835<module>)
        1    0.000    0.000    6.795    6.795
        1    0.000    0.000    6.795    6.795
        1    0.000    0.000    6.207    6.207
        1    0.000    0.000    6.199    6.199
        1    0.000    0.000    6.072    6.072
     10/2    0.016    0.002    6.069    3.034
      6/1    0.000    0.000    6.043    6.043
        1    0.000    0.000    5.837    5.837
        1    0.009    0.009    5.837    5.837
        3    0.080    0.027    5.819    1.940
     4640    0.018    0.000    3.917    0.001
     4646    0.014    0.000    3.206    0.001
     4648    0.037    0.000    3.193    0.001
     4648    0.031    0.000    2.661    0.001
     4648    0.053    0.000    2.401    0.001
     4648    0.144    0.000    2.284    0.000
     2320    0.040    0.000    2.076    0.001
     2320    0.063    0.000    1.823    0.001
    14380    0.068    0.000    1.052    0.000
        1    0.023    0.023    0.993    0.993
     9299    0.067    0.000    0.841    0.000
     4649    0.086    0.000    0.752    0.000
     9299    0.226    0.000    0.738    0.000
     4644    0.177    0.000    0.668    0.000
        1    0.000    0.000    0.577    0.577
    14375    0.330    0.000    0.548    0.000
   127/79    0.007    0.000    0.447    0.006 {__import__}
     4645    0.012    0.000    0.443    0.000
    14380    0.017    0.000    0.433    0.000
        5    0.197    0.039    0.197    0.039 {method 'execute' of 'psycopg2._psycopg.cursor' objects}

If I am reading this correctly, the actual data fetching costs 0.2 seconds of the total runtime of 6.8 seconds. (In reality the ratio is 0.2 vs 2 seconds due to overhead of profiling not affecting the execute time but having a big effect on other parts).

The big "failure" in above profile is the creation of related querysets:

4640 0.018 0.000 3.917 0.001

this takes more than half (approx 57%) of the total runtime. Every cycle here is wasted - we don't ever need the related queryset when using the prefetched results.

I see two options here:

  • Allow assigning the results to somewhere else than manager.all() (that is, a plain list you can name). This would naturally get rid of the overhead, but then you will need to alter code to explicitly use the named list when that is available.
  • Somehow lazily instantiate the .all() queryset. If prefetch is in effect calling relmanager.all() will not create a queryset, it just creates a proxy which when iterated gives the related objects, but works otherwise like the related queryset (*not* like manager).

I prefer option 2 as this doesn't require any new APIs or changes to user code to take advantage of this feature. However creating a proxy object that works like the related queryset except for iteration, and which doesn't involve actually creating that queryset will likely be an interesting technical challenge (for example it would be nice to ensure isinstance(obj.operating_systems.all(), QuerySet) == True. Solving this challenge will likely speed up some prefetch_related queries by 50% or so.

Change History (13)

comment:1 by Anssi Kääriäinen, 11 years ago

I implemented a branch where it is possible to chain operations. The operations are executed on next queryset access. See The branch isn't meant to be committable, and it fails a couple of tests (one reason might be that cloning does a bit more than just clone(), so when chaining ops without clone things will break).

The basic idea is to add a new QuerySet method chain_ops(). The usage is simple: qs.chain_ops(lambda qs: qs.filter(foo=bar).using(baz).order_by(xyzzy)). The chain_ops() method will *not* clone the queryset. The operations in the lambda function are executed when needed, that is on queryset evaluation, accessing qs.query, or cloning the queryset. When the operations in the lambda function are executed, the queryset isn't cloned in between.

The result of this is 2.3x speedup in query_prefetch_related djangobench benchmark.

comment:2 by Luke Plant, 11 years ago

Nice work!

I'm wondering if there is some overlap with _sticky_filter (or if that could be implemented using chain_ops).

comment:3 by Anssi Kääriäinen, 11 years ago

Has patch: set
Patch needs improvement: set

The _sticky_filter is related, but I don't see any immediate way to implement that with chain_ops. The _sticky_filter tells self.query to not reset reusable aliases when cloning so that two .filter() calls in row target the same multijoin alias (or, this is how I recall). There are a couple of other cases where clone() resets some state to start next operation. The chain_ops() will need a way to tell self.query to reset state after each operation without doing a clone, while _sticky_filter needs a way to not reset state while doing a clone.

The Query.clone() is responsible for even changing the class of the Query sometimes. It might be better to separate the responsibilities a bit, so that .clone() does a full clone. Then there is a .start_op() or something like that that does the state cleanup & class copying for starting a new operation. I am not sure how this interacts with all the Query subclasses, or if this will have noticeable performance impact.

I am not sure if chain_ops() is needed API after all. For user API the wanted behaviour is to first do a clone, then run all the operations without cloning between and finally return the result. chain_ops() doesn't clone at all, and delays the operations until next queryset operation. As is, chain_ops() is very useful for prefetch_related but completely wrong as user API. So, maybe there needs to be some internal setup for prefetch_related delayed execution. The same setup can then be used for chain_ops() too if need be.

comment:4 by Anssi Kääriäinen, 11 years ago

#19264 was a duplicate. In addition #20880 (splitted clone()) tracks splitting clone() to pre_next_op() and clone() parts. That seems to be a pre-requisite for the chain_ops() approach.

comment:5 by Anssi Kääriäinen, 11 years ago

The approach in #17001 (prefetch custom querysets) will provide the speedup promised in this ticket. Unfortunately the speedup is only available to custom prefetches, not ordinary .prefetch_related('some_related_set').

I am going to do a wait-and-see on this ticket, changes required for fast prefetches directly into querysets require some hacks. The hacks needed: create QuerySet.query lazily, and have a "on_query_creation_ops" QuerySet attribute, that is things that are ran when the inner query is created.

I am going to leave this in accepted status - if a suitably clean patch is written I think speeding up prefetch_related is a good idea.

comment:6 by karyon, 9 years ago

fyi, i can still observe this behavior. the profile looks pretty similar. using the Prefetch object didn't change anything.

comment:7 by karyon, 9 years ago

Cc: karyon added

comment:8 by Tom Linford, 9 years ago

Echoing karyon here -- We're seeing pretty significant performance hits for our django rest api on endpoints that return objects that need to have access to their child objects.

comment:9 by Simon Charette, 5 years ago

Patch needs improvement: unset

The patch should be RFC pending some minimal adjustments.

comment:10 by Mariusz Felisiak, 4 years ago

Owner: changed from nobody to Alex Aktsipetrov
Status: newassigned

comment:11 by Mariusz Felisiak, 4 years ago

Triage Stage: AcceptedReady for checkin

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 681f7e2:

Fixed #20577 -- Deferred filtering of prefetched related querysets.

Added internal interface to QuerySet that allows to defer next filter
call till .query is accessed. Used it to optimize prefetch_related().

Thanks Simon Charette for the review.

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 036f160:

Refs #20577 -- Deferred filtering of prefetched related querysets by reverse m2o relation.

Note: See TracTickets for help on using tickets.
Back to Top