Opened 3 years ago

Last modified 3 years ago

#23771 new Cleanup/optimization

Optimisation idea for Paginator object

Reported by: GP89 Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords: paginator optimization performance
Cc: mail@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I had a thought - I don't know if this can be easily achieved but here it is.

I have a view which uses the Paginator object to page results for a table, one of the columns of the tables is a total gathered from summing rows from 5 different tables, so the query has 5 joins to get these values.

run_objects = Run.objects.filter(type__name=run_type).select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\
    annotate(total_griderror=Count("griderrorrecord", distinct=True),
             total_gridhttperror=Count("gridhttperrorrecord", distinct=True),
             total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True),
             total_storagehttperror=Count("storagehttperrorrecord", distinct=True),
             total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id")

paginator = Paginator(runs_objects, 15)

The Paginator object appears to do a Count query, to calculate the total number of pages and then a select query later when a page number is requested to get the data. The problem I'm seeing is that the count query in this case is including the 5 joins which don't affect the count other than making it a lot slower.

I was able to speed this up by performing the count for the Paginator myself

run_objects = Run.objects.filter(type__name=run_type)
total = run_objects.count()

run_objects = run_objects.select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\
    annotate(total_griderror=Count("griderrorrecord", distinct=True),
             total_gridhttperror=Count("gridhttperrorrecord", distinct=True),
             total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True),
             total_storagehttperror=Count("storagehttperrorrecord", distinct=True),
             total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id")

paginator = Paginator(runs_objects, 15)
paginator._count = total

I was wondering - would it be possible to do this automatically. Is there a way to drop bits off the query for the count query (ie like annotations, select_related, prefetches which would have no effect on the number of rows returned) to speed up things up?

Change History (3)

comment:1 Changed 3 years ago by Michael Manfre

Keywords: paginator optimization performance added
Triage Stage: UnreviewedAccepted

There are probably some non-complex gains to be had, but if all else fails it would be worth documenting the potential performance hit with an optimization example.

comment:2 Changed 3 years ago by Danilo Bargen

Cc: mail@… added

This sounds like a nice idea. I don't think we should try to do this automatically though -- it's probably quite hard and the developer probably knows better how to optimize a query. Maybe we could add an additional object_count kwarg in the constructor that's specifically documented for optimization purposes?

comment:3 Changed 3 years ago by Adam (Chainz) Johnson

For the example queryset, I think just using prefetch_related will turn the joins into extra queries and they won't then appear in the count query.

Note: See TracTickets for help on using tickets.
Back to Top