Opened 10 years ago

Last modified 4 years ago

#23771 new Cleanup/optimization

Optimisation idea for Paginator object

Reported by: GP89 Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords: paginator optimization performance
Cc: mail@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I had a thought - I don't know if this can be easily achieved but here it is.

I have a view which uses the Paginator object to page results for a table, one of the columns of the tables is a total gathered from summing rows from 5 different tables, so the query has 5 joins to get these values.

run_objects = Run.objects.filter(type__name=run_type).select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\
    annotate(total_griderror=Count("griderrorrecord", distinct=True),
             total_gridhttperror=Count("gridhttperrorrecord", distinct=True),
             total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True),
             total_storagehttperror=Count("storagehttperrorrecord", distinct=True),
             total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id")

paginator = Paginator(runs_objects, 15)

The Paginator object appears to do a Count query, to calculate the total number of pages and then a select query later when a page number is requested to get the data. The problem I'm seeing is that the count query in this case is including the 5 joins which don't affect the count other than making it a lot slower.

I was able to speed this up by performing the count for the Paginator myself

run_objects = Run.objects.filter(type__name=run_type)
total = run_objects.count()

run_objects = run_objects.select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\
    annotate(total_griderror=Count("griderrorrecord", distinct=True),
             total_gridhttperror=Count("gridhttperrorrecord", distinct=True),
             total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True),
             total_storagehttperror=Count("storagehttperrorrecord", distinct=True),
             total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id")

paginator = Paginator(runs_objects, 15)
paginator._count = total

I was wondering - would it be possible to do this automatically. Is there a way to drop bits off the query for the count query (ie like annotations, select_related, prefetches which would have no effect on the number of rows returned) to speed up things up?

Change History (7)

comment:1 by Michael Manfre, 10 years ago

Keywords: paginator optimization performance added
Triage Stage: UnreviewedAccepted

There are probably some non-complex gains to be had, but if all else fails it would be worth documenting the potential performance hit with an optimization example.

comment:2 by Danilo Bargen, 10 years ago

Cc: mail@… added

This sounds like a nice idea. I don't think we should try to do this automatically though -- it's probably quite hard and the developer probably knows better how to optimize a query. Maybe we could add an additional object_count kwarg in the constructor that's specifically documented for optimization purposes?

comment:3 by Adam Johnson, 10 years ago

For the example queryset, I think just using prefetch_related will turn the joins into extra queries and they won't then appear in the count query.

comment:4 by Ramiro Morales, 6 years ago

#25375 and #29254 were related.

comment:5 by Ramiro Morales, 6 years ago

#28477 is related, although it's has a more general scope (it's about the Queryset.count() method instead of just the Paginator) and proposes to work in reduction of unused annotations.

comment:6 by Josh Smeaton, 4 years ago

Adding my +1 to this, as it makes it extremely difficult to optimise readonly annotations in the admin. As an example:

class MyAdmin(admin.ModelAdmin):
    list_display = ("f1", "f2", "related_count")

    def related_count(self, obj):
        return obj.related_count
    
    def get_queryset(self, request):
        return super().get_queryset(request).annotate(related_count=Count("related"))

If QuerySet had a way to remove annotations like select_related(None) or order_by(), then we could implement a Paginator like so:

class BetterCountPaginator(Paginator):
    @cached_property
    def count(self):
        return self.object_list.annotate(None).count()

comment:7 by Simon Charette, 4 years ago

Doing this automatically and correctly is really tricky as #28477 proved (think of annotations spanning multi-valued relationships, values grouping by annotations before aggregation) so I'd be +1 on adding a way to pass count or total to Paginator and a way to clear annotations but we'd have to document that the later is not panacea.

Note: See TracTickets for help on using tickets.
Back to Top