Opened 10 years ago

Last modified 4 years ago

#23771 new Cleanup/optimization

Optimisation idea for Paginator object

Reported by: GP89 Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords: paginator optimization performance
Cc: mail@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no
Pull Requests:How to create a pull request

Description

I had a thought - I don't know if this can be easily achieved but here it is.

I have a view which uses the Paginator object to page results for a table, one of the columns of the tables is a total gathered from summing rows from 5 different tables, so the query has 5 joins to get these values.

run_objects = Run.objects.filter(type__name=run_type).select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\
    annotate(total_griderror=Count("griderrorrecord", distinct=True),
             total_gridhttperror=Count("gridhttperrorrecord", distinct=True),
             total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True),
             total_storagehttperror=Count("storagehttperrorrecord", distinct=True),
             total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id")

paginator = Paginator(runs_objects, 15)

The Paginator object appears to do a Count query, to calculate the total number of pages and then a select query later when a page number is requested to get the data. The problem I'm seeing is that the count query in this case is including the 5 joins which don't affect the count other than making it a lot slower.

I was able to speed this up by performing the count for the Paginator myself

run_objects = Run.objects.filter(type__name=run_type)
total = run_objects.count()

run_objects = run_objects.select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\
    annotate(total_griderror=Count("griderrorrecord", distinct=True),
             total_gridhttperror=Count("gridhttperrorrecord", distinct=True),
             total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True),
             total_storagehttperror=Count("storagehttperrorrecord", distinct=True),
             total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id")

paginator = Paginator(runs_objects, 15)
paginator._count = total

I was wondering - would it be possible to do this automatically. Is there a way to drop bits off the query for the count query (ie like annotations, select_related, prefetches which would have no effect on the number of rows returned) to speed up things up?

According to the ticket's flags, the next step(s) to move this issue forward are:

  • To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is: [https://github.com/django/django/pull/#### PR].

Change History (7)

comment:1 by Michael Manfre, 10 years ago

Keywords: paginator optimization performance added
Triage Stage: UnreviewedAccepted

There are probably some non-complex gains to be had, but if all else fails it would be worth documenting the potential performance hit with an optimization example.

comment:2 by Danilo Bargen, 10 years ago

Cc: mail@… added

This sounds like a nice idea. I don't think we should try to do this automatically though -- it's probably quite hard and the developer probably knows better how to optimize a query. Maybe we could add an additional object_count kwarg in the constructor that's specifically documented for optimization purposes?

comment:3 by Adam Johnson, 10 years ago

For the example queryset, I think just using prefetch_related will turn the joins into extra queries and they won't then appear in the count query.

comment:4 by Ramiro Morales, 7 years ago

#25375 and #29254 were related.

comment:5 by Ramiro Morales, 7 years ago

#28477 is related, although it's has a more general scope (it's about the Queryset.count() method instead of just the Paginator) and proposes to work in reduction of unused annotations.

comment:6 by Josh Smeaton, 4 years ago

Adding my +1 to this, as it makes it extremely difficult to optimise readonly annotations in the admin. As an example:

class MyAdmin(admin.ModelAdmin):
    list_display = ("f1", "f2", "related_count")

    def related_count(self, obj):
        return obj.related_count
    
    def get_queryset(self, request):
        return super().get_queryset(request).annotate(related_count=Count("related"))

If QuerySet had a way to remove annotations like select_related(None) or order_by(), then we could implement a Paginator like so:

class BetterCountPaginator(Paginator):
    @cached_property
    def count(self):
        return self.object_list.annotate(None).count()

comment:7 by Simon Charette, 4 years ago

Doing this automatically and correctly is really tricky as #28477 proved (think of annotations spanning multi-valued relationships, values grouping by annotations before aggregation) so I'd be +1 on adding a way to pass count or total to Paginator and a way to clear annotations but we'd have to document that the later is not panacea.

Note: See TracTickets for help on using tickets.
Back to Top