Opened 10 years ago
Last modified 4 years ago
#23771 new Cleanup/optimization
Optimisation idea for Paginator object
Reported by: | GP89 | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.6 |
Severity: | Normal | Keywords: | paginator optimization performance |
Cc: | mail@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I had a thought - I don't know if this can be easily achieved but here it is.
I have a view which uses the Paginator
object to page results for a table, one of the columns of the tables is a total gathered from summing rows from 5 different tables, so the query has 5 joins to get these values.
run_objects = Run.objects.filter(type__name=run_type).select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\ annotate(total_griderror=Count("griderrorrecord", distinct=True), total_gridhttperror=Count("gridhttperrorrecord", distinct=True), total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True), total_storagehttperror=Count("storagehttperrorrecord", distinct=True), total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id") paginator = Paginator(runs_objects, 15)
The Paginator
object appears to do a Count query, to calculate the total number of pages and then a select query later when a page number is requested to get the data. The problem I'm seeing is that the count query in this case is including the 5 joins which don't affect the count other than making it a lot slower.
I was able to speed this up by performing the count for the Paginator
myself
run_objects = Run.objects.filter(type__name=run_type) total = run_objects.count() run_objects = run_objects.select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\ annotate(total_griderror=Count("griderrorrecord", distinct=True), total_gridhttperror=Count("gridhttperrorrecord", distinct=True), total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True), total_storagehttperror=Count("storagehttperrorrecord", distinct=True), total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id") paginator = Paginator(runs_objects, 15) paginator._count = total
I was wondering - would it be possible to do this automatically. Is there a way to drop bits off the query for the count query (ie like annotations, select_related, prefetches which would have no effect on the number of rows returned) to speed up things up?
Change History (7)
comment:1 by , 10 years ago
Keywords: | paginator optimization performance added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 10 years ago
Cc: | added |
---|
This sounds like a nice idea. I don't think we should try to do this automatically though -- it's probably quite hard and the developer probably knows better how to optimize a query. Maybe we could add an additional object_count
kwarg in the constructor that's specifically documented for optimization purposes?
comment:3 by , 10 years ago
For the example queryset, I think just using prefetch_related
will turn the joins into extra queries and they won't then appear in the count query.
comment:5 by , 6 years ago
#28477 is related, although it's has a more general scope (it's about the Queryset.count()
method instead of just the Paginator) and proposes to work in reduction of unused annotations.
comment:6 by , 4 years ago
Adding my +1 to this, as it makes it extremely difficult to optimise readonly annotations in the admin. As an example:
class MyAdmin(admin.ModelAdmin): list_display = ("f1", "f2", "related_count") def related_count(self, obj): return obj.related_count def get_queryset(self, request): return super().get_queryset(request).annotate(related_count=Count("related"))
If QuerySet had a way to remove annotations like select_related(None) or order_by(), then we could implement a Paginator like so:
class BetterCountPaginator(Paginator): @cached_property def count(self): return self.object_list.annotate(None).count()
comment:7 by , 4 years ago
Doing this automatically and correctly is really tricky as #28477 proved (think of annotations spanning multi-valued relationships, values
grouping by annotations before aggregation) so I'd be +1 on adding a way to pass count
or total
to Paginator
and a way to clear annotations but we'd have to document that the later is not panacea.
There are probably some non-complex gains to be had, but if all else fails it would be worth documenting the potential performance hit with an optimization example.