Opened 10 years ago
Last modified 4 years ago
#23771 new Cleanup/optimization
Optimisation idea for Paginator object
Reported by: | GP89 | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.6 |
Severity: | Normal | Keywords: | paginator optimization performance |
Cc: | mail@… | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Pull Requests: | How to create a pull request | ||
Description ¶
I had a thought - I don't know if this can be easily achieved but here it is.
I have a view which uses the Paginator
object to page results for a table, one of the columns of the tables is a total gathered from summing rows from 5 different tables, so the query has 5 joins to get these values.
run_objects = Run.objects.filter(type__name=run_type).select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\ annotate(total_griderror=Count("griderrorrecord", distinct=True), total_gridhttperror=Count("gridhttperrorrecord", distinct=True), total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True), total_storagehttperror=Count("storagehttperrorrecord", distinct=True), total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id") paginator = Paginator(runs_objects, 15)
The Paginator
object appears to do a Count query, to calculate the total number of pages and then a select query later when a page number is requested to get the data. The problem I'm seeing is that the count query in this case is including the 5 joins which don't affect the count other than making it a lot slower.
I was able to speed this up by performing the count for the Paginator
myself
run_objects = Run.objects.filter(type__name=run_type) total = run_objects.count() run_objects = run_objects.select_related("runend", "browserlink__browser", "filecheck__filecheckrecord").\ annotate(total_griderror=Count("griderrorrecord", distinct=True), total_gridhttperror=Count("gridhttperrorrecord", distinct=True), total_gridconnerror=Count("gridconnectionerrorrecord", distinct=True), total_storagehttperror=Count("storagehttperrorrecord", distinct=True), total_storageconnerror=Count("storageconnectionerrorrecord", distinct=True)).order_by("-id") paginator = Paginator(runs_objects, 15) paginator._count = total
I was wondering - would it be possible to do this automatically. Is there a way to drop bits off the query for the count query (ie like annotations, select_related, prefetches which would have no effect on the number of rows returned) to speed up things up?
According to the ticket's flags, the next step(s) to move this issue forward are:
- To provide a patch by sending a pull request. Claim the ticket when you start working so that someone else doesn't duplicate effort. Before sending a pull request, review your work against the patch review checklist. Check the "Has patch" flag on the ticket after sending a pull request and include a link to the pull request in the ticket comment when making that update. The usual format is:
[https://github.com/django/django/pull/#### PR]
.
Change History (7)
comment:1 by , 10 years ago
Keywords: | paginator optimization performance added |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:2 by , 10 years ago
Cc: | added |
---|
This sounds like a nice idea. I don't think we should try to do this automatically though -- it's probably quite hard and the developer probably knows better how to optimize a query. Maybe we could add an additional object_count
kwarg in the constructor that's specifically documented for optimization purposes?
comment:3 by , 10 years ago
For the example queryset, I think just using prefetch_related
will turn the joins into extra queries and they won't then appear in the count query.
comment:5 by , 7 years ago
#28477 is related, although it's has a more general scope (it's about the Queryset.count()
method instead of just the Paginator) and proposes to work in reduction of unused annotations.
comment:6 by , 4 years ago
Adding my +1 to this, as it makes it extremely difficult to optimise readonly annotations in the admin. As an example:
class MyAdmin(admin.ModelAdmin): list_display = ("f1", "f2", "related_count") def related_count(self, obj): return obj.related_count def get_queryset(self, request): return super().get_queryset(request).annotate(related_count=Count("related"))
If QuerySet had a way to remove annotations like select_related(None) or order_by(), then we could implement a Paginator like so:
class BetterCountPaginator(Paginator): @cached_property def count(self): return self.object_list.annotate(None).count()
comment:7 by , 4 years ago
Doing this automatically and correctly is really tricky as #28477 proved (think of annotations spanning multi-valued relationships, values
grouping by annotations before aggregation) so I'd be +1 on adding a way to pass count
or total
to Paginator
and a way to clear annotations but we'd have to document that the later is not panacea.
There are probably some non-complex gains to be had, but if all else fails it would be worth documenting the potential performance hit with an optimization example.