#25375 closed Bug (needsinfo)
Paginator generates suboptimal queries
Reported by: | Alexandru Mărășteanu | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.7 |
Severity: | Normal | Keywords: | |
Cc: | Alexandru Mărășteanu | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hello,
I'm using Django Rest Framework and have the following situation:
I have two models, Article
and Comment
:
class Article(models.Model): # ... class Comment(models.Model): article = models.ForeignKey(Article, related_name='comments') # ...
And an articles viewset:
class Articles(viewsets.ModelViewSet): model = Article serializer_class = ArticleSerializer paginate_by = 50 paginate_by_param = 'limit' def get_queryset(self): return Article.objects.annotate(comments_count=Count('comments'))
When doing a simple listing, the database will be queried two times: one to count the result set, the other to fetch the results:
SELECT COUNT(*) FROM ( SELECT `article`.`id` AS `id`, COUNT(`article_comment`.`id`) AS `comments_count` FROM `article` LEFT OUTER JOIN `article_comment` ON (`article`.`id` = `article_comment`.`article_id`) GROUP BY `article`.`id` ORDER BY NULL ) subquery;
And
SELECT `article`.`id` AS `id`, COUNT(`article_comment`.`id`) AS `comments_count` FROM `article` LEFT OUTER JOIN `article_comment` ON (`article`.`id` = `article_comment`.`article_id`) GROUP BY `article`.`id` ORDER BY `article`.`id` DESC LIMIT 50;
respectively.
If you look at (and EXPLAIN
) the first query, the one that does the counting, you'll notice it loads up *all* records and then starts counting them, a process which may take up all resources and eventually crash the machine. It did so in my case, with only 50000 records.
I figured this has something to do with Django, not DRF.
Change History (9)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
Wonder if this can be related to the changes that caused #25230?
@alexei, are you using MySQL?
comment:3 by , 9 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:4 by , 9 years ago
I think DRF is probably mutating your QuerySet to use a subquery to get the count when it's not necessary. If you are using MySQL, these tend to be slow - newer versions of MySQL and MariaDB have improvements that make them faster.
comment:5 by , 9 years ago
Cc: | added |
---|---|
Resolution: | needsinfo |
Status: | closed → new |
Hello,
Sorry for the delay. I didn't get notified when you guys replied, and I completely forgot about it.
Yes, in my previous example I was using MySQL.
However, I created a test app to see how it goes using the default settings (including SQLite db). The result is as follows:
$ python manage.py shell >>> from django.core.paginator import Paginator >>> from django.db import connections >>> from django.db.models import Count >>> from bug.models import Article >>> a = Article.objects.annotate(comments_count=Count('comments')) >>> p = Paginator(a, 25) >>> print p.object_list [] >>> print connections['default'].queries [{u'time': u'0.001', u'sql': u'QUERY = u\'SELECT "bug_article"."id", "bug_article"."title", "bug_article"."body", COUNT("bug_comment"."id") AS "comments_count" FROM "bug_article" LEFT OUTER JOIN "bug_comment" ON ( "bug_article"."id" = "bug_comment"."article_id" ) GROUP BY "bug_article"."id", "bug_article"."title", "bug_article"."body" LIMIT 21\' - PARAMS = ()'}] >>> print p.count 0 >>> print connections['default'].queries [{u'time': u'0.001', u'sql': u'QUERY = u\'SELECT "bug_article"."id", "bug_article"."title", "bug_article"."body", COUNT("bug_comment"."id") AS "comments_count" FROM "bug_article" LEFT OUTER JOIN "bug_comment" ON ( "bug_article"."id" = "bug_comment"."article_id" ) GROUP BY "bug_article"."id", "bug_article"."title", "bug_article"."body" LIMIT 21\' - PARAMS = ()'}, {u'time': u'0.000', u'sql': u'QUERY = u\'SELECT COUNT(*) FROM (SELECT "bug_article"."id" AS "id", "bug_article"."title" AS "title", "bug_article"."body" AS "body", COUNT("bug_comment"."id") AS "comments_count" FROM "bug_article" LEFT OUTER JOIN "bug_comment" ON ( "bug_article"."id" = "bug_comment"."article_id" ) GROUP BY "bug_article"."id", "bug_article"."title", "bug_article"."body") subquery\' - PARAMS = ()'}] >>>
This time I'm again on Django 1.7.1.
For a second look, I updated Django to 1.8.4. The result is the same.
As you can see, it's Paginator's count
that wraps the query resulting from the QuerySet in a subquery.
comment:6 by , 9 years ago
The queries in your latest comment are difficult to read given they are not formatted as nicely as the ticket description. Perhaps you could edit the description and move your latest comment there.
After that, could you explain what the expected query is? Are you able to propose a fix? Not knowing these particular internal details of Django offhand, it's a bit difficult for me to triage the ticket otherwise. Thanks!
comment:7 by , 9 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:8 by , 8 years ago
The Paginator needs to make two queries: the count query and the query for the page objects.
If the queryset you use in your Paginator satisfies one of these conditions the count query will run as a subquery, so this query:
SELECT COUNT(*) FROM (SELECT ...);
instead of
SELECT COUNT(*) AS "__count" FROM ...;
In my case the first query takes double the amount of time of the second query.
The row count can be independent of annotations. So if you make the Paginator do the count query on a queryset without annotations (resulting in a more performant count query) and then doing the actual pagination with the annotated queryset may result in a small performance win.
Annotations can influence row count (e.g. when filtering by annotation) and you still may have groupby()
or distinct()
on your queryset that influences counts and forces a count with a subquery. So I'm not sure if there's a general way for the Paginator (or any .count()
call) to run the more performant query.
Could you please add a test case that demonstrates the problem without DRF?