Opened 16 years ago
Last modified 5 weeks ago
#8408 new New feature
Add a new meta option: don't do count(*) in admin
Reported by: | LI Daobing | Owned by: | Thomas C |
---|---|---|---|
Component: | contrib.admin | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Thomas Kerpe, Ludovico Magnocavallo, marcoberi@…, djfische@…, boxm@…, kmike84@…, slav0nic0@…, dan.fairs@…, markus.magnuson@…, t.chaumeny@…, josh.smeaton@…, Ionel Cristian Mărieș, Ondřej Chmelař, elonzh, Ülgen Sarıkavak | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | yes | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Hello,
everytime when I use Admin to view a table, the Admin count the rows in the table. if the database is mysql+innodb, this step is very slow, how about add an option in Meta to prevent this, such as
class A(models.Model): class Meta: do_not_count = True
thanks
Attachments (1)
Change History (62)
comment:1 by , 16 years ago
milestone: | → post-1.0 |
---|---|
Triage Stage: | Unreviewed → Design decision needed |
comment:2 by , 16 years ago
we have tens of millions rows in this table. and we want to use django Admin as a customizable viewer.
comment:3 by , 16 years ago
Triage Stage: | Design decision needed → Accepted |
---|
I like this optional optimization for people who don't want the admin to do a "select count(*)" on every changelist view, but I'm not sure whether we could implement it without throwing out some valuable parts of the changelist (mostly, the pagination). So, I'm marking this as "accepted," with the caveat that we need to look at what parts of the changelist will be thrown out.
comment:4 by , 16 years ago
This can be done without losing any functionality by setting a limit for count like this:
SELECT count(*) from (select id from HUGE_TABLE LIMIT 10000) as b;
Then if the raws count is more than for example 10000, SQL server stops sequential scanning through the table and paginatior can list pages around current page but not the last page. On my system postgreql takes 60 seconds to return count of a table with 11,000,000 raws and this is suboptimal to wait a minute for each admin page to load.
comment:6 by , 16 years ago
milestone: | → 1.2 |
---|
Having problems with large datasets in admin myself (not sure if count()
is the only reason though), proposing this for 1.2.
comment:7 by , 16 years ago
Cc: | added |
---|
comment:8 by , 16 years ago
Cc: | added |
---|
comment:9 by , 15 years ago
Cc: | added |
---|
comment:10 by , 15 years ago
The count(*) is essential for displaying the paginator. Without knowing the amount of entities of a table, there can be any well working paginator.
One solution could be to display only "prev/next"-buttons and input field for page.
But i think the effort of fixing this is much higher then the benefit
So -1 for this request.
comment:11 by , 15 years ago
aside from the count problem, LIMIT item_count OFFSET 25000 (maybe slower numbers too already) is damn slow in MySQL ;)
comment:12 by , 15 years ago
milestone: | 1.2 |
---|
1.2 is feature-frozen, moving this feature request off the milestone.
comment:14 by , 15 years ago
Cc: | added |
---|
comment:15 by , 14 years ago
Cc: | added |
---|
follow-up: 17 comment:16 by , 14 years ago
I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator
follow-up: 18 comment:17 by , 14 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
Replying to kmike:
I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator
Agreed :) Here's a link to a sample Paginator that doesn't use count
comment:18 by , 14 years ago
Cc: | added |
---|---|
Easy pickings: | unset |
Resolution: | wontfix |
Severity: | → Normal |
Status: | closed → reopened |
Type: | → Uncategorized |
Version: | 0.96 → SVN |
Replying to lrekucki:
Replying to kmike:
I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator
Agreed :) Here's a link to a sample Paginator that doesn't use count
Reopening, as the resolution simply doesn't work. The example paginator given by kmike simply doesn't work as a ModelAdmin paginator, because the {% pagination %} tag and the ChangeList class (among others) assume you can count the number of items being paginated. See my comment on #4065 for the gory details.
There are scenarios where you don't know how many results are being paginated, but can still do useful pagination links:
- With InnoDB tables in MySQL, count(*) is frighteningly expensive, so while in principle you can count, in practice you can't
- With NoSQL databases it can be impossible to do a count, but you can still paginate by range queries
A proposed design for such a paginator would be to display (on page 12):
1 2 3 ..... 10 11 12 ... Next
which is better than just next/prev.
I'll see if I can create acceptable patches.
by , 14 years ago
Attachment: | paginator-patch.diff added |
---|
Patch admin classes to add UncountedPaginator
comment:19 by , 14 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
Needs tests: | set |
Triage Stage: | Accepted → Design decision needed |
OK, here's an attempt at a patch that fixes this.
I've added a new UncountedPaginator to pagination.py, and updated the admin code to check whether the paginator has the appropriate attributes before using them, falling back on alternative implementations if not.
Before I create test & doc updates, would like some feedback on whether this patch is going about things the right way and is something that might be accepted.
comment:20 by , 14 years ago
Type: | Uncategorized → New feature |
---|---|
UI/UX: | unset |
comment:21 by , 13 years ago
This is not just MySQL. PostgreSQL has a separate query for cheap, inexact counts (with statistics collected during the most recent ANALYZE) -- but exact counts are expensive; I have a table where a SELECT COUNT(*)
is typically running upward of 50 seconds.
Perhaps on PostgreSQL we could run a query like select reltuples from pg_class where relname=?;
, and use the traditional pagination system if the result is under 20,000 or so, or a newer one otherwise?
comment:22 by , 13 years ago
Cc: | added |
---|
comment:23 by , 13 years ago
Triage Stage: | Design decision needed → Accepted |
---|
Switching back to 'accepted' - I think the switch to DDN was in error and simply sent this back into the icebox even though it was previously accepted by a core dev and has a draft patch.
comment:24 by , 13 years ago
The issue is not restricted to pagination. Seems like the admin interface calls count(*) at several other places.
comment:25 by , 13 years ago
Cc: | removed |
---|
comment:26 by , 13 years ago
Apparently count(*) is called on the whole table even if you filter your view. For instance if you add "?idin=555,666", you'll have two select count(*): one with the "where" clause and a plain one which runs on the whole table.
comment:27 by , 13 years ago
Can you track down where the unrestricted qs.count() is issued, and if there is a reason for it?
comment:28 by , 13 years ago
The admin displays the number of matching objects and the total number of objects.
comment:29 by , 12 years ago
Cc: | added |
---|
comment:30 by , 12 years ago
Cc: | added |
---|
comment:31 by , 12 years ago
+1 because I ran into this problem today.
Using PostgreSQL and django 1.4 with a table that contains about 15 million rows. executing "select count(*) from mytable" in the psql console takes about 40 seconds, even with an index on the primary key. This leads to timeouts and huge lag.
I'm going to try MySQL to see if this is any better.
comment:32 by , 12 years ago
Status: | reopened → new |
---|
comment:33 by , 12 years ago
Using a table of 130M+ rows and thus obviously running into the same problem. The COUNT(*) query takes literally 10+ minutes to run in InnoDB-land.
comment:34 by , 11 years ago
comment:35 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
Summary: | add a new meta option: don't do count(*) in admin → Add a new meta option: don't do count(*) in admin |
comment:36 by , 11 years ago
I'm working on this feature in my branch here: https://github.com/oinopion/django/compare/nocountpagination
comment:37 by , 11 years ago
Cc: | added |
---|
comment:38 by , 11 years ago
+1 for this feature. I have a few databases with tables containing tens of millions of records, and browsing them in Django is painfully slow, even with caching and extensive indexing. These sections of admin takes *minutes* to load, and DDT is telling me that around 90% of the load time is just from a single "SELECT COUNT(*) ..." to display the total result count, which I don't care about.
Yes, both PostgreSQL and MySQL have tricks for approximate counts, but only for total tables sizes, which doesn't help once you begin filtering.
And can confirm that using InfinitePaginator from django-pagination and specifying a custom paginator=InfinitePaginator in your modeladmin does not work because admin still expects to receive a valid count, which causes InfinitePaginator to throw NotImplementedError exceptions.
comment:39 by , 11 years ago
See here for some interesting workarounds of Django's inefficient result count mechanism: http://craiglabenz.me/2013/06/12/how-i-made-django-admin-scale/
comment:40 by , 10 years ago
Cc: | added |
---|
comment:41 by , 10 years ago
Needs documentation: | unset |
---|---|
Needs tests: | unset |
comment:42 by , 10 years ago
Owner: | changed from | to
---|
comment:44 by , 10 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:45 by , 8 years ago
I don't think this ticket has been properly or fully fixed. The patch that landed only affects the display of filtered results. A count(*) is still executed every time you hit a page, which leads to timeouts when tables grow large.
I think the idea of using a custom paginator is probably a good one, but that could probably be controlled by an option in model admin or something. I'm not reopening this ticket until I've got some feedback first.
comment:46 by , 7 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
As per Josh's comment a count(*)
is still executed by the paginator when using the admin, even with show_full_result_count=False
. This takes multiple seconds when querying tens of millions of rows. You can hack around the count(*) query by implementing a custom Paginator with either an appromixiate or a fixed count.
from django.core.paginator import Paginator class FixedCountPaginator(Paginator): @property def count(self): return 1000 class MyModelAdmin(admin.ModelAdmin): show_full_result_count=False paginator = FixedCountPaginator
I'd like to fix this properly by either paginating without requesting the number of rows / pages or disabling pagination.
class MyModelAdmin(admin.ModelAdmin): show_full_result_count=False no_count = False
I've put up a pull request to demonstrate the changes required to achieve this. It changes the pagination widget in the admin to display [Prev, PAGE, Next]. Changing this to a show_pagination
field that disables pagination would also solve the count(*)
issue and may be more consistent / straight forward.
class MyModelAdmin(admin.ModelAdmin): show_full_result_count=False show_pagination = True
comment:47 by , 7 years ago
Cc: | added |
---|---|
Needs tests: | set |
Patch needs improvement: | set |
I spoke to Matthew at the pycon sprints and agreed that we should open this one back up if we had a PR to demonstrate. The PR needs some work still. I'll take this to the Mailing List.
comment:48 by , 7 years ago
Cc: | added |
---|
comment:49 by , 7 years ago
I've been using the following for a while:
class EstimatedCountPaginator(Paginator): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.object_list.count = self.count @cached_property def count(self): if self.object_list.query.where: return self.object_list.count() db_table = self.object_list.model._meta.db_table cursor = connections[self.object_list.db].cursor() cursor.execute("SELECT reltuples FROM pg_class WHERE relname = %s", (db_table, )) result = cursor.fetchone() if not result: return 0 return int(result[0])
At the very least I believe such a paginator should be available by default in django.contrib.postgres
.
But going beyond that, the problem is that users are not always able to modify the affected admin objects. So there should be some way of setting the *default* paginator for all admin objects, I think.
comment:50 by , 6 years ago
What's wrong with just always calling len(self.object_list)
?
Using this saves me from the additional count of the filtered objects. This seems to work as the queryset is evaluated/fetched and cached on the len(...)
call, after that another fetch is not needed to render the items in the admin list.
class LenCountPaginator(Paginator): @cached_property def count(self): return len(self.object_list)
This is how de default Paginator does it:
@cached_property def count(self): """ Returns the total number of objects, across all pages. """ try: return self.object_list.count() except (AttributeError, TypeError): # AttributeError if object_list has no count() method. # TypeError if object_list.count() requires arguments # (i.e. is of type list). return len(self.object_list)
comment:51 by , 6 years ago
.count()
is called on the unfiltered queryset so calling len()
on it would fetch all objects from the database, not just those on the current page.
comment:52 by , 6 years ago
I indeed spoke too early. This did work for me as I combine it with django-postgres-fuzzycount
as well which takes care of the unfiltered total count:
https://github.com/stephenmcd/django-postgres-fuzzycount/blob/master/fuzzycount.py
class FuzzyCountAdmin(ReplicaModelAdmin): """ Models must have: fuzzy_count_manager = FuzzyCountManager() """ def get_queryset(self, request): if hasattr(self.model, 'fuzzy_count_manager') and request.method == 'GET': qs = self.model.fuzzy_count_manager.get_queryset() else: qs = self.model._default_manager.get_queryset() # taken from Django source # TODO: this should be handled by some parameter to the ChangeList. ordering = self.get_ordering(request) if ordering: qs = qs.order_by(*ordering) return qs
This also works for the filtered requests:
class LenCountPaginator(Paginator): @cached_property def count(self): if self.object_list.query.where or getattr(self.object_list.query, 'having', None): return len(self.object_list) return self.object_list.count()
comment:53 by , 6 years ago
Again spoke to early, sorry for the spam, it somehow only works for one particular admin view I have. Trying to figure out why that is the case.
comment:54 by , 6 years ago
Ok, it only works when the number of total rows in the database is > list_per_page
. When they are less I see multiple calls to len(self.object_list)
. Which I don't mind as for those views performance is a non-issue.
Nonetheless, this is all highly unreliable as it relies on the queryset cache also being used for rendering the actual list of objects.
comment:55 by , 5 years ago
Is it possible to have a separate table that keeps the count of all the rows in other tables? I am new here and don't know how much work.
comment:56 by , 4 years ago
In addition to slow COUNT queries, I've just noticed that the listing page calls len()
on a potentially slow queryset:
On my listing page, I have a couple nested lookups, so this becomes a complex query with four joins, and it can take tens of seconds to run for around 600k records. With too many more records than that, I'm running afoul of AWS Lambda timeouts (30s, just like the Heroku user mentioned above).
All this to say - any solution to this slow COUNT issue should also address (avoid) this slow query as well.
comment:57 by , 4 years ago
Cc: | added |
---|
comment:58 by , 3 years ago
Cc: | added |
---|
comment:59 by , 3 years ago
I couldn't find any good solution so I build myself a custom paginator.
My database is mysql using innodb, I made use of innodb to get a fast and precise enough count of rows. Please be aware, that this might not be precise enough for everyone's usecase.
My paginator shouldn't be understood as a patch, but it's an adequate workaround to speed up your django admin. I had no problem with that, and I can now easily browse my 5 million+ datasets.
from django.core.paginator import Paginator from django.utils.functional import cached_property from django.db import connection class MysqlInnoDbPaginator(Paginator): @cached_property def count(self): with connection.cursor() as cursor: cursor.execute( "SELECT TABLE_ROWS FROM information_schema.tables WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s;", [connection.settings_dict['NAME'], self.object_list.model._meta.db_table] ) row = cursor.fetchone() if row is not None: return row[0] else: return 0
Add this to your admin.ModelAdmin implementation. The show_full_result_count is very important, otherwise another count would happen.
class FoobarAdmin(admin.ModelAdmin): paginator = MysqlInnoDbPaginator show_full_result_count = False
Also be aware, if you have multiple database configurations, you should modify it so it would find the right database name
comment:60 by , 5 weeks ago
Cc: | added |
---|
https://github.com/django/django/commit/17557d068c43bd61cdc6c18caf250ffa469414a1 says it's fixed this issue. Is there anything missing to close it?
comment:61 by , 5 weeks ago
From my understanding a COUNT
still happens for pagination purposes even if the show_full_result_count option is disabled.
I can't see that this is likely to happen. Firstly, it would be an admin option, not a meta option. Secondly, even with tens of thousands of rows in a test table here, the count takes far less than a second, so compared to the time for collecting the display data, rendering it and returning the page, it isn't that large a time slice.
How many rows are in this table you are seeing the problem with?
Right now I'm -1 on this as being such an edge case as to be not worth an option. If counting the rows in a table is too slow to be used, then there are going to be plenty of other options querying that table as well.