Opened 15 years ago

Last modified 13 months ago

#8408 new New feature

Add a new meta option: don't do count(*) in admin

Reported by: LI Daobing Owned by: Thomas C
Component: contrib.admin Version: dev
Severity: Normal Keywords:
Cc: Thomas Kerpe, Ludovico Magnocavallo, marcoberi@…, djfische@…, boxm@…, kmike84@…, slav0nic0@…, dan.fairs@…, markus.magnuson@…, t.chaumeny@…, josh.smeaton@…, Ionel Cristian Mărieș, Ondřej Chmelař, elonzh Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Hello,

everytime when I use Admin to view a table, the Admin count the rows in the table. if the database is mysql+innodb, this step is very slow, how about add an option in Meta to prevent this, such as

class A(models.Model):
    class Meta:
        do_not_count = True

thanks

Attachments (1)

paginator-patch.diff (13.2 KB) - added by Malcolm Box 12 years ago.
Patch admin classes to add UncountedPaginator

Download all attachments as: .zip

Change History (60)

comment:1 Changed 15 years ago by Malcolm Tredinnick

milestone: post-1.0
Triage Stage: UnreviewedDesign decision needed

I can't see that this is likely to happen. Firstly, it would be an admin option, not a meta option. Secondly, even with tens of thousands of rows in a test table here, the count takes far less than a second, so compared to the time for collecting the display data, rendering it and returning the page, it isn't that large a time slice.

How many rows are in this table you are seeing the problem with?

Right now I'm -1 on this as being such an edge case as to be not worth an option. If counting the rows in a table is too slow to be used, then there are going to be plenty of other options querying that table as well.

comment:2 Changed 15 years ago by LI Daobing

we have tens of millions rows in this table. and we want to use django Admin as a customizable viewer.

comment:3 Changed 15 years ago by Adrian Holovaty

Triage Stage: Design decision neededAccepted

I like this optional optimization for people who don't want the admin to do a "select count(*)" on every changelist view, but I'm not sure whether we could implement it without throwing out some valuable parts of the changelist (mostly, the pagination). So, I'm marking this as "accepted," with the caveat that we need to look at what parts of the changelist will be thrown out.

comment:4 Changed 15 years ago by soroosh

This can be done without losing any functionality by setting a limit for count like this:
SELECT count(*) from (select id from HUGE_TABLE LIMIT 10000) as b;
Then if the raws count is more than for example 10000, SQL server stops sequential scanning through the table and paginatior can list pages around current page but not the last page. On my system postgreql takes 60 seconds to return count of a table with 11,000,000 raws and this is suboptimal to wait a minute for each admin page to load.

comment:5 Changed 14 years ago by (none)

milestone: post-1.0

Milestone post-1.0 deleted

comment:6 Changed 14 years ago by mrts

milestone: 1.2

Having problems with large datasets in admin myself (not sure if count() is the only reason though), proposing this for 1.2.

comment:7 Changed 14 years ago by Gonzalo Saavedra

Cc: Gonzalo Saavedra added

comment:8 Changed 14 years ago by Thomas Kerpe

Cc: Thomas Kerpe added

comment:9 Changed 14 years ago by Ludovico Magnocavallo

Cc: Ludovico Magnocavallo added

comment:10 Changed 13 years ago by stephanhoyer

The count(*) is essential for displaying the paginator. Without knowing the amount of entities of a table, there can be any well working paginator.
One solution could be to display only "prev/next"-buttons and input field for page.

But i think the effort of fixing this is much higher then the benefit

So -1 for this request.

comment:11 Changed 13 years ago by anonymous

aside from the count problem, LIMIT item_count OFFSET 25000 (maybe slower numbers too already) is damn slow in MySQL ;)

comment:12 Changed 13 years ago by James Bennett

milestone: 1.2

1.2 is feature-frozen, moving this feature request off the milestone.

comment:13 Changed 13 years ago by marcob

See #13643 for an ugly-monkey-patching workaround.

comment:14 Changed 13 years ago by marcob

Cc: marcoberi@… added

comment:15 Changed 12 years ago by David Fischer

Cc: djfische@… added

comment:16 Changed 12 years ago by Mikhail Korobov

I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator

comment:17 in reply to:  16 ; Changed 12 years ago by Łukasz Rekucki

Resolution: wontfix
Status: newclosed

Replying to kmike:

I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator

Agreed :) Here's a link to a sample Paginator that doesn't use count

comment:18 in reply to:  17 Changed 12 years ago by Malcolm Box

Cc: boxm@… added
Easy pickings: unset
Resolution: wontfix
Severity: Normal
Status: closedreopened
Type: Uncategorized
Version: 0.96SVN

Replying to lrekucki:

Replying to kmike:

I think this can be marked as 'wontfix' because custom paginators can be used in admin now:
http://docs.djangoproject.com/en/dev/ref/contrib/admin/#django.contrib.admin.ModelAdmin.paginator

Agreed :) Here's a link to a sample Paginator that doesn't use count

Reopening, as the resolution simply doesn't work. The example paginator given by kmike simply doesn't work as a ModelAdmin paginator, because the {% pagination %} tag and the ChangeList class (among others) assume you can count the number of items being paginated. See my comment on #4065 for the gory details.

There are scenarios where you don't know how many results are being paginated, but can still do useful pagination links:

  • With InnoDB tables in MySQL, count(*) is frighteningly expensive, so while in principle you can count, in practice you can't
  • With NoSQL databases it can be impossible to do a count, but you can still paginate by range queries

A proposed design for such a paginator would be to display (on page 12):

1 2 3 ..... 10 11 12 ... Next

which is better than just next/prev.

I'll see if I can create acceptable patches.

Changed 12 years ago by Malcolm Box

Attachment: paginator-patch.diff added

Patch admin classes to add UncountedPaginator

comment:19 Changed 12 years ago by Malcolm Box

Has patch: set
Needs documentation: set
Needs tests: set
Triage Stage: AcceptedDesign decision needed

OK, here's an attempt at a patch that fixes this.

I've added a new UncountedPaginator to pagination.py, and updated the admin code to check whether the paginator has the appropriate attributes before using them, falling back on alternative implementations if not.

Before I create test & doc updates, would like some feedback on whether this patch is going about things the right way and is something that might be accepted.

comment:20 Changed 12 years ago by Julien Phalip

Type: UncategorizedNew feature
UI/UX: unset

comment:21 Changed 12 years ago by charles@…

This is not just MySQL. PostgreSQL has a separate query for cheap, inexact counts (with statistics collected during the most recent ANALYZE) -- but exact counts are expensive; I have a table where a SELECT COUNT(*) is typically running upward of 50 seconds.

Perhaps on PostgreSQL we could run a query like select reltuples from pg_class where relname=?;, and use the traditional pagination system if the result is under 20,000 or so, or a newer one otherwise?

See also http://wiki.postgresql.org/wiki/Slow_Counting

comment:22 Changed 12 years ago by Mikhail Korobov

Cc: kmike84@… added

comment:23 Changed 11 years ago by Adam Nelson

Triage Stage: Design decision neededAccepted

Switching back to 'accepted' - I think the switch to DDN was in error and simply sent this back into the icebox even though it was previously accepted by a core dev and has a draft patch.

comment:24 Changed 11 years ago by anonymous

The issue is not restricted to pagination. Seems like the admin interface calls count(*) at several other places.

comment:25 Changed 11 years ago by Gonzalo Saavedra

Cc: Gonzalo Saavedra removed

comment:26 Changed 11 years ago by nova77

Apparently count(*) is called on the whole table even if you filter your view. For instance if you add "?idin=555,666", you'll have two select count(*): one with the "where" clause and a plain one which runs on the whole table.

comment:27 Changed 11 years ago by Anssi Kääriäinen

Can you track down where the unrestricted qs.count() is issued, and if there is a reason for it?

comment:28 Changed 11 years ago by Aymeric Augustin

The admin displays the number of matching objects and the total number of objects.

comment:29 Changed 11 years ago by Sergey Maranchuk

Cc: slav0nic0@… added

comment:30 Changed 10 years ago by Dan Fairs

Cc: dan.fairs@… added

comment:31 Changed 10 years ago by bb@…

+1 because I ran into this problem today.

Using PostgreSQL and django 1.4 with a table that contains about 15 million rows. executing "select count(*) from mytable" in the psql console takes about 40 seconds, even with an index on the primary key. This leads to timeouts and huge lag.

I'm going to try MySQL to see if this is any better.

comment:32 Changed 10 years ago by Aymeric Augustin

Status: reopenednew

comment:33 Changed 10 years ago by craig.labenz@…

Using a table of 130M+ rows and thus obviously running into the same problem. The COUNT(*) query takes literally 10+ minutes to run in InnoDB-land.

comment:34 Changed 10 years ago by jonathan_livni

When django is hosted on Heroku, there's a time limit of 30 seconds for any web request. So instead of admin pages taking a very long time to load, you get admin pages which never load.

http://s2.postimg.org/onfy59xuh/Capture.jpg

comment:35 Changed 9 years ago by Tomek Paczkowski

Owner: changed from nobody to Tomek Paczkowski
Status: newassigned
Summary: add a new meta option: don't do count(*) in adminAdd a new meta option: don't do count(*) in admin

comment:36 Changed 9 years ago by Tomek Paczkowski

I'm working on this feature in my branch here: https://github.com/oinopion/django/compare/nocountpagination

comment:37 Changed 9 years ago by Markus Amalthea Magnuson

Cc: markus.magnuson@… added

comment:38 Changed 9 years ago by Chris Spencer

+1 for this feature. I have a few databases with tables containing tens of millions of records, and browsing them in Django is painfully slow, even with caching and extensive indexing. These sections of admin takes *minutes* to load, and DDT is telling me that around 90% of the load time is just from a single "SELECT COUNT(*) ..." to display the total result count, which I don't care about.

Yes, both PostgreSQL and MySQL have tricks for approximate counts, but only for total tables sizes, which doesn't help once you begin filtering.

And can confirm that using InfinitePaginator from django-pagination and specifying a custom paginator=InfinitePaginator in your modeladmin does not work because admin still expects to receive a valid count, which causes InfinitePaginator to throw NotImplementedError exceptions.

comment:39 Changed 9 years ago by Chris Spencer

See here for some interesting workarounds of Django's inefficient result count mechanism: http://craiglabenz.me/2013/06/12/how-i-made-django-admin-scale/

comment:40 Changed 9 years ago by Thomas C

Cc: t.chaumeny@… added

comment:41 Changed 9 years ago by Thomas C

Needs documentation: unset
Needs tests: unset

comment:42 Changed 9 years ago by Thomas C

Owner: changed from Tomek Paczkowski to Thomas C

comment:43 Changed 9 years ago by Aymeric Augustin

comment:44 Changed 9 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 17557d068c43bd61cdc6c18caf250ffa469414a1:

Fixed #8408 -- Added ModelAdmin.show_full_result_count to avoid COUNT() query.

Thanks lidaobing for the suggestion.

comment:45 Changed 6 years ago by Josh Smeaton

I don't think this ticket has been properly or fully fixed. The patch that landed only affects the display of filtered results. A count(*) is still executed every time you hit a page, which leads to timeouts when tables grow large.

I think the idea of using a custom paginator is probably a good one, but that could probably be controlled by an option in model admin or something. I'm not reopening this ticket until I've got some feedback first.

comment:46 Changed 6 years ago by Matthew Betts

Resolution: fixed
Status: closednew

As per Josh's comment a count(*) is still executed by the paginator when using the admin, even with show_full_result_count=False. This takes multiple seconds when querying tens of millions of rows. You can hack around the count(*) query by implementing a custom Paginator with either an appromixiate or a fixed count.

from django.core.paginator import Paginator
class FixedCountPaginator(Paginator):

    @property
    def count(self):
        return 1000

class MyModelAdmin(admin.ModelAdmin):
    show_full_result_count=False
    paginator = FixedCountPaginator

I'd like to fix this properly by either paginating without requesting the number of rows / pages or disabling pagination.

class MyModelAdmin(admin.ModelAdmin):
    show_full_result_count=False
    no_count = False

I've put up a pull request to demonstrate the changes required to achieve this. It changes the pagination widget in the admin to display [Prev, PAGE, Next]. Changing this to a show_pagination field that disables pagination would also solve the count(*) issue and may be more consistent / straight forward.

class MyModelAdmin(admin.ModelAdmin):
    show_full_result_count=False
    show_pagination = True

comment:47 Changed 6 years ago by Josh Smeaton

Cc: josh.smeaton@… added
Needs tests: set
Patch needs improvement: set

I spoke to Matthew at the pycon sprints and agreed that we should open this one back up if we had a PR to demonstrate. The PR needs some work still. I'll take this to the Mailing List.

comment:48 Changed 6 years ago by Ionel Cristian Mărieș

Cc: Ionel Cristian Mărieș added

comment:49 Changed 5 years ago by Jerome Leclanche

I've been using the following for a while:

class EstimatedCountPaginator(Paginator):
        def __init__(self, *args, **kwargs):
                super().__init__(*args, **kwargs)
                self.object_list.count = self.count

        @cached_property
        def count(self):
                if self.object_list.query.where:
                        return self.object_list.count()

                db_table = self.object_list.model._meta.db_table
                cursor = connections[self.object_list.db].cursor()
                cursor.execute("SELECT reltuples FROM pg_class WHERE relname = %s", (db_table, ))
                result = cursor.fetchone()
                if not result:
                        return 0
                return int(result[0])

At the very least I believe such a paginator should be available by default in django.contrib.postgres.

But going beyond that, the problem is that users are not always able to modify the affected admin objects. So there should be some way of setting the *default* paginator for all admin objects, I think.

comment:50 Changed 5 years ago by michi88

What's wrong with just always calling len(self.object_list)?
Using this saves me from the additional count of the filtered objects. This seems to work as the queryset is evaluated/fetched and cached on the len(...) call, after that another fetch is not needed to render the items in the admin list.

class LenCountPaginator(Paginator):

    @cached_property
    def count(self):
        return len(self.object_list)

This is how de default Paginator does it:

@cached_property
def count(self):
    """
    Returns the total number of objects, across all pages.
    """
    try:
        return self.object_list.count()
    except (AttributeError, TypeError):
        # AttributeError if object_list has no count() method.
        # TypeError if object_list.count() requires arguments
        # (i.e. is of type list).
        return len(self.object_list)

comment:51 Changed 5 years ago by Tim Graham

.count() is called on the unfiltered queryset so calling len() on it would fetch all objects from the database, not just those on the current page.

comment:52 Changed 5 years ago by michi88

I indeed spoke too early. This did work for me as I combine it with django-postgres-fuzzycount as well which takes care of the unfiltered total count:

https://github.com/stephenmcd/django-postgres-fuzzycount/blob/master/fuzzycount.py

class FuzzyCountAdmin(ReplicaModelAdmin):
    """ 
     Models must have:

     fuzzy_count_manager = FuzzyCountManager()
    """

    def get_queryset(self, request):
        if hasattr(self.model, 'fuzzy_count_manager') and request.method == 'GET':
            qs = self.model.fuzzy_count_manager.get_queryset()
        else:
            qs = self.model._default_manager.get_queryset()
        # taken from Django source
        # TODO: this should be handled by some parameter to the ChangeList.
        ordering = self.get_ordering(request)
        if ordering:
            qs = qs.order_by(*ordering)
        return qs

This also works for the filtered requests:

class LenCountPaginator(Paginator):

    @cached_property
    def count(self):
        if self.object_list.query.where or getattr(self.object_list.query, 'having', None):
            return len(self.object_list)
        return self.object_list.count()

comment:53 Changed 5 years ago by michi88

Again spoke to early, sorry for the spam, it somehow only works for one particular admin view I have. Trying to figure out why that is the case.

comment:54 Changed 5 years ago by michi88

Ok, it only works when the number of total rows in the database is > list_per_page. When they are less I see multiple calls to len(self.object_list). Which I don't mind as for those views performance is a non-issue.

Nonetheless, this is all highly unreliable as it relies on the queryset cache also being used for rendering the actual list of objects.

comment:55 Changed 3 years ago by TapanGujjar

Is it possible to have a separate table that keeps the count of all the rows in other tables? I am new here and don't know how much work.

comment:56 Changed 3 years ago by Brett Higgins

In addition to slow COUNT queries, I've just noticed that the listing page calls len() on a potentially slow queryset:

https://github.com/django/django/blob/9a3454f6046b9b7591fd03e21cf6da0b23c57689/django/contrib/admin/options.py#L1816

On my listing page, I have a couple nested lookups, so this becomes a complex query with four joins, and it can take tens of seconds to run for around 600k records. With too many more records than that, I'm running afoul of AWS Lambda timeouts (30s, just like the Heroku user mentioned above).

All this to say - any solution to this slow COUNT issue should also address (avoid) this slow query as well.

comment:57 Changed 2 years ago by Ondřej Chmelař

Cc: Ondřej Chmelař added

comment:58 Changed 19 months ago by elonzh

Cc: elonzh added

comment:59 in reply to:  description Changed 13 months ago by Misha

I couldn't find any good solution so I build myself a custom paginator.

My database is mysql using innodb, I made use of innodb to get a fast and precise enough count of rows. Please be aware, that this might not be precise enough for everyone's usecase.

My paginator shouldn't be understood as a patch, but it's an adequate workaround to speed up your django admin. I had no problem with that, and I can now easily browse my 5 million+ datasets.

from django.core.paginator import Paginator

from django.utils.functional import cached_property
from django.db import connection


class MysqlInnoDbPaginator(Paginator):
    @cached_property
    def count(self):
        with connection.cursor() as cursor:
            cursor.execute(
                "SELECT TABLE_ROWS FROM information_schema.tables WHERE TABLE_SCHEMA = %s AND TABLE_NAME = %s;",
                [connection.settings_dict['NAME'], self.object_list.model._meta.db_table]
            )

            row = cursor.fetchone()

            if row is not None:
                return row[0]
            else:
                return 0

Add this to your admin.ModelAdmin implementation. The show_full_result_count is very important, otherwise another count would happen.

class FoobarAdmin(admin.ModelAdmin):
    paginator = MysqlInnoDbPaginator
    show_full_result_count = False

Also be aware, if you have multiple database configurations, you should modify it so it would find the right database name

Note: See TracTickets for help on using tickets.
Back to Top