Opened 15 years ago

Closed 15 years ago

Last modified 13 years ago

#11287 closed Uncategorized (invalid)

Better performance of the Paginator list

Reported by: QingFeng Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Normal Keywords: Paginator
Cc: qingfeng@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

old:

objects = Model1.objects.all()#-->Read all the data, poor performance
p = Paginator(objects, 2)

NBPageList:

objs = NBPageList(Model1.objects)
p = Paginator(objs, 5)

Attachments (3)

10953.patch (636 bytes ) - added by QingFeng 15 years ago.
Paginator patch
tests.py (872 bytes ) - added by QingFeng 15 years ago.
TestCase
models.py (115 bytes ) - added by QingFeng 15 years ago.
models

Download all attachments as: .zip

Change History (9)

by QingFeng, 15 years ago

Attachment: 10953.patch added

Paginator patch

by QingFeng, 15 years ago

Attachment: tests.py added

TestCase

by QingFeng, 15 years ago

Attachment: models.py added

models

comment:1 by Alex Gaynor, 15 years ago

Resolution: invalid
Status: newclosed

I have no idea what this ticket is describing, but a) this won't be faster in any way, and b) Model.objects.all() does not pull in all records.

comment:2 by Karen Tracey, 15 years ago

(Alex was faster than me. But I'll post this anyway in case it helps guide future submissions.)

The comment here:

objects = Model1.objects.all()#-->Read all the data, poor performance

is incorrect. Query sets are lazy, see:

http://docs.djangoproject.com/en/dev/topics/db/queries/#id3

and:

http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated

Thus whatever is being proposed seems to be founded on an incorrect assumption.

For future reference, when proposing something it would be helpful to have more words describing the problem and solution. NBPageList means nothing to me. What is the NB supposed to signify? Also, patches should be complete in a single file, not posted as multiple individual diffs. And if you are proposing a new public interface/class (which seems to be where this was going), it needs to have some documentation.

in reply to:  2 comment:3 by QingFeng, 15 years ago

Replying to kmtracey:

is incorrect. Query sets are lazy, see:

http://docs.djangoproject.com/en/dev/topics/db/queries/#id3

and:

http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated

Thus whatever is being proposed seems to be founded on an incorrect assumption.

Thank you, I made a mistake:)

comment:4 by paulo@…, 13 years ago

Component: Core (Other)Database layer (models, ORM)
Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

But there is a real performance problem with the standard paginator.

For MVCC databases like postgres, performance is very poor for tables above a couple million lines when doing a count(*), because postgres doesn't stores enough information on indexes and will perform a seqscan. The default Paginator seems to call count(*), this can bring a standard server install to its knees.

My workaround os overriding _get_count to use something like:

SELECT reltuples FROM pg_class WHERE relname = 'myapp_mymodel';

May be the ORM should implement a "estimated_count" method for when you don't need very accurate numbers. I would volunteer to send a patch but I don't know how to perform a fast "estimated count" for all database engines supported.

comment:5 by Anssi Kääriäinen, 13 years ago

The downside of estimated tuples is that it can return wildly wrong results (the PostgreSQL one mentioned above at least).

The preferred way for me would be to have a couple of different new options for the paginator (pony requests if you wish):

max_pages:

  • This would be useful in limiting the maximum page to say 20, and if you have 25 objs per page, you would do "SELECT count(*) FROM (select 1 from real_query limit 500) tmp" instead of "SELECT count(*) FROM real_query". This should be more than fast enough assuming you have proper indexes in place.

Links-only:

  • This would give you only "next page" and "previous page" links. The trick is you can use a "SELECT objs FROM somepage WHERE id > last_page_id ORDER BY id LIMIT 25" to fetch the next page. This is lightning-fast if you have the proper indexes. This requires the ordering is done on an unique condition.

Using the estimated_count() method would still leave you open to people just going to page 10000, hit F5 a couple of times, and watch the DB server burn.

This ticket is closed and should remain so. The above ideas might be worth investigation. But not in this ticket.

comment:6 by trbs, 13 years ago

Just too add a little note. This estimation of total query count by using the tables statistics (reltuples in postgresql) won't work even if it would give back an a reasonable count all the time.

Since this will return the estimated amount of rows in a table not in query. It's very usual for a query backing the pager to have several WHERE clauses. eg: Blog.objects.filter(pub_date__lte=now, status="approved", site=site)

I really like the max pages idea, if page 20 still yields 25 results (or retrieve 26 so we know for sure there more then 25) we can just add another page number and go up one by one from there.

Note: See TracTickets for help on using tickets.
Back to Top