Opened 6 years ago

Closed 6 years ago

Last modified 3 years ago

#11287 closed Uncategorized (invalid)

Better performance of the Paginator list

Reported by: qingfeng Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Normal Keywords: Paginator
Cc: qingfeng@… Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

old:

objects = Model1.objects.all()#-->Read all the data, poor performance
p = Paginator(objects, 2)

NBPageList:

objs = NBPageList(Model1.objects)
p = Paginator(objs, 5)

Attachments (3)

10953.patch (636 bytes) - added by qingfeng 6 years ago.
Paginator patch
tests.py (872 bytes) - added by qingfeng 6 years ago.
TestCase
models.py (115 bytes) - added by qingfeng 6 years ago.
models

Download all attachments as: .zip

Change History (9)

Changed 6 years ago by qingfeng

Paginator patch

Changed 6 years ago by qingfeng

TestCase

Changed 6 years ago by qingfeng

models

comment:1 Changed 6 years ago by Alex

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

I have no idea what this ticket is describing, but a) this won't be faster in any way, and b) Model.objects.all() does not pull in all records.

comment:2 follow-up: Changed 6 years ago by kmtracey

(Alex was faster than me. But I'll post this anyway in case it helps guide future submissions.)

The comment here:

objects = Model1.objects.all()#-->Read all the data, poor performance

is incorrect. Query sets are lazy, see:

http://docs.djangoproject.com/en/dev/topics/db/queries/#id3

and:

http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated

Thus whatever is being proposed seems to be founded on an incorrect assumption.

For future reference, when proposing something it would be helpful to have more words describing the problem and solution. NBPageList means nothing to me. What is the NB supposed to signify? Also, patches should be complete in a single file, not posted as multiple individual diffs. And if you are proposing a new public interface/class (which seems to be where this was going), it needs to have some documentation.

comment:3 in reply to: ↑ 2 Changed 6 years ago by qingfeng

Replying to kmtracey:

is incorrect. Query sets are lazy, see:

http://docs.djangoproject.com/en/dev/topics/db/queries/#id3

and:

http://docs.djangoproject.com/en/dev/ref/models/querysets/#when-querysets-are-evaluated

Thus whatever is being proposed seems to be founded on an incorrect assumption.

Thank you, I made a mistake:)

comment:4 Changed 3 years ago by paulo@…

  • Component changed from Core (Other) to Database layer (models, ORM)
  • Easy pickings unset
  • Severity set to Normal
  • Type set to Uncategorized
  • UI/UX unset

But there is a real performance problem with the standard paginator.

For MVCC databases like postgres, performance is very poor for tables above a couple million lines when doing a count(*), because postgres doesn't stores enough information on indexes and will perform a seqscan. The default Paginator seems to call count(*), this can bring a standard server install to its knees.

My workaround os overriding _get_count to use something like:

SELECT reltuples FROM pg_class WHERE relname = 'myapp_mymodel';

May be the ORM should implement a "estimated_count" method for when you don't need very accurate numbers. I would volunteer to send a patch but I don't know how to perform a fast "estimated count" for all database engines supported.

comment:5 Changed 3 years ago by akaariai

The downside of estimated tuples is that it can return wildly wrong results (the PostgreSQL one mentioned above at least).

The preferred way for me would be to have a couple of different new options for the paginator (pony requests if you wish):

max_pages:

  • This would be useful in limiting the maximum page to say 20, and if you have 25 objs per page, you would do "SELECT count(*) FROM (select 1 from real_query limit 500) tmp" instead of "SELECT count(*) FROM real_query". This should be more than fast enough assuming you have proper indexes in place.

Links-only:

  • This would give you only "next page" and "previous page" links. The trick is you can use a "SELECT objs FROM somepage WHERE id > last_page_id ORDER BY id LIMIT 25" to fetch the next page. This is lightning-fast if you have the proper indexes. This requires the ordering is done on an unique condition.

Using the estimated_count() method would still leave you open to people just going to page 10000, hit F5 a couple of times, and watch the DB server burn.

This ticket is closed and should remain so. The above ideas might be worth investigation. But not in this ticket.

comment:6 Changed 3 years ago by trbs

Just too add a little note. This estimation of total query count by using the tables statistics (reltuples in postgresql) won't work even if it would give back an a reasonable count all the time.

Since this will return the estimated amount of rows in a table not in query. It's very usual for a query backing the pager to have several WHERE clauses. eg: Blog.objects.filter(pub_date__lte=now, status="approved", site=site)

I really like the max pages idea, if page 20 still yields 25 results (or retrieve 26 so we know for sure there more then 25) we can just add another page number and go up one by one from there.

Note: See TracTickets for help on using tickets.
Back to Top