Opened 16 years ago

Closed 12 years ago

#7760 closed Cleanup/optimization (wontfix)

Queries on large primary tables with limit/offset clauses are slow

Reported by: henrybaxter Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: database, admin, slow, query
Cc: Triage Stage: Someday/Maybe
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When django's db layer needs to select_related() on a large initial table with a number of joins and it has a limit or offset clause, it generates slow queries.

SELECT * FROM big_table
INNER JOIN t1 ON big_table.t1_id
...
INNER JOIN tn ON big_table.tn_id
LIMIT 10
OFFSET 20

This query will join the whole table (even if it has a 1 million records) and THEN do the limit. This is WAY faster:

SELECT * FROM (SELECT * FROM big_table LIMIT 10 OFFSET 20)
INNER JOINs...

This type of query change does not work when there is a WHERE clause that references a product of the JOIN though. If the WHERE clause only references the primary table, then it still works great.

I have submitted a patch to help demonstrate the problem - it only works when there are no WHERE clauses at all (which is almost pointless), but it works for my particular purpose (pagination in the admin interface with no filters or searching)

I would love to hear from another developer how to integrate this functionality (or if it can actually be done without changes to the django codebase)

Attachments (1)

subquery_patch (2.9 KB ) - added by henrybaxter 16 years ago.
Subquery LIMIT and OFFSET

Download all attachments as: .zip

Change History (9)

by henrybaxter, 16 years ago

Attachment: subquery_patch added

Subquery LIMIT and OFFSET

comment:1 by Malcolm Tredinnick, 16 years ago

Triage Stage: UnreviewedDesign decision needed

At the moment, I suspect this is just not worth it, since there are so many cases where it's not going to be a useful transform and it's a whole separate code path to have to maintain. If it's a real performance issue for a particular use-case, then the correct solution is "don't do that". Doing limits and offsets is already a performance hit. Doing a slightly better related filter and/or using values() is going to be the solution in most cases.

Leaving open for a bit because I don't want to make a rush judgement, but I'm uninspired about the proposal at the moment.

comment:2 by henrybaxter, 16 years ago

Thank you for looking at it, I really appreciate it. I actually think I agree though, it's just not that useful. It requires careful manipulation for ORDER BY as well as WHERE as I mentioned.

Sorry for the ill-conceived idea :)

By the way the 'use case' was the Django admin. I still think somehow Django admin should handle this situation better:

class Thing(models.Model):

me = models.CharField(max_length=100)
foreign = models.ForeignKey(Foreign)
foreign2 = models.ForeignKey(Foreign2)

class Admin:

list_display = ('me', 'foreign', 'foreign2')

where 'Thing' has > 10000 entries, foreign tables need not be too large and you are looking at the admin changelist for it

On a decent machine with no load the resulting query will take several seconds to process

In face I have no idea why MySQL does not optimize the query, since this is really the database's territory...

Thanks again for your response!

comment:3 by Piotr Lewandowski <django@…>, 16 years ago

Component: UncategorizedDatabase wrapper

comment:4 by Malcolm Tredinnick, 14 years ago

Triage Stage: Design decision neededSomeday/Maybe

My opinion hasn't changed much over the intervening years on this one, although if it was magically easily possible, I would take the patch. Let's call it someday/maybe to give people something to look it. Should somebody come along with a great patch to do this that didn't make the normal SQL construction path slower, there would be no reason to keep it out.

comment:5 by Luke Plant, 14 years ago

Severity: Normal
Type: Cleanup/optimization

comment:6 by Aymeric Augustin, 13 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:7 by Aymeric Augustin, 13 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:8 by Anssi Kääriäinen, 12 years ago

Resolution: wontfix
Status: newclosed

I think this can be closed. First, to me it seems a good DB would be able to optimize such queries, second, I am afraid supporting this will be hard. I wonder if fixing this ticket would cause performance problems for other queries.

If we want to support this kind of query, then I think a separate API for doing subqueries would be the way to go.

Note: See TracTickets for help on using tickets.
Back to Top