Opened 16 years ago
Closed 12 years ago
#7760 closed Cleanup/optimization (wontfix)
Queries on large primary tables with limit/offset clauses are slow
Reported by: | henrybaxter | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | database, admin, slow, query |
Cc: | Triage Stage: | Someday/Maybe | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When django's db layer needs to select_related() on a large initial table with a number of joins and it has a limit or offset clause, it generates slow queries.
SELECT * FROM big_table
INNER JOIN t1 ON big_table.t1_id
...
INNER JOIN tn ON big_table.tn_id
LIMIT 10
OFFSET 20
This query will join the whole table (even if it has a 1 million records) and THEN do the limit. This is WAY faster:
SELECT * FROM (SELECT * FROM big_table LIMIT 10 OFFSET 20)
INNER JOINs...
This type of query change does not work when there is a WHERE clause that references a product of the JOIN though. If the WHERE clause only references the primary table, then it still works great.
I have submitted a patch to help demonstrate the problem - it only works when there are no WHERE clauses at all (which is almost pointless), but it works for my particular purpose (pagination in the admin interface with no filters or searching)
I would love to hear from another developer how to integrate this functionality (or if it can actually be done without changes to the django codebase)
Attachments (1)
Change History (9)
by , 16 years ago
Attachment: | subquery_patch added |
---|
comment:1 by , 16 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
At the moment, I suspect this is just not worth it, since there are so many cases where it's not going to be a useful transform and it's a whole separate code path to have to maintain. If it's a real performance issue for a particular use-case, then the correct solution is "don't do that". Doing limits and offsets is already a performance hit. Doing a slightly better related filter and/or using values()
is going to be the solution in most cases.
Leaving open for a bit because I don't want to make a rush judgement, but I'm uninspired about the proposal at the moment.
comment:2 by , 16 years ago
Thank you for looking at it, I really appreciate it. I actually think I agree though, it's just not that useful. It requires careful manipulation for ORDER BY as well as WHERE as I mentioned.
Sorry for the ill-conceived idea :)
By the way the 'use case' was the Django admin. I still think somehow Django admin should handle this situation better:
class Thing(models.Model):
me = models.CharField(max_length=100)
foreign = models.ForeignKey(Foreign)
foreign2 = models.ForeignKey(Foreign2)
class Admin:
list_display = ('me', 'foreign', 'foreign2')
where 'Thing' has > 10000 entries, foreign tables need not be too large and you are looking at the admin changelist for it
On a decent machine with no load the resulting query will take several seconds to process
In face I have no idea why MySQL does not optimize the query, since this is really the database's territory...
Thanks again for your response!
comment:3 by , 16 years ago
Component: | Uncategorized → Database wrapper |
---|
comment:4 by , 14 years ago
Triage Stage: | Design decision needed → Someday/Maybe |
---|
My opinion hasn't changed much over the intervening years on this one, although if it was magically easily possible, I would take the patch. Let's call it someday/maybe to give people something to look it. Should somebody come along with a great patch to do this that didn't make the normal SQL construction path slower, there would be no reason to keep it out.
comment:5 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Cleanup/optimization |
comment:8 by , 12 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
I think this can be closed. First, to me it seems a good DB would be able to optimize such queries, second, I am afraid supporting this will be hard. I wonder if fixing this ticket would cause performance problems for other queries.
If we want to support this kind of query, then I think a separate API for doing subqueries would be the way to go.
Subquery LIMIT and OFFSET