Opened 7 years ago

Closed 3 years ago

#7760 closed Cleanup/optimization (wontfix)

Queries on large primary tables with limit/offset clauses are slow

Reported by: henrybaxter Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: database, admin, slow, query
Cc: Triage Stage: Someday/Maybe
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When django's db layer needs to select_related() on a large initial table with a number of joins and it has a limit or offset clause, it generates slow queries.

SELECT * FROM big_table
INNER JOIN t1 ON big_table.t1_id
...
INNER JOIN tn ON big_table.tn_id
LIMIT 10
OFFSET 20

This query will join the whole table (even if it has a 1 million records) and THEN do the limit. This is WAY faster:

SELECT * FROM (SELECT * FROM big_table LIMIT 10 OFFSET 20)
INNER JOINs...

This type of query change does not work when there is a WHERE clause that references a product of the JOIN though. If the WHERE clause only references the primary table, then it still works great.

I have submitted a patch to help demonstrate the problem - it only works when there are no WHERE clauses at all (which is almost pointless), but it works for my particular purpose (pagination in the admin interface with no filters or searching)

I would love to hear from another developer how to integrate this functionality (or if it can actually be done without changes to the django codebase)

Attachments (1)

subquery_patch (2.9 KB) - added by henrybaxter 7 years ago.
Subquery LIMIT and OFFSET

Download all attachments as: .zip

Change History (9)

Changed 7 years ago by henrybaxter

Subquery LIMIT and OFFSET

comment:1 Changed 7 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

At the moment, I suspect this is just not worth it, since there are so many cases where it's not going to be a useful transform and it's a whole separate code path to have to maintain. If it's a real performance issue for a particular use-case, then the correct solution is "don't do that". Doing limits and offsets is already a performance hit. Doing a slightly better related filter and/or using values() is going to be the solution in most cases.

Leaving open for a bit because I don't want to make a rush judgement, but I'm uninspired about the proposal at the moment.

comment:2 Changed 7 years ago by henrybaxter

Thank you for looking at it, I really appreciate it. I actually think I agree though, it's just not that useful. It requires careful manipulation for ORDER BY as well as WHERE as I mentioned.

Sorry for the ill-conceived idea :)

By the way the 'use case' was the Django admin. I still think somehow Django admin should handle this situation better:

class Thing(models.Model):

me = models.CharField(max_length=100)
foreign = models.ForeignKey(Foreign)
foreign2 = models.ForeignKey(Foreign2)

class Admin:

list_display = ('me', 'foreign', 'foreign2')

where 'Thing' has > 10000 entries, foreign tables need not be too large and you are looking at the admin changelist for it

On a decent machine with no load the resulting query will take several seconds to process

In face I have no idea why MySQL does not optimize the query, since this is really the database's territory...

Thanks again for your response!

comment:3 Changed 7 years ago by Piotr Lewandowski <django@…>

  • Component changed from Uncategorized to Database wrapper

comment:4 Changed 4 years ago by mtredinnick

  • Triage Stage changed from Design decision needed to Someday/Maybe

My opinion hasn't changed much over the intervening years on this one, although if it was magically easily possible, I would take the patch. Let's call it someday/maybe to give people something to look it. Should somebody come along with a great patch to do this that didn't make the normal SQL construction path slower, there would be no reason to keep it out.

comment:5 Changed 4 years ago by lukeplant

  • Severity set to Normal
  • Type set to Cleanup/optimization

comment:6 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:7 Changed 3 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:8 Changed 3 years ago by akaariai

  • Resolution set to wontfix
  • Status changed from new to closed

I think this can be closed. First, to me it seems a good DB would be able to optimize such queries, second, I am afraid supporting this will be hard. I wonder if fixing this ticket would cause performance problems for other queries.

If we want to support this kind of query, then I think a separate API for doing subqueries would be the way to go.

Note: See TracTickets for help on using tickets.
Back to Top