Opened 17 years ago
Closed 17 years ago
#5267 closed (fixed)
document that order_by('?') is a huge performance issue
Reported by: | | Owned by: | Matt Boersma |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | |
Severity: | | Keywords: | |
Cc: | gomo@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
order_by('?') generates an SQL query that is horrendous from a performance point of view (the "ORDER BY RAND() LIMIT" type query).
Info on this:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
At the very least, I think a warning should be added to http://www.djangoproject.com/documentation/db-api/#order-by-fields .
That page happily states that you can use this method to obtain a random row, but on a table of any real size that is a very bad idea and should be avoided at all costs.
As a more useful approach, maybe an option could be added to a model's Meta class for tables you plan to grab random rows from. It could set up whatever tables, columns, or constraints are needed to extract a random row without such a big performance hit. If you then use order_by('?') on a model with this Meta setting, the enhancement would be transparent. Whether and how this improvement could be implemented is open for discussion, and is probably database dependent. The page I linked above has some discussion on the topic.
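To make the cost difference concrete, here is a minimal sketch using Python's standard `sqlite3` module rather than Django itself. The `entry` table and all function names are hypothetical, chosen only for illustration. It contrasts the `ORDER BY RANDOM() LIMIT 1` style of query with two common workarounds: picking a random offset, and picking a random primary key. Neither workaround is free: the offset variant still has to skip rows, and the primary-key variant is biased when there are gaps in the ids.

```python
import random
import sqlite3


def make_demo_table(n_rows=1000):
    """Create an in-memory table (hypothetical schema) with n_rows rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO entry (title) VALUES (?)",
        ((f"entry {i}",) for i in range(n_rows)),
    )
    return conn


def random_row_order_by(conn):
    """The expensive form: a random sort key is computed for every row."""
    return conn.execute(
        "SELECT id, title FROM entry ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


def random_row_by_offset(conn, rng=random):
    """Count the rows, then fetch the row at a random offset."""
    (count,) = conn.execute("SELECT COUNT(*) FROM entry").fetchone()
    if count == 0:
        return None
    offset = rng.randrange(count)
    return conn.execute(
        "SELECT id, title FROM entry ORDER BY id LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()


def random_row_by_id(conn, rng=random):
    """Pick a random id in [MIN(id), MAX(id)] and take the first row at or
    after it. Uses the primary-key index, but is biased if ids have gaps."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM entry").fetchone()
    if lo is None:
        return None
    target = rng.randint(lo, hi)
    return conn.execute(
        "SELECT id, title FROM entry WHERE id >= ? ORDER BY id LIMIT 1",
        (target,),
    ).fetchone()
```

Each function returns an `(id, title)` tuple, or `None` for an empty table in the two workaround variants. Which workaround is appropriate depends on the database and on how uniform the selection needs to be.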
Attachments (1)
Change History (4)
comment:1 by , 17 years ago
Summary: | order_by('?') is a huge performance issue → document that order_by('?') is a huge performance issue |
---|---|
Triage Stage: | Unreviewed → Accepted |
by , 17 years ago
Attachment: | Fix5267.diff added |
---|
Added a warning sentence that order_by('?') may be expensive and slow
comment:2 by , 17 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
Triage Stage: | Accepted → Ready for checkin |
comment:3 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
I think it's fairly common knowledge that ORDER BY RAND is horrifically inefficient, but it's probably a good idea to place a warning there. Want to write one up?
As for implementing a better random, I think the costs outweigh the benefits, especially if it does mean cracking into weird SQL dialects. This is something to raise on django-developers.
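As a small illustration of the dialect problem, even the naive query differs between backends: MySQL spells the function `RAND()`, while PostgreSQL and SQLite use `RANDOM()`. The sketch below is not Django's actual code; the mapping and the `random_row_sql` helper are hypothetical names used only to show the point.

```python
# Random-ordering function per backend (illustrative subset only).
RANDOM_FUNCTION = {
    "mysql": "RAND()",
    "postgresql": "RANDOM()",
    "sqlite": "RANDOM()",
}


def random_row_sql(backend, table):
    """Build the naive 'one random row' query for the given backend.

    The table name is interpolated directly for illustration only;
    do not do this with untrusted input.
    """
    try:
        func = RANDOM_FUNCTION[backend]
    except KeyError:
        raise ValueError(f"unsupported backend: {backend!r}")
    return f"SELECT * FROM {table} ORDER BY {func} LIMIT 1"
```

Any smarter strategy would need a similar per-backend branch for each query it issues, which is part of the cost being weighed here.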