Opened 7 years ago

Closed 7 years ago

#5267 closed (fixed)

document that order_by('?') is a huge performance issue

Reported by: GomoX <gomo@…>
Owned by: mboersma
Component: Database layer (models, ORM)
Version:
Severity:
Keywords:
Cc: gomo@…
Triage Stage: Ready for checkin
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

order_by('?') generates an SQL query that performs horrendously on large tables (the "ORDER BY RAND() LIMIT" type of query): the database has to generate a random value for every row in the table before it can apply the LIMIT.

Info on this:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
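
To make the query shape concrete, here is a minimal sketch using Python's standard sqlite3 module rather than Django itself. The table name `item` and the helper names are made up for illustration, and SQLite spells the function RANDOM() where MySQL uses RAND(). It contrasts the "ORDER BY RANDOM() LIMIT 1" form with one commonly suggested alternative, counting the rows and fetching a single row at a random offset:

```python
import random
import sqlite3


def make_demo_db(n=1000):
    """Build an in-memory table with n rows (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO item (name) VALUES (?)",
        ((f"item-{i}",) for i in range(n)),
    )
    return conn


def random_row_order_by(conn):
    """The query shape the ticket warns about: a random value is computed for every row."""
    return conn.execute(
        "SELECT id, name FROM item ORDER BY RANDOM() LIMIT 1"
    ).fetchone()


def random_row_offset(conn, rng=random):
    """Alternative sketch: count the rows, then fetch one row at a random offset."""
    count = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
    if count == 0:
        return None
    offset = rng.randrange(count)  # uniform in [0, count)
    return conn.execute(
        "SELECT id, name FROM item ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()


if __name__ == "__main__":
    conn = make_demo_db()
    print(random_row_order_by(conn))
    print(random_row_offset(conn))
```

The offset version is not free either: it needs two queries, the database still has to step past `offset` rows, and the row count can change between the two statements. It does avoid generating and ordering by a random value for the whole table, which is the cost described above.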

For now, I think at the very least a warning should be added to http://www.djangoproject.com/documentation/db-api/#order-by-fields .
That page happily states that you can use this method to obtain a random row, but on a table of any real size that is a very bad idea and should be avoided.

As a more useful approach, maybe an option could be added to a model's Meta class for models you plan on grabbing random rows from. This could set up the tables/columns/constraints needed to extract a random row without such a big performance hit. If you use order_by('?') on a model with this Meta setting, the enhancement would be transparent. Whether and how this improvement could be implemented is open for discussion, and is probably database dependent. The page I linked above has some discussion on the topic.
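
One way the "extra columns" idea could look, purely as a sketch: each row stores a precomputed random key in an indexed column, and a random row is found by seeking to a random pivot in that index. Nothing here is an existing Django feature; the `rand_key` column, the `item` table and the helper names are hypothetical, and the example uses Python's standard sqlite3 module so it can run on its own.

```python
import random
import sqlite3


def make_keyed_db(n=1000, seed=None):
    """Build an in-memory table whose rows carry an indexed random key (hypothetical schema)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item ("
        "id INTEGER PRIMARY KEY, name TEXT, rand_key REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX item_rand_key ON item (rand_key)")
    conn.executemany(
        "INSERT INTO item (name, rand_key) VALUES (?, ?)",
        ((f"item-{i}", rng.random()) for i in range(n)),
    )
    return conn


def random_row_by_key(conn, rng=random):
    """Pick the first row whose key is at or above a random pivot, wrapping around if none is."""
    pivot = rng.random()
    row = conn.execute(
        "SELECT id, name FROM item WHERE rand_key >= ? "
        "ORDER BY rand_key LIMIT 1",
        (pivot,),
    ).fetchone()
    if row is None:
        # The pivot was above every stored key: wrap around to the smallest key.
        row = conn.execute(
            "SELECT id, name FROM item ORDER BY rand_key LIMIT 1"
        ).fetchone()
    return row


if __name__ == "__main__":
    conn = make_keyed_db(seed=1)
    print(random_row_by_key(conn))
```

The trade-off is that the selection is only approximately uniform (a row preceded by a large gap in key values is chosen more often), and every insert has to populate the extra column. In exchange, the lookup can use the index instead of touching every row.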

Attachments (1)

Fix5267.diff (1.0 KB) - added by mboersma 7 years ago.
Added a warning sentence that order_by('?') may be expensive and slow

Change History (4)

comment:1 Changed 7 years ago by Simon G. <dev@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from order_by('?') is a huge performance issue to document that order_by('?') is a huge performance issue
  • Triage Stage changed from Unreviewed to Accepted

I think it's fairly common knowledge that ORDER BY RAND is horrifically inefficient, but it's probably a good idea to place a warning there. Want to write one up?

As for implementing a better random, I think the costs outweigh the benefits, especially if it does mean cracking into weird SQL dialects. This is something to raise on django-developers.

Changed 7 years ago by mboersma

Added a warning sentence that order_by('?') may be expensive and slow

comment:2 Changed 7 years ago by mboersma

  • Has patch set
  • Owner changed from nobody to mboersma
  • Status changed from new to assigned
  • Triage Stage changed from Accepted to Ready for checkin

comment:3 Changed 7 years ago by adrian

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [6293]) Fixed #5267 -- Documented that order_by('?') queries can be slow
