Django

Code

Ticket #5267 (closed: fixed)

Opened 1 year ago

Last modified 1 year ago

document that order_by('?') is a huge performance issue

Reported by: GomoX <gomo@datafull.com> Assigned to: mboersma
Milestone: Component: Database wrapper
Version: Keywords:
Cc: gomo@datafull.com Triage Stage: Ready for checkin
Has patch: 1 Needs documentation: 0
Needs tests: 0 Patch needs improvement: 0

Description

order_by('?') generates an SQL query that is horrendous from a performance point of view (the "ORDER BY RAND() LIMIT" type query).

Info on this: http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/

For the current state of affairs, I think at the very least a warning should be added to http://www.djangoproject.com/documentation/db-api/#order-by-fields . That page happily states that you can use the method for obtaining a random row, but in a real scenario that is a very bad idea, and should be avoided at all costs.

On a more useful approach, maybe extra code could be added to a model's Meta class if you plan on grabbing random rows from that particular table. This could set up needed tables/columns/constraints in order to extract a random row without such a big performance hit. If you use order_by('?') on a model with this Meta setting, the enhancement would be transparent. How and if this improvement could be implemented is open for discussion, and is probably database dependent. The page I linked above has some discussion on the topic.

Attachments

Fix5267.diff (1.0 kB) - added by mboersma on 09/15/07 11:38:24.
Added a warning sentence that order_by('?') may be expensive and slow

Change History

08/29/07 07:56:55 changed by Simon G. <dev@simon.net.nz>

  • needs_better_patch changed.
  • stage changed from Unreviewed to Accepted.
  • summary changed from order_by('?') is a huge performance issue to document that order_by('?') is a huge performance issue.
  • needs_tests changed.
  • needs_docs changed.

I think it's fairly common knowledge that ORDER BY RAND is horrifically inefficient, but it's probably a good idea to place a warning there. Want to write one up?

As for implementing a better random, I think the costs outweigh the benefits, especially if it does mean cracking into weird SQL dialects. This is something to raise on django-developers.

09/15/07 11:38:24 changed by mboersma

  • attachment Fix5267.diff added.

Added a warning sentence that order_by('?') may be expensive and slow

09/15/07 11:40:19 changed by mboersma

  • owner changed from nobody to mboersma.
  • status changed from new to assigned.
  • has_patch set to 1.
  • stage changed from Accepted to Ready for checkin.

09/15/07 12:30:23 changed by adrian

  • status changed from assigned to closed.
  • resolution set to fixed.

(In [6293]) Fixed #5267 -- Documented that order_by('?') queries can be slow


Add/Change #5267 (document that order_by('?') is a huge performance issue)




Change Properties
Action