Opened 11 years ago

Closed 10 years ago

#20600 closed Bug (fixed)

Subqueries should retain ORDER BY when using DISTINCT ON

Reported by: brianglass Owned by: anonymous
Component: Database layer (models, ORM) Version: 1.5
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When using subqueries in conjunction with DISTINCT ON, ordering should be retained.

Case in point: if I have a table of chapters in a book with weightings for popularity and I wish to obtain the top weighted chapter per book I might do something like:

top_chapters = Chapter.objects.filter(topics__in=['css', 'javascript']).distinct('book_id').order_by('book_id', '-weight', 'id)

Now I want the chapters to be ordered by weight, so I use top_chapters as a sub-query:

chapters = Chapter.objects.filter(id__in=top_chapters).order_by('weight')

However, the ORDER BY statement is stripped from the subquery so we are not guaranteed to get the top chapters. If the subquery is sliced as in the following, the ordering is retained:

count = top_chapters.count()
chapters = Chapter.objects.filter(id__in=top_chapters[:count]).order_by('weight')

This issue seems related to ticket #12328. In that ticket, order by was intentionally retained if the subquery is sliced.

Change History (9)

comment:1 by brianglass, 11 years ago

Note that the workaround solution I posted above does product SQL with the ORDER BY retained, but it also contains too many columns for the id__in to work correctly and I get back:

*** DatabaseError: subquery has too many columns
LINE 1: ..." FROM "heron_section" WHERE "heron_section"."id" IN (SELECT...

comment:2 by Anssi Kääriäinen, 11 years ago

Easy pickings: set
Triage Stage: UnreviewedAccepted

The workaround might actually work if you add a .values() call in there.

But yes, this seems like a bug. I wonder if adding an ORDER BY into the subquery always is the correct approach. But maybe that is costly on some databases...

The easy solution is to just add ORDER BY if the query has distinct on fields set.

comment:3 by koddsson, 11 years ago

Owner: changed from nobody to anonymous
Status: newassigned

comment:5 by Marc Tamlyn, 11 years ago

Patch needs improvement: set

(closed) PR has tests, but lacks fixing implementation.

comment:6 by Tim Graham, 11 years ago

Easy pickings: unset

comment:7 by chekunkov, 10 years ago

I've got this issue as well. Posted workaround didn't work for me (even with .values()). The only solution I've found is to evaluate query for ids and then use result in "in" statement:

top_chapters_ids = Chapter.objects.filter(
    topics__in=['css', 'javascript']
).distinct(
    'book_id'
).order_by(
    'book_id', 
    '-weight', 
    'id'
).values_list('id', flat=True)

# evaluate query
top_chapters_ids = list(top_chapters_ids)

chapters = Chapter.objects.filter(
    id__in=top_chapters_ids
).order_by('weight')

It is working but resource-consuming solution. And possibly with large number of ids it wouldn't work for SQLite databases, because of their limitations.

comment:8 by Anssi Kääriäinen, 10 years ago

Has patch: set
Patch needs improvement: unset

A patch fixing this is available from https://github.com/django/django/pull/1722 - I have commented all changes so that it should be easy to verify that the patch is actually doing the right thing. Anybody care to double-check the pull request?

comment:9 by Anssi Kääriäinen <akaariai@…>, 10 years ago

Resolution: fixed
Status: assignedclosed

In b1b04df06522b092a9b4768d2d61c52956e00eca:

Fixed #20600 -- ordered distinct(*fields) in subqueries

Note: See TracTickets for help on using tickets.
Back to Top