Code

Opened 10 months ago

Closed 5 months ago

#20600 closed Bug (fixed)

Subqueries should retain ORDER BY when using DISTINCT ON

Reported by: brianglass Owned by: anonymous
Component: Database layer (models, ORM) Version: 1.5
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

When using subqueries in conjunction with DISTINCT ON, ordering should be retained.

Case in point: if I have a table of chapters in a book with weightings for popularity and I wish to obtain the top weighted chapter per book I might do something like:

top_chapters = Chapter.objects.filter(topics__in=['css', 'javascript']).distinct('book_id').order_by('book_id', '-weight', 'id)

Now I want the chapters to be ordered by weight, so I use top_chapters as a sub-query:

chapters = Chapter.objects.filter(id__in=top_chapters).order_by('weight')

However, the ORDER BY statement is stripped from the subquery so we are not guaranteed to get the top chapters. If the subquery is sliced as in the following, the ordering is retained:

count = top_chapters.count()
chapters = Chapter.objects.filter(id__in=top_chapters[:count]).order_by('weight')

This issue seems related to ticket #12328. In that ticket, order by was intentionally retained if the subquery is sliced.

Attachments (0)

Change History (9)

comment:1 Changed 10 months ago by brianglass

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Note that the workaround solution I posted above does product SQL with the ORDER BY retained, but it also contains too many columns for the id__in to work correctly and I get back:

*** DatabaseError: subquery has too many columns
LINE 1: ..." FROM "heron_section" WHERE "heron_section"."id" IN (SELECT...

comment:2 Changed 10 months ago by akaariai

  • Easy pickings set
  • Triage Stage changed from Unreviewed to Accepted

The workaround might actually work if you add a .values() call in there.

But yes, this seems like a bug. I wonder if adding an ORDER BY into the subquery always is the correct approach. But maybe that is costly on some databases...

The easy solution is to just add ORDER BY if the query has distinct on fields set.

comment:3 Changed 10 months ago by koddsson

  • Owner changed from nobody to anonymous
  • Status changed from new to assigned

comment:4 Changed 10 months ago by koddsson@…

comment:5 Changed 8 months ago by mjtamlyn

  • Patch needs improvement set

(closed) PR has tests, but lacks fixing implementation.

comment:6 Changed 7 months ago by timo

  • Easy pickings unset

comment:7 Changed 6 months ago by chekunkov

I've got this issue as well. Posted workaround didn't work for me (even with .values()). The only solution I've found is to evaluate query for ids and then use result in "in" statement:

top_chapters_ids = Chapter.objects.filter(
    topics__in=['css', 'javascript']
).distinct(
    'book_id'
).order_by(
    'book_id', 
    '-weight', 
    'id'
).values_list('id', flat=True)

# evaluate query
top_chapters_ids = list(top_chapters_ids)

chapters = Chapter.objects.filter(
    id__in=top_chapters_ids
).order_by('weight')

It is working but resource-consuming solution. And possibly with large number of ids it wouldn't work for SQLite databases, because of their limitations.

comment:8 Changed 6 months ago by akaariai

  • Has patch set
  • Patch needs improvement unset

A patch fixing this is available from https://github.com/django/django/pull/1722 - I have commented all changes so that it should be easy to verify that the patch is actually doing the right thing. Anybody care to double-check the pull request?

comment:9 Changed 5 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In b1b04df06522b092a9b4768d2d61c52956e00eca:

Fixed #20600 -- ordered distinct(*fields) in subqueries

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.