Opened 11 years ago
Closed 11 years ago
#20600 closed Bug (fixed)
Subqueries should retain ORDER BY when using DISTINCT ON
Reported by: | brianglass | Owned by: | anonymous |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.5 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When using subqueries in conjunction with DISTINCT ON, ordering should be retained.
Case in point: if I have a table of chapters in a book with weightings for popularity and I wish to obtain the top weighted chapter per book I might do something like:
top_chapters = Chapter.objects.filter(topics__in=['css', 'javascript']).distinct('book_id').order_by('book_id', '-weight', 'id)
Now I want the chapters to be ordered by weight, so I use top_chapters as a sub-query:
chapters = Chapter.objects.filter(id__in=top_chapters).order_by('weight')
However, the ORDER BY statement is stripped from the subquery so we are not guaranteed to get the top chapters. If the subquery is sliced as in the following, the ordering is retained:
count = top_chapters.count() chapters = Chapter.objects.filter(id__in=top_chapters[:count]).order_by('weight')
This issue seems related to ticket #12328. In that ticket, order by was intentionally retained if the subquery is sliced.
Change History (9)
comment:1 by , 11 years ago
comment:2 by , 11 years ago
Easy pickings: | set |
---|---|
Triage Stage: | Unreviewed → Accepted |
The workaround might actually work if you add a .values() call in there.
But yes, this seems like a bug. I wonder if adding an ORDER BY into the subquery always is the correct approach. But maybe that is costly on some databases...
The easy solution is to just add ORDER BY if the query has distinct on fields set.
comment:3 by , 11 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:5 by , 11 years ago
Patch needs improvement: | set |
---|
(closed) PR has tests, but lacks fixing implementation.
comment:6 by , 11 years ago
Easy pickings: | unset |
---|
comment:7 by , 11 years ago
I've got this issue as well. Posted workaround didn't work for me (even with .values()). The only solution I've found is to evaluate query for ids and then use result in "in" statement:
top_chapters_ids = Chapter.objects.filter( topics__in=['css', 'javascript'] ).distinct( 'book_id' ).order_by( 'book_id', '-weight', 'id' ).values_list('id', flat=True) # evaluate query top_chapters_ids = list(top_chapters_ids) chapters = Chapter.objects.filter( id__in=top_chapters_ids ).order_by('weight')
It is working but resource-consuming solution. And possibly with large number of ids it wouldn't work for SQLite databases, because of their limitations.
comment:8 by , 11 years ago
Has patch: | set |
---|---|
Patch needs improvement: | unset |
A patch fixing this is available from https://github.com/django/django/pull/1722 - I have commented all changes so that it should be easy to verify that the patch is actually doing the right thing. Anybody care to double-check the pull request?
comment:9 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Note that the workaround solution I posted above does product SQL with the ORDER BY retained, but it also contains too many columns for the
id__in
to work correctly and I get back: