Opened 12 years ago
Closed 12 years ago
#20600 closed Bug (fixed)
Subqueries should retain ORDER BY when using DISTINCT ON
| Reported by: | brianglass | Owned by: | anonymous |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.5 |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
When using subqueries in conjunction with DISTINCT ON, ordering should be retained.
Case in point: if I have a table of chapters in a book with weightings for popularity and I wish to obtain the top weighted chapter per book I might do something like:
top_chapters = Chapter.objects.filter(topics__in=['css', 'javascript']).distinct('book_id').order_by('book_id', '-weight', 'id)
Now I want the chapters to be ordered by weight, so I use top_chapters as a sub-query:
chapters = Chapter.objects.filter(id__in=top_chapters).order_by('weight')
However, the ORDER BY statement is stripped from the subquery so we are not guaranteed to get the top chapters. If the subquery is sliced as in the following, the ordering is retained:
count = top_chapters.count() chapters = Chapter.objects.filter(id__in=top_chapters[:count]).order_by('weight')
This issue seems related to ticket #12328. In that ticket, order by was intentionally retained if the subquery is sliced.
Change History (9)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
| Easy pickings: | set |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
The workaround might actually work if you add a .values() call in there.
But yes, this seems like a bug. I wonder if adding an ORDER BY into the subquery always is the correct approach. But maybe that is costly on some databases...
The easy solution is to just add ORDER BY if the query has distinct on fields set.
comment:3 by , 12 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
comment:5 by , 12 years ago
| Patch needs improvement: | set |
|---|
(closed) PR has tests, but lacks fixing implementation.
comment:6 by , 12 years ago
| Easy pickings: | unset |
|---|
comment:7 by , 12 years ago
I've got this issue as well. Posted workaround didn't work for me (even with .values()). The only solution I've found is to evaluate query for ids and then use result in "in" statement:
top_chapters_ids = Chapter.objects.filter(
topics__in=['css', 'javascript']
).distinct(
'book_id'
).order_by(
'book_id',
'-weight',
'id'
).values_list('id', flat=True)
# evaluate query
top_chapters_ids = list(top_chapters_ids)
chapters = Chapter.objects.filter(
id__in=top_chapters_ids
).order_by('weight')
It is working but resource-consuming solution. And possibly with large number of ids it wouldn't work for SQLite databases, because of their limitations.
comment:8 by , 12 years ago
| Has patch: | set |
|---|---|
| Patch needs improvement: | unset |
A patch fixing this is available from https://github.com/django/django/pull/1722 - I have commented all changes so that it should be easy to verify that the patch is actually doing the right thing. Anybody care to double-check the pull request?
comment:9 by , 12 years ago
| Resolution: | → fixed |
|---|---|
| Status: | assigned → closed |
Note that the workaround solution I posted above does product SQL with the ORDER BY retained, but it also contains too many columns for the
id__into work correctly and I get back: