QuerySet: Distinct + Values + order by doesn't behave as one would expect
best shown by example .. effects the Queryset Refactor
## a simple model
class Thing(models.Model):
is_ok = models.BooleanField()
name = models.CharField()
## This works just fine
q = Thing.objects.filter(is_ok = True).values('name').distinct()
##
## SQL
##
## SELECT DISTINCT name from thing where is_ok = 1
##
## this fails
#
q = Thing.objects.filter(is_ok = True).extra(select={'short_name' : "SUBSTRING(name, 1, 3) "}.values('short_name').distinct()
#
# it does return a dict of just short_name
#
# q = [{ 'short_name': 'bla'}, { 'short_name': 'bla2'}, ...]
#
# however the SQL generated is
#
#
# SELECT DISTINCT name, (SUBSTRING(name, 1, 3)) as short_name from thing where is_ok = 1
#
# Note that 'Name' is still included in the query thus 'DISTINCT'
# is NOT distinct on 'short_name' but the name - short_name combo
#
# it should be
#
# SELECT DISTINCT (SUBSTRING(name, 1, 3)) as short_name from thing where is_ok = 1
#
Change History
(5)
Component: |
Uncategorized → Database wrapper
|
Keywords: |
order_by values added; Distinct + Extra + Values = not so removed
|
Summary: |
QuerySet: Distinct + Extra + Values = Not distinct on values if values contains an 'extra select' field → QuerySet: Distinct + Values + order by doesn't behave as one would expect
|
Triage Stage: |
Unreviewed → Design decision needed
|
Keywords: |
qsrf-cleanup added
|
Resolution: |
→ invalid
|
Status: |
new → closed
|
The "problem" is introduced in [7455], it is due to PostgreSQL's inability to order by on a column not present in a distinct query. I am flagging this Design Decision needed and will bring it up on django-dev.
The hotfix is
.order_by()
- turn of ordering for the given query, then it will work, it has nothing to do withextra
.The problem is a bit cryptic, because the fields needed for ordering are added to SELECT so the value is counted upon when doing the distinct in the DB, but is not returned to the user so he is puzzled for quite some time why distinct doesn't return distinct values.