Code

Opened 6 years ago

Closed 6 years ago

Last modified 3 years ago

#7142 closed (invalid)

QuerySet: Distinct + Values + order by doesn't behave as one would expect

Reported by: bo Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Keywords: qsrf-cleanup distinct order_by values
Cc: Triage Stage: Design decision needed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

best shown by example .. effects the Queryset Refactor

## a simple model
class Thing(models.Model):
   is_ok = models.BooleanField()
   name = models.CharField()


## This works just fine
q = Thing.objects.filter(is_ok = True).values('name').distinct()
##
##  SQL
##
##  SELECT DISTINCT name from thing where is_ok = 1
##

## this fails
#

q = Thing.objects.filter(is_ok = True).extra(select={'short_name' : "SUBSTRING(name, 1, 3) "}.values('short_name').distinct()

#
# it does return a dict of just short_name
# 
#  q  = [{ 'short_name': 'bla'}, { 'short_name': 'bla2'}, ...]
#
#  however the SQL generated is
#
#
# SELECT DISTINCT name, (SUBSTRING(name, 1, 3)) as short_name from thing where is_ok = 1
#
#  Note that 'Name' is still included in the query thus 'DISTINCT' 
#  is NOT distinct on 'short_name' but the name - short_name combo
#
# it should be
#
# SELECT DISTINCT (SUBSTRING(name, 1, 3)) as short_name from thing where is_ok = 1
#

Attachments (0)

Change History (5)

comment:1 Changed 6 years ago by Honza_Kral

  • Component changed from Uncategorized to Database wrapper
  • Keywords order_by values added; Distinct + Extra + Values = not so removed
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from QuerySet: Distinct + Extra + Values = Not distinct on values if values contains an 'extra select' field to QuerySet: Distinct + Values + order by doesn't behave as one would expect
  • Triage Stage changed from Unreviewed to Design decision needed

The "problem" is introduced in [7455], it is due to PostgreSQL's inability to order by on a column not present in a distinct query. I am flagging this Design Decision needed and will bring it up on django-dev.

The hotfix is .order_by() - turn of ordering for the given query, then it will work, it has nothing to do with extra .

The problem is a bit cryptic, because the fields needed for ordering are added to SELECT so the value is counted upon when doing the distinct in the DB, but is not returned to the user so he is puzzled for quite some time why distinct doesn't return distinct values.

comment:2 Changed 6 years ago by gav

  • Keywords qsrf-cleanup added

comment:3 Changed 6 years ago by jacob

  • milestone set to 1.0

comment:4 Changed 6 years ago by mtredinnick

  • Resolution set to invalid
  • Status changed from new to closed

I gather there's some ordering on the model that's not mentioned here (or in the queryset construction that isn't mentioned)? Because I see no ordering in the examples above.

If that is the case (that there is ordering involved), then it's neither unexpected nor incorrect behaviour. It's correct SQL. It's also documented (see the note in that section of the docs).

If there's no ordering involved, then please reopen (although I can't repeat the problem), but as far as I can work out at the moment, this isn't a bug in Django.

comment:5 Changed 3 years ago by jacob

  • milestone 1.0 deleted

Milestone 1.0 deleted

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.