Opened 9 years ago

Last modified 9 years ago

#25095 closed Bug

Fields annotated for filtering which are not included in values are included in GROUP BY clause in SQL — at Version 3

Reported by: mitchelljkotler Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by mitchelljkotler)

I am running a complex query, which annotates certain fields solely to filter on them. They are not included in a call to values. Another annotation is included, but it is incorrect due to the previous annotations being included in the GROUP BY clause.

Here is a not so simple example:

# We are filtering on the journey date, using the datediff function
# We annotate the reported tds fields, since we need two aliases of them for the different filterings applied

print EpTag.objects.filter(statuschanges__status=attached)\
    .annotate(attach_date=F('statuschanges__reported_tds'))\
    .filter(statuschanges__status=detached)\
    .annotate(detach_date=F('statuschanges__reported_tds'))\
    .annotate(journey_days=Func(F('detach_date'), F('attach_date'), function='DATEDIFF', output_field=IntegerField()))\
    .filter(journey_days__lt=10)\
    .annotate(name=F('object__producer__name'))\
    .values('object__producer', 'name')\
    .annotate(count=Count('pk', distinct=True))\
    .order_by().query

SELECT `ep_object`.`cid`, `ep_entity`.`name` AS `name`, COUNT(nnn `ep_tag`.`tid`) AS `count`
FROM `ep_tag`
INNER JOIN `ep_statuschange` ON ( `ep_tag`.`tid` = `ep_statuschange`.`tid` )
INNER JOIN `ep_statuschange` T4 ON ( `ep_tag`.`tid` = T4.`tid` )
LEFT OUTER JOIN `ep_object` ON ( `ep_tag`.`oid` = `ep_object`.`oid` )
LEFT OUTER JOIN `ep_entity` ON ( `ep_object`.`cid` = `ep_entity`.`eid` )
WHERE (`ep_statuschange`.`statusid` = 201 AND T4.`statusid` = 1000 AND DATEDIFF(T4.`reported_tds`, `ep_statuschange`.`reported_tds`) < 10)
# Group by still includes reported_tds and DATEDIFF fields, even though they were not selected.  This breaks my count
GROUP BY `ep_object`.`cid`, `ep_statuschange`.`reported_tds`, T4.`reported_tds`, DATEDIFF(T4.`reported_tds`, `ep_statuschange`.`reported_tds`), `ep_entity`.`name` ORDER BY NULL

In django/db/models/sql/query.py, line 1751 - all annotations are included in the group by. I believe this should be using annotation_select instead of annotations.

Change History (3)

comment:1 by Josh Smeaton, 9 years ago

Triage Stage: UnreviewedAccepted
Version: 1.8master

I've replicated the problem:

class FTest(models.Model):
    talk = models.IntegerField()
    hold = models.IntegerField()

In [8]: from scratch.models import *; from django.db.models import F, Count

In [9]: print(FTest.objects.annotate(talking=F('talk')).filter(talking__lte=30).values('hold').annotate(thecount=Count('pk')).query)
SELECT "scratch_ftest"."hold", COUNT("scratch_ftest"."id") AS "thecount" FROM "scratch_ftest" WHERE "scratch_ftest"."talk" <= 30 GROUP BY "scratch_ftest"."hold", "scratch_ftest"."talk"

It seems like your proposed fix works for your case, but my test VM is broken at the moment so I can't run the full test suite.

$ git diff
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index df65405..2f2375d 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -1674,8 +1674,8 @@ class Query(object):
         for col in self.select:
             self.group_by.append(col)

-        if self._annotations:
-            for alias, annotation in six.iteritems(self.annotations):
+        if self.annotation_select:
+            for alias, annotation in six.iteritems(self.annotation_select):
                 for col in annotation.get_group_by_cols():
                     self.group_by.append(col)

With the patch applied:

In [2]: print(FTest.objects.annotate(talking=F('talk')).filter(talking__lte=30).values('hold').annotate(thecount=Count('pk')).query)
SELECT "scratch_ftest"."hold", COUNT("scratch_ftest"."id") AS "thecount" FROM "scratch_ftest" WHERE "scratch_ftest"."talk" <= 30 GROUP BY "scratch_ftest"."hold"

To hopefully clear up the report somewhat, the issue is that all annotations are added to the group by list, even if they aren't in the select list. By applying the above patch, we only add the annotations in the select list to the group by list.

I'd like to see a test that had an annotation in the .order_by() clause that wasn't included in .values(), just to ensure we aren't removing too much from the group by.

Thanks for the great report and the fix. If you'd like to open up a pull request with the proposed change and some tests, that'd be very welcome.

comment:2 by Anssi Kääriäinen, 9 years ago

We need to also test that:

FTest.objects.annotate(talking=F('talk')).filter(talking__lte=30).values('hold', 'talking').annotate(thecount=Count('pk')).values('hold')

does use the talking annotation in the group by.

comment:3 by mitchelljkotler, 9 years ago

Description: modified (diff)

I created a pull request, 5002.

It adds one test to test for this issue.

@akaariai, I am not sure I understand what you want to test with that test case.

Note: See TracTickets for help on using tickets.
Back to Top