Opened 9 years ago

Closed 9 years ago

#24233 closed Bug (wontfix)

group-by column construction is broken

Reported by: megatrump Owned by: nobody
Component: Database layer (models, ORM) Version: 1.8alpha1
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I recently moved to 1.8alpha version from an older version django-e9d1f11 and I started seeing this error. Looks like the way group by is constructed has changed. I am using postgres database.

I could reproduce the behavior with a simple model.
Here is the model that I have.

class everydaycost(models.Model):
    cost = models.DecimalField(max_digits=20, decimal_places=2)
    rev = models.DecimalField(max_digits=20, decimal_places=2)
    ts = models.DateField(db_index=True)

I am trying to get roi (rev/cost) for each day of week.

$ python manage.py shell
Python 2.7.5 (default, Mar  9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
>>> from testgroupby.models import *
>>> from django.db.models import Q, Count, Min, Sum, Func, FloatField
>>> qs = everydaycost.objects.all()
>>> qs = qs.extra(select={'dow': "TO_CHAR(ts, 'D')"})
>>> qs = qs.values('dow')
>>> qs = qs.annotate(sum_cost=Sum('cost'))
>>> qs = qs.annotate(sum_rev=Sum('rev'))
>>> qs = qs.annotate(roi=Func(A='CAST(Sum("rev") as numeric)', B='CAST(Sum("cost") as numeric)',
...                 template='ROUND(COALESCE(%(A)s / NULLIF(%(B)s,0), 0), 2)', output_field=FloatField()))
>>> print qs
DEBUG    (0.002) SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D')), ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) LIMIT 21; args=()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/Downloads/Django-1.8a1/django/db/models/query.py", line 139, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/Downloads/Django-1.8a1/django/db/models/query.py", line 163, in __iter__
    self._fetch_all()
  File "/Downloads/Django-1.8a1/django/db/models/query.py", line 955, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/Downloads/Django-1.8a1/django/db/models/query.py", line 1075, in iterator
    for row in self.query.get_compiler(self.db).results_iter():
  File "/Downloads/Django-1.8a1/django/db/models/sql/compiler.py", line 780, in results_iter
    results = self.execute_sql(MULTI)
  File "/Downloads/Django-1.8a1/django/db/models/sql/compiler.py", line 826, in execute_sql
    cursor.execute(sql, params)
  File "/Downloads/Django-1.8a1/django/db/backends/utils.py", line 80, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "/Downloads/Django-1.8a1/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
  File "/Downloads/Django-1.8a1/django/db/utils.py", line 95, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/Downloads/Django-1.8a1/django/db/backends/utils.py", line 65, in execute
    return self.cursor.execute(sql, params)
ProgrammingError: aggregate functions are not allowed in GROUP BY
LINE 1: ... GROUP BY (TO_CHAR(ts, 'D')), ROUND(COALESCE(CAST(Sum("rev")...
                                                             ^

>>> print qs.query
SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D')), ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2)
>>>

With earlier version, this worked fine for me.

$ python manage.py shell
Python 2.7.5 (default, Mar  9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
>>> from testgroupby.models import *
>>> from django.db.models import Q, Count, Min, Sum, Func, FloatField
>>> qs = everydaycost.objects.all()
>>> qs = qs.extra(select={'dow': "TO_CHAR(ts, 'D')"})
>>> qs = qs.values('dow')
>>> qs = qs.annotate(sum_cost=Sum('
cost'))
>>> qs = qs.annotate(sum_rev=Sum('rev'))
>>> qs = qs.annotate(roi=Func(A='CAST(Sum("rev") as numeric)', B='CAST(Sum("cost") as numeric)',
...                 template='ROUND(COALESCE(%(A)s / NULLIF(%(B)s,0), 0), 2)', output_field=FloatField()))
>>> print qs
DEBUG    (0.002) SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D')) LIMIT 21; args=()
[]
>>> print qs.query
SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D'))
>>>

Change History (1)

comment:1 by Josh Smeaton, 9 years ago

Resolution: wontfix
Status: newclosed
Triage Stage: UnreviewedAccepted

Yes, I believe this change was introduced with this PR https://github.com/django/django/pull/3669, but it may have been another. Grouping was pretty broken when using complicated expressions, because the backing columns were being added to the GROUP BY rather than the entire expression. It was lucky you were getting the right results.

Now, the reason this is failing for you now, is because you're effectively writing RawSQL (there's a private expression type for this), without any of the benefits of new expressions. You may as well be writing .extra() calls. To fully leverage the power of expressions, you should use the smaller types to build up bigger ones. Expressions know that if they reference an aggregate, then they shouldn't be added to the group by clause. But you've just stuck a raw SUM expression into the roi calculation, and Django can't know that it's now referencing an aggregate.

I've replicated (mostly..) your query only using full expressions. I cheated with a couple (using Func(function='..')) rather than building out the types fully, but they'll serve for now. I've broken each part down into logical chunks, and have printed the underlying query at each stage to make this easier to follow.

First, annotate with the day of week. Annotate can do everything extra can do with regards to adding columns to the select list:

In [30]: qs = Costing.objects.annotate(dow=Func(F('ts'), Value('D'), function='TO_CHAR')).values('dow')

In [31]: print(qs.query)
SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow" FROM "scratch_costing"

Next, add the aggregates we're interested in:

In [32]: qs = qs.annotate(sum_cost=Sum('cost'), sum_rev=Sum('rev'))

In [33]: print(qs.query)
SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow", SUM("scratch_costing"."cost") AS "sum_cost", SUM("scratch_costing"."rev") AS "sum_rev" FROM "scratch_costing" GROUP BY TO_CHAR("scratch_costing"."ts", D)

Now, the ROI calculation. Note, this could be improved by using Case/When expressions

In [34]: qsroi = qs.annotate(roi=Func( Coalesce(F('sum_rev') / Coalesce('sum_cost', Value(0)), Value(0)) , Value(2), function='ROUND', output_field=FloatField()))

In [35]: print(qsroi.query)
SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow", SUM("scratch_costing"."cost") AS "sum_cost", SUM("scratch_costing"."rev") AS "sum_rev", ROUND(COALESCE((SUM("scratch_costing"."rev") / COALESCE(SUM("scratch_costing"."cost"), 0)), 0), 2) AS "roi" FROM "scratch_costing" GROUP BY TO_CHAR("scratch_costing"."ts", D)

In [36]: roi = qsroi[0]
In [37]: print(roi)
{'roi': 0.54, 'sum_cost': Decimal('30.30'), 'sum_rev': Decimal('16.50'), 'dow': u'4'}

And a quick Case/When example, to show how you could avoid doing division by NULL or 0:

In [39]: class Round(Func):
   ....:     function = 'ROUND'
   ....:

In [40]: qsroi = qs.annotate(roi=Round(Case(When(Q(sum_cost__isnull=True)|Q(sum_cost=0), then='sum_rev'), default=F('sum_rev')/F('sum_cost')), Value(2), output_field=FloatField()))

In [41]: print(qsroi[0])
{'roi': 0.54, 'sum_cost': Decimal('30.30'), 'sum_rev': Decimal('16.50'), 'dow': u'4'}

In [42]: print(qsroi.query)
SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow", SUM("scratch_costing"."cost") AS "sum_cost", SUM("scratch_costing"."rev") AS "sum_rev", ROUND(CASE WHEN (SUM("scratch_costing"."cost") IS NULL OR SUM("scratch_costing"."cost") = 0) THEN SUM("scratch_costing"."rev") ELSE (SUM("scratch_costing"."rev") / SUM("scratch_costing"."cost")) END, 2) AS "roi" FROM "scratch_costing" GROUP BY TO_CHAR("scratch_costing"."ts", D)

I've probably put a little too much detail above, but hopefully it'll help you solve your problem in a supported way. I'm going to close this as wontfix, because the types aren't being used in the right way. If the documentation for the expressions or function types needs to be improved, please reopen with some comments on where the clarification is required, and I'll work on updating them to make it clearer.

Note: See TracTickets for help on using tickets.
Back to Top