Opened 11 years ago
Closed 11 years ago
#24233 closed Bug (wontfix)
group-by column construction is broken
| Reported by: | megatrump | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.8alpha1 |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
I recently moved to 1.8alpha version from an older version django-e9d1f11 and I started seeing this error. Looks like the way group by is constructed has changed. I am using postgres database.
I could reproduce the behavior with a simple model.
Here is the model that I have.
class everydaycost(models.Model):
cost = models.DecimalField(max_digits=20, decimal_places=2)
rev = models.DecimalField(max_digits=20, decimal_places=2)
ts = models.DateField(db_index=True)
I am trying to get roi (rev/cost) for each day of week.
$ python manage.py shell
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
>>> from testgroupby.models import *
>>> from django.db.models import Q, Count, Min, Sum, Func, FloatField
>>> qs = everydaycost.objects.all()
>>> qs = qs.extra(select={'dow': "TO_CHAR(ts, 'D')"})
>>> qs = qs.values('dow')
>>> qs = qs.annotate(sum_cost=Sum('cost'))
>>> qs = qs.annotate(sum_rev=Sum('rev'))
>>> qs = qs.annotate(roi=Func(A='CAST(Sum("rev") as numeric)', B='CAST(Sum("cost") as numeric)',
... template='ROUND(COALESCE(%(A)s / NULLIF(%(B)s,0), 0), 2)', output_field=FloatField()))
>>> print qs
DEBUG (0.002) SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D')), ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) LIMIT 21; args=()
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/Downloads/Django-1.8a1/django/db/models/query.py", line 139, in __repr__
data = list(self[:REPR_OUTPUT_SIZE + 1])
File "/Downloads/Django-1.8a1/django/db/models/query.py", line 163, in __iter__
self._fetch_all()
File "/Downloads/Django-1.8a1/django/db/models/query.py", line 955, in _fetch_all
self._result_cache = list(self.iterator())
File "/Downloads/Django-1.8a1/django/db/models/query.py", line 1075, in iterator
for row in self.query.get_compiler(self.db).results_iter():
File "/Downloads/Django-1.8a1/django/db/models/sql/compiler.py", line 780, in results_iter
results = self.execute_sql(MULTI)
File "/Downloads/Django-1.8a1/django/db/models/sql/compiler.py", line 826, in execute_sql
cursor.execute(sql, params)
File "/Downloads/Django-1.8a1/django/db/backends/utils.py", line 80, in execute
return super(CursorDebugWrapper, self).execute(sql, params)
File "/Downloads/Django-1.8a1/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
File "/Downloads/Django-1.8a1/django/db/utils.py", line 95, in __exit__
six.reraise(dj_exc_type, dj_exc_value, traceback)
File "/Downloads/Django-1.8a1/django/db/backends/utils.py", line 65, in execute
return self.cursor.execute(sql, params)
ProgrammingError: aggregate functions are not allowed in GROUP BY
LINE 1: ... GROUP BY (TO_CHAR(ts, 'D')), ROUND(COALESCE(CAST(Sum("rev")...
^
>>> print qs.query
SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D')), ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2)
>>>
With earlier version, this worked fine for me.
$ python manage.py shell
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>>
>>> from testgroupby.models import *
>>> from django.db.models import Q, Count, Min, Sum, Func, FloatField
>>> qs = everydaycost.objects.all()
>>> qs = qs.extra(select={'dow': "TO_CHAR(ts, 'D')"})
>>> qs = qs.values('dow')
>>> qs = qs.annotate(sum_cost=Sum('
cost'))
>>> qs = qs.annotate(sum_rev=Sum('rev'))
>>> qs = qs.annotate(roi=Func(A='CAST(Sum("rev") as numeric)', B='CAST(Sum("cost") as numeric)',
... template='ROUND(COALESCE(%(A)s / NULLIF(%(B)s,0), 0), 2)', output_field=FloatField()))
>>> print qs
DEBUG (0.002) SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D')) LIMIT 21; args=()
[]
>>> print qs.query
SELECT (TO_CHAR(ts, 'D')) AS "dow", SUM("testgroupby_everydaycost"."cost") AS "sum_cost", SUM("testgroupby_everydaycost"."rev") AS "sum_rev", ROUND(COALESCE(CAST(Sum("rev") as numeric) / NULLIF(CAST(Sum("cost") as numeric),0), 0), 2) AS "roi" FROM "testgroupby_everydaycost" GROUP BY (TO_CHAR(ts, 'D'))
>>>
Change History (1)
comment:1 by , 11 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | new → closed |
| Triage Stage: | Unreviewed → Accepted |
Note:
See TracTickets
for help on using tickets.
Yes, I believe this change was introduced with this PR https://github.com/django/django/pull/3669, but it may have been another. Grouping was pretty broken when using complicated expressions, because the backing columns were being added to the GROUP BY rather than the entire expression. It was lucky you were getting the right results.
Now, the reason this is failing for you now, is because you're effectively writing RawSQL (there's a private expression type for this), without any of the benefits of new expressions. You may as well be writing
.extra()calls. To fully leverage the power of expressions, you should use the smaller types to build up bigger ones. Expressions know that if they reference an aggregate, then they shouldn't be added to the group by clause. But you've just stuck a raw SUM expression into the roi calculation, and Django can't know that it's now referencing an aggregate.I've replicated (mostly..) your query only using full expressions. I cheated with a couple (using Func(function='..')) rather than building out the types fully, but they'll serve for now. I've broken each part down into logical chunks, and have printed the underlying query at each stage to make this easier to follow.
First, annotate with the day of week. Annotate can do everything extra can do with regards to adding columns to the select list:
In [30]: qs = Costing.objects.annotate(dow=Func(F('ts'), Value('D'), function='TO_CHAR')).values('dow') In [31]: print(qs.query) SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow" FROM "scratch_costing"Next, add the aggregates we're interested in:
In [32]: qs = qs.annotate(sum_cost=Sum('cost'), sum_rev=Sum('rev')) In [33]: print(qs.query) SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow", SUM("scratch_costing"."cost") AS "sum_cost", SUM("scratch_costing"."rev") AS "sum_rev" FROM "scratch_costing" GROUP BY TO_CHAR("scratch_costing"."ts", D)Now, the ROI calculation. Note, this could be improved by using Case/When expressions
In [34]: qsroi = qs.annotate(roi=Func( Coalesce(F('sum_rev') / Coalesce('sum_cost', Value(0)), Value(0)) , Value(2), function='ROUND', output_field=FloatField())) In [35]: print(qsroi.query) SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow", SUM("scratch_costing"."cost") AS "sum_cost", SUM("scratch_costing"."rev") AS "sum_rev", ROUND(COALESCE((SUM("scratch_costing"."rev") / COALESCE(SUM("scratch_costing"."cost"), 0)), 0), 2) AS "roi" FROM "scratch_costing" GROUP BY TO_CHAR("scratch_costing"."ts", D) In [36]: roi = qsroi[0] In [37]: print(roi) {'roi': 0.54, 'sum_cost': Decimal('30.30'), 'sum_rev': Decimal('16.50'), 'dow': u'4'}And a quick Case/When example, to show how you could avoid doing division by NULL or 0:
In [39]: class Round(Func): ....: function = 'ROUND' ....: In [40]: qsroi = qs.annotate(roi=Round(Case(When(Q(sum_cost__isnull=True)|Q(sum_cost=0), then='sum_rev'), default=F('sum_rev')/F('sum_cost')), Value(2), output_field=FloatField())) In [41]: print(qsroi[0]) {'roi': 0.54, 'sum_cost': Decimal('30.30'), 'sum_rev': Decimal('16.50'), 'dow': u'4'} In [42]: print(qsroi.query) SELECT TO_CHAR("scratch_costing"."ts", D) AS "dow", SUM("scratch_costing"."cost") AS "sum_cost", SUM("scratch_costing"."rev") AS "sum_rev", ROUND(CASE WHEN (SUM("scratch_costing"."cost") IS NULL OR SUM("scratch_costing"."cost") = 0) THEN SUM("scratch_costing"."rev") ELSE (SUM("scratch_costing"."rev") / SUM("scratch_costing"."cost")) END, 2) AS "roi" FROM "scratch_costing" GROUP BY TO_CHAR("scratch_costing"."ts", D)I've probably put a little too much detail above, but hopefully it'll help you solve your problem in a supported way. I'm going to close this as wontfix, because the types aren't being used in the right way. If the documentation for the expressions or function types needs to be improved, please reopen with some comments on where the clarification is required, and I'll work on updating them to make it clearer.