Opened 6 years ago
Closed 5 years ago
#30246 closed Bug (fixed)
Using an annotated field calculated with django.db.models.functions.Extract in aggregate results in ProgrammingError
Reported by: | Jan Baryła | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.0 |
Severity: | Normal | Keywords: | query, aggregate, extract, annotate, postgresql |
Cc: | Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Aggregating most annotated fields works as expected, but if I put a DateTimeField through Extract during the annotate step, I get a ProgrammingError when trying to aggregate.
models.py
class MyModel(models.Model): usage_time = models.DateTimeField() usage = models.FloatField()
I would like to take the whole queryset, and calculate hourly usages. I figured using the django.db.models.functions.Extract transform would suit my needs well. This is the sample piece of code that, in a perfect scenario, would give me a dictionary filled with key value pairs, where key is the hour, and value is the sum of usages measured in that hour.
hour_aggregates = {} for i in range(24): hour_aggregates['{}_{}'.format("am" if i < 12 else "pm", i)] = Sum("usage", filter=Q(hour=i)) usages = MyModel.objects.annotate(hour=Extract("usage_time", "hour")).aggregate(**hour_aggregates)
Unfortunately, I get the following error:
Traceback (most recent call last): File "/home/jan/project/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute return self.cursor.execute(sql, params) psycopg2.ProgrammingError: column "__col2" does not exist LINE 1: ...CT "package_mymodel"."id" AS Col1, EXTRACT('hour' FROM "__col2" A...
This occured to me while using Django 2.1.7. It doesn't work on 2.2b1, but I have tested this solution on Django 1.8 and it works, which is why I am filing this bug report. My Python version is 3.6.7 and I'm using PostgreSQL 10.6.
Attachments (1)
Change History (13)
comment:1 by , 6 years ago
by , 6 years ago
Attachment: | test_regression.py added |
---|
comment:2 by , 6 years ago
I'm sorry for the confusion - this is my first ticket on your bugtracker. You're right about Django 1.8 - I obviously meant 1.10. And yes, it worked on that version. I've also tested it on the newest 2.2b1 and it doesn't work.
I followed your advice and tried bisecting the change. I quickly found out that the bug is dependent on the database backend - it worked out of the box on the current master with SQLite. After changing the database backend to django.db.backends.postgresql, I managed to reproduce the bug, and then I started the bisection. The commit that introduced the breaking change is b78d100fa62cd4fbbc70f2bae77c192cb36c1ccd.
I have attached the test that I've used. It's very basic, but it pinpoints the issue. All you need to do is put it into tests/aggregation, and change the DATABASES in tests/test_sqlite.py to a Postgres DB.
By the way, thank you for trying to guide me towards a different approach - I'd gladly use it, but what I've posted is just the beginning of a more complicated use case that will need this aggregation to work.
I will try to find the cause of the issue and create a patch if no one else figures it out till then. Thanks for your quick response.
comment:3 by , 6 years ago
Cc: | added |
---|---|
Keywords: | postgresql added |
Triage Stage: | Unreviewed → Accepted |
Thank you for the extra details and the tests, it looks like something is effectively broken with filtered aggregation over annotations when using the FILTER (WHERE)
clause on PostgreSQL since it was introduced then.
I happen to have been working in the area of code dealing with such queries recently to address issues with subqueries (sql.Query.get_aggregation
/rewrite_cols
) and reviewed the PR that added the Aggregate(filter)
support so I'd be happy to help on getting this fixed. Could you give this PR a try against your test and see if it happens to resolve the issue?
Something that would also be useful knowing is the exact bogus query that the ORM generates.
comment:4 by , 6 years ago
Alright so I confirmed the bug lies in rewrite_cols
handling of Lookup
; it should operate on clones of expr
and not expr
directly as they cause a rewrite of previous annotations Col
. In the provided example here the missing "__col2"
is actually the inner query's EXTRACT
reference to friday_night_closing
that gets rewritten to Ref('__col2')
on the first call to rewrite_cols
.
comment:5 by , 6 years ago
Alright, I think you test assertions are wrong but I got the query generated correctly by applying the following patch on top of my aforementioned PR
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index 7ac1b136e9..f61955d53a 100644 --- a/django/db/models/sql/query.py +++ b/django/db/models/sql/query.py @@ -379,22 +379,24 @@ class Query(BaseExpression): # before the contains_aggregate/is_summary condition below. new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) new_exprs.append(new_expr) - elif isinstance(expr, (Col, Subquery)) or (expr.contains_aggregate and not expr.is_summary): - # Reference to column, subquery or another aggregate. Make sure - # the expression is selected and reuse its alias if it's the case. + else: for col_alias, selected_annotation in self.annotation_select.items(): if selected_annotation == expr: + new_expr = Ref(col_alias, expr) break else: - col_cnt += 1 - col_alias = '__col%d' % col_cnt - self.annotations[col_alias] = expr - self.append_annotation_mask([col_alias]) - new_exprs.append(Ref(col_alias, expr)) - else: - # Some other expression not referencing database values - # directly. Its subexpression might contain Cols. - new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) + # Reference to column or another aggregate. Make sure + # the expression is selected and reuse its alias if it's the case. + if isinstance(expr, Col) or (expr.contains_aggregate and not expr.is_summary): + col_cnt += 1 + col_alias = '__col%d' % col_cnt + self.annotations[col_alias] = expr + self.append_annotation_mask([col_alias]) + new_expr = Ref(col_alias, expr) + else: + # Some other expression not referencing database values + # directly. Its subexpression might contain Cols. + new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) new_exprs.append(new_expr) annotation.set_source_expressions(new_exprs) return annotation, col_cnt
comment:6 by , 6 years ago
Has patch: | set |
---|
Alright, I'm pretty confident your issue is resolved by the last commit of https://github.com/django/django/pull/11062
comment:8 by , 6 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
I believe this issue can be closed now.
comment:9 by , 6 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
Not so fast :) The commit in question has not been merged yet.
comment:11 by , 5 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
I encountered the exact (or mostly similar issue) when executing the following test on Postgresql with Django 2.2.x
If you use an annotated value with an F expression, an aggregation with a filter will fail:
aggregate = Company.objects.annotate( nr_empty_chairs=F('num_chairs') - F('num_employees'), ).aggregate( nr_of_companies_with_empty_chairs=Count( 'pk', filter=Q(nr_empty_chairs__gte=1) ) )
This will give with the same programming error:
psycopg2.ProgrammingError: column "__col2" does not exist
I checked this with Django 3.0.x (master branch), and the git history showed me that this was fixed in 1ca825e (the patch above which also adds a test).
But this patch was never backported to Django 2.2
Is there a reason for it? Manually applying this patch on Django 2.2 does solve the issue, so it would be something we want in 2.2 as well :)
comment:12 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Version: | 2.1 → 3.0 |
This patch doesn't qualify for a backport because it's not a regression in Django 2.2 or a security fix.
Hello Jan, thank you for your report.
Just a few questions.
You said the code worked fine on 1.8 but the
Aggregate(filter)
argument was only introduced in Django 2.0 and theExtact
function was only added in Django 1.10. Were you using an explicitCase(When(...))
in Django 1.8? Does using aCase(When(...))
on Django 2.1 happens to work around the issue? If not could you try to bisect the change that broke it.You said you tried against 2.1 but did you try against the current 2.2 beta as well?
Finally I think your misunderstood how to force the ORM to perform group by. I think what you are after is something along the lines of
And perform the hour labeling on the python side.