Opened 5 years ago

Closed 4 years ago

#30246 closed Bug (fixed)

Using an annotated field calculated with django.db.models.functions.Extract in aggregate results in ProgrammingError

Reported by: Jan Baryła Owned by: nobody
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords: query, aggregate, extract, annotate, postgresql
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Aggregating most annotated fields works as expected, but if I put a DateTimeField through Extract during the annotate step, I get a ProgrammingError when trying to aggregate.

models.py

class MyModel(models.Model):
    usage_time = models.DateTimeField()
    usage = models.FloatField()

I would like to take the whole queryset, and calculate hourly usages. I figured using the django.db.models.functions.Extract transform would suit my needs well. This is the sample piece of code that, in a perfect scenario, would give me a dictionary filled with key value pairs, where key is the hour, and value is the sum of usages measured in that hour.

    hour_aggregates = {}
    for i in range(24):
        hour_aggregates['{}_{}'.format("am" if i < 12 else "pm", i)] = Sum("usage", filter=Q(hour=i))
    usages = MyModel.objects.annotate(hour=Extract("usage_time", "hour")).aggregate(**hour_aggregates)

Unfortunately, I get the following error:

Traceback (most recent call last):
  File "/home/jan/project/env/lib/python3.6/site-packages/django/db/backends/utils.py", line 85, in _execute
    return self.cursor.execute(sql, params)
psycopg2.ProgrammingError: column "__col2" does not exist
LINE 1: ...CT "package_mymodel"."id" AS Col1, EXTRACT('hour' FROM "__col2" A...

This occured to me while using Django 2.1.7. It doesn't work on 2.2b1, but I have tested this solution on Django 1.8 and it works, which is why I am filing this bug report. My Python version is 3.6.7 and I'm using PostgreSQL 10.6.

Attachments (1)

test_regression.py (3.9 KB ) - added by Jan Baryła 5 years ago.

Download all attachments as: .zip

Change History (13)

comment:1 by Simon Charette, 5 years ago

Hello Jan, thank you for your report.

Just a few questions.

You said the code worked fine on 1.8 but the Aggregate(filter) argument was only introduced in Django 2.0 and the Extact function was only added in Django 1.10. Were you using an explicit Case(When(...)) in Django 1.8? Does using a Case(When(...)) on Django 2.1 happens to work around the issue? If not could you try to bisect the change that broke it.

You said you tried against 2.1 but did you try against the current 2.2 beta as well?

Finally I think your misunderstood how to force the ORM to perform group by. I think what you are after is something along the lines of

MyModel.objects.values('usage_time__hour').annotate(Sum('usage'))

And perform the hour labeling on the python side.

by Jan Baryła, 5 years ago

Attachment: test_regression.py added

comment:2 by Jan Baryła, 5 years ago

I'm sorry for the confusion - this is my first ticket on your bugtracker. You're right about Django 1.8 - I obviously meant 1.10. And yes, it worked on that version. I've also tested it on the newest 2.2b1 and it doesn't work.

I followed your advice and tried bisecting the change. I quickly found out that the bug is dependent on the database backend - it worked out of the box on the current master with SQLite. After changing the database backend to django.db.backends.postgresql, I managed to reproduce the bug, and then I started the bisection. The commit that introduced the breaking change is b78d100fa62cd4fbbc70f2bae77c192cb36c1ccd.

I have attached the test that I've used. It's very basic, but it pinpoints the issue. All you need to do is put it into tests/aggregation, and change the DATABASES in tests/test_sqlite.py to a Postgres DB.

By the way, thank you for trying to guide me towards a different approach - I'd gladly use it, but what I've posted is just the beginning of a more complicated use case that will need this aggregation to work.

I will try to find the cause of the issue and create a patch if no one else figures it out till then. Thanks for your quick response.

comment:3 by Simon Charette, 5 years ago

Cc: Simon Charette added
Keywords: postgresql added
Triage Stage: UnreviewedAccepted

Thank you for the extra details and the tests, it looks like something is effectively broken with filtered aggregation over annotations when using the FILTER (WHERE) clause on PostgreSQL since it was introduced then.

I happen to have been working in the area of code dealing with such queries recently to address issues with subqueries (sql.Query.get_aggregation/rewrite_cols) and reviewed the PR that added the Aggregate(filter) support so I'd be happy to help on getting this fixed. Could you give this PR a try against your test and see if it happens to resolve the issue?

Something that would also be useful knowing is the exact bogus query that the ORM generates.

comment:4 by Simon Charette, 5 years ago

Alright so I confirmed the bug lies in rewrite_cols handling of Lookup; it should operate on clones of expr and not expr directly as they cause a rewrite of previous annotations Col. In the provided example here the missing "__col2" is actually the inner query's EXTRACT reference to friday_night_closing that gets rewritten to Ref('__col2') on the first call to rewrite_cols.

comment:5 by Simon Charette, 5 years ago

Alright, I think you test assertions are wrong but I got the query generated correctly by applying the following patch on top of my aforementioned PR

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index 7ac1b136e9..f61955d53a 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -379,22 +379,24 @@ class Query(BaseExpression):
                 # before the contains_aggregate/is_summary condition below.
                 new_expr, col_cnt = self.rewrite_cols(expr, col_cnt)
                 new_exprs.append(new_expr)
-            elif isinstance(expr, (Col, Subquery)) or (expr.contains_aggregate and not expr.is_summary):
-                # Reference to column, subquery or another aggregate. Make sure
-                # the expression is selected and reuse its alias if it's the case.
+            else:
                 for col_alias, selected_annotation in self.annotation_select.items():
                     if selected_annotation == expr:
+                        new_expr = Ref(col_alias, expr)
                         break
                 else:
-                    col_cnt += 1
-                    col_alias = '__col%d' % col_cnt
-                    self.annotations[col_alias] = expr
-                    self.append_annotation_mask([col_alias])
-                new_exprs.append(Ref(col_alias, expr))
-            else:
-                # Some other expression not referencing database values
-                # directly. Its subexpression might contain Cols.
-                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt)
+                    # Reference to column or another aggregate. Make sure
+                    # the expression is selected and reuse its alias if it's the case.
+                    if isinstance(expr, Col) or (expr.contains_aggregate and not expr.is_summary):
+                        col_cnt += 1
+                        col_alias = '__col%d' % col_cnt
+                        self.annotations[col_alias] = expr
+                        self.append_annotation_mask([col_alias])
+                        new_expr = Ref(col_alias, expr)
+                    else:
+                        # Some other expression not referencing database values
+                        # directly. Its subexpression might contain Cols.
+                        new_expr, col_cnt = self.rewrite_cols(expr, col_cnt)
                 new_exprs.append(new_expr)
         annotation.set_source_expressions(new_exprs)
         return annotation, col_cnt

comment:6 by Simon Charette, 5 years ago

Has patch: set

Alright, I'm pretty confident your issue is resolved by the last commit of https://github.com/django/django/pull/11062

comment:7 by Jan Baryła, 5 years ago

I can confirm that Simon's PR fixes the issue. Great job!

comment:8 by Matthew Schinckel, 5 years ago

Resolution: fixed
Status: newclosed

I believe this issue can be closed now.

comment:9 by Simon Charette, 5 years ago

Resolution: fixed
Status: closednew

Not so fast :) The commit in question has not been merged yet.

comment:10 by Tim Graham <timograham@…>, 5 years ago

Resolution: fixed
Status: newclosed

In 1ca825e:

Fixed #30246 -- Reused annotation aliases references in aggregation filters.

Thanks Jan Baryła for the detailed report and the reduced test case.

comment:11 by Martijn Jacobs, 4 years ago

Resolution: fixed
Status: closednew

I encountered the exact (or mostly similar) issue when executing the following test on Postgresql with Django 2.2.x

If you use an annotated value with an F expression, an aggregation with a filter will fail:

aggregate = Company.objects.annotate(
    nr_empty_chairs=F('num_chairs') - F('num_employees'),
).aggregate(
    nr_of_companies_with_empty_chairs=Count(
        'pk', filter=Q(nr_empty_chairs__gte=1)
    )
)

This will give with the same programming error:

psycopg2.ProgrammingError: column "__col2" does not exist

I checked this with Django 3.0.x (master branch), and the git history showed me that this was fixed in 1ca825e (the patch above which also adds a test).

But this patch was never backported to Django 2.2

Is there a reason for it? Manually applying this patch on Django 2.2 does solve the issue, so it would be something we want in 2.2 as well :)

Last edited 4 years ago by Martijn Jacobs (previous) (diff)

comment:12 by Mariusz Felisiak, 4 years ago

Resolution: fixed
Status: newclosed
Version: 2.13.0

This patch doesn't qualify for a backport because it's not a regression in Django 2.2 or a security fix.

Note: See TracTickets for help on using tickets.
Back to Top