Opened 4 years ago

Closed 4 years ago

#31136 closed Bug (fixed)

Multiple annotations with Exists() should not group by aliases.

Reported by: Sigurd Ljødal Owned by: Mariusz Felisiak
Component: Database layer (models, ORM) Version: 3.0
Severity: Release blocker Keywords:
Cc: Simon Charette, jon.dufresne@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There's a regression in Django 3.0 where an annotation that is not selected is included in the group by section of the query on PostgreSQL. I looks a bit like https://code.djangoproject.com/ticket/31109, but it is still present in Django 3.0.2 (I haven't tested earlier versions yet). I haven't had time to look into where the incorrect group by cause is generated yet.

I have created a test case to reproduce the issue here: https://github.com/django/django/compare/master...ljodal:regression/annotate-groupby-postgresql

It only crashes when running against PostgreSQL. Here's the test case query:

latest_book_pubdate_qs = Book.objects.filter(
    publisher=OuterRef('pk')
).values_list('id')
publisher_qs = Publisher.objects.values('pk').annotate(
    has_published_books=Exists(latest_book_pubdate_qs),
).filter(
    has_published_books=True
).annotate(
    book_count=Count('book')
).values_list('id')

The generated query is this:

SELECT "aggregation_publisher"."id" FROM "aggregation_publisher" LEFT OUTER JOIN "aggregation_book" ON ("aggregation_publisher"."id" = "aggregation_book"."publisher_id") WHERE EXISTS(SELECT U0."id" FROM "aggregation_book" U0 WHERE U0."publisher_id" = "aggregation_publisher"."id") GROUP BY "aggregation_publisher"."id", "has_published_books"

Which crashes because "has_published_books" in the "GROUP BY" clause is not a column on the table:

Traceback (most recent call last):
  File "/Users/sigurdljodal/Code/django/tests/aggregation/tests.py", line 1166, in test_aggregation_annotation
    print(list(publisher_qs))
  File "/Users/sigurdljodal/Code/django/django/db/models/query.py", line 270, in __len__
    self._fetch_all()
  File "/Users/sigurdljodal/Code/django/django/db/models/query.py", line 1284, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/Users/sigurdljodal/Code/django/django/db/models/query.py", line 144, in __iter__
    return compiler.results_iter(tuple_expected=True, chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/Users/sigurdljodal/Code/django/django/db/models/sql/compiler.py", line 1085, in results_iter
    results = self.execute_sql(MULTI, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
  File "/Users/sigurdljodal/Code/django/django/db/models/sql/compiler.py", line 1133, in execute_sql
    cursor.execute(sql, params)
  File "/Users/sigurdljodal/Code/django/django/db/backends/utils.py", line 66, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/Users/sigurdljodal/Code/django/django/db/backends/utils.py", line 75, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/Users/sigurdljodal/Code/django/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/Users/sigurdljodal/Code/django/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/Users/sigurdljodal/Code/django/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column "has_published_books" does not exist
LINE 1: ...her"."id") GROUP BY "aggregation_publisher"."id", "has_publi...

Change History (15)

comment:1 by Mariusz Felisiak, 4 years ago

Cc: Simon Charette added
Severity: NormalRelease blocker
Summary: Group by with annotation regression on PostgreSQLMultiple annotations with Exists() should not group by aliases.
Triage Stage: UnreviewedAccepted

Thanks for this report. As a temporary workaround you can use Exists() directly in QuerySet.filter() (without an annotation), i.e.

Publisher.objects.values('pk').filter(
    Exists(latest_book_pubdate_qs)
).annotate(
    book_count=Count('book')
).values_list('id')

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.
Reproduced at 495d7a1ddf3c8c83d49f099ee3599afbc28306fe.

comment:2 by Mariusz Felisiak, 4 years ago

Owner: changed from nobody to Mariusz Felisiak
Status: newassigned

comment:3 by Mariusz Felisiak, 4 years ago

Has patch: set

comment:4 by GitHub <noreply@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 0f843fdd:

Fixed #31136 -- Disabled grouping by aliases on QuerySet.values()/values_list().

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Thanks Sigurd Ljødal for the report.

comment:5 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 4f81f6d:

[3.0.x] Fixed #31136 -- Disabled grouping by aliases on QuerySet.values()/values_list().

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Thanks Sigurd Ljødal for the report.
Backport of 0f843fdd5b9b2f2307148465cd60f4e1b2befbb4 from master

comment:6 by Mariusz Felisiak, 4 years ago

Resolution: fixed
Status: closednew

I found a regression in this fix when I was trying to reproduce #31150, see

diff --git a/tests/aggregation/tests.py b/tests/aggregation/tests.py
index f523b3ca35..0209a9221b 100644
--- a/tests/aggregation/tests.py
+++ b/tests/aggregation/tests.py
@@ -1165,6 +1165,27 @@ class AggregateTestCase(TestCase):
             'Sams',
         ])
 
+    def test_aggregation_subquery_annotation_distinct(self):
+        books_qs = Book.objects.annotate(
+            first_author_the_same_age=Subquery(
+                Author.objects.filter(
+                    age=OuterRef('contact__friends__age'),
+                ).order_by('age').values('id')[:1],
+            )
+        ).filter(
+            publisher=self.p1,
+            first_author_the_same_age__isnull=False,
+        ).annotate(
+            min_age=Min('contact__friends__age'),
+        ).values('name', 'min_age').distinct()
+        self.assertEqual(list(books_qs),[
+            {
+                'name': 'The Definitive Guide to Django: Web Development Done Right',
+                'min_age': 29,
+            },
+            {'name': 'Practical Django Projects', 'min_age': 34},
+        ])
+
     @skipUnlessDBFeature('supports_subqueries_in_group_by')
     def test_group_by_subquery_annotation(self):
         """

comment:7 by Mariusz Felisiak, 4 years ago

comment:8 by Carlton Gibson, 4 years ago

Triage Stage: AcceptedReady for checkin

comment:9 by GitHub <noreply@…>, 4 years ago

In 59b4e99:

Refs #31136 -- Made QuerySet.values()/values_list() group only by selected annotation.

Regression in 0f843fdd5b9b2f2307148465cd60f4e1b2befbb4.

comment:10 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In a246869:

[3.0.x] Refs #31136 -- Made QuerySet.values()/values_list() group only by selected annotation.

Regression in 0f843fdd5b9b2f2307148465cd60f4e1b2befbb4.
Backport of 59b4e99dd00b9c36d56055b889f96885995e4240 from master

comment:11 by Mariusz Felisiak, 4 years ago

Resolution: fixed
Status: newclosed

comment:12 by Jon Dufresne, 4 years ago

Cc: jon.dufresne@… added

Commit a24686987f21fe6b5339cc13607c13fa906f81a8 causes a regression for my application. I'll try to pin down a minimal example but for now, here is the backtrace:

Traceback (most recent call last):
  <MY APP>
  File ".../django/db/models/query.py", line 276, in __iter__
    self._fetch_all()
  File ".../django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File ".../django/db/models/query.py", line 184, in __iter__
    for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
  File ".../django/db/models/sql/compiler.py", line 1096, in results_iter
    results = self.execute_sql(MULTI, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
  File ".../django/db/models/sql/compiler.py", line 1144, in execute_sql
    cursor.execute(sql, params)
  File ".../django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File ".../django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File ".../django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File ".../django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File ".../django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column "myapp_mymodel.myfield" must appear in the GROUP BY clause or be used in an aggregate function

comment:13 by Jon Dufresne, 4 years ago

Resolution: fixed
Status: closednew

Here is a POC test to demonstrate the regression. Fails with the postgresql backend.

diff --git a/tests/aggregation/models.py b/tests/aggregation/models.py
index fd441fe51d..211a96d97c 100644
--- a/tests/aggregation/models.py
+++ b/tests/aggregation/models.py
@@ -42,3 +42,17 @@ class Store(models.Model):
 
     def __str__(self):
         return self.name
+
+
+class ThisOrThat(models.Model):
+    pass
+
+
+class This(models.Model):
+    thisorthat = models.ForeignKey("ThisOrThat", models.CASCADE)
+    created_at = models.DateTimeField(auto_now_add=True)
+
+
+class That(models.Model):
+    thisorthat = models.OneToOneField("ThisOrThat", models.CASCADE)
+    created_at = models.DateTimeField(auto_now_add=True)
diff --git a/tests/aggregation/tests.py b/tests/aggregation/tests.py
index bef415abd4..508b9cf687 100644
--- a/tests/aggregation/tests.py
+++ b/tests/aggregation/tests.py
@@ -1239,3 +1239,20 @@ class AggregateTestCase(TestCase):
             contact_publisher__isnull=False,
         ).annotate(count=Count('authors'))
         self.assertSequenceEqual(books_qs, [book])
+
+    def test_regression(self):
+        from .models import ThisOrThat
+        from django.db.models.aggregates import Max
+        from django.db.models.functions import Coalesce
+
+        qs = (
+            ThisOrThat.objects.annotate(
+                Max("this__created_at"),
+                created_at=Coalesce(
+                    "this__created_at__max", "that__created_at"
+                ),
+            )
+            .order_by("created_at")
+            .values_list("pk", flat=True)
+        )
+        list(qs)

Running this test results in the following error:

======================================================================
ERROR: test_regression (aggregation.tests.AggregateTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jon/devel/django/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.GroupingError: column "aggregation_that.created_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...BY COALESCE(MAX("aggregation_this"."created_at"), "aggregati...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jon/devel/django/tests/aggregation/tests.py", line 1258, in test_regression
    list(qs)
  File "/home/jon/devel/django/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/home/jon/devel/django/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/jon/devel/django/django/db/models/query.py", line 184, in __iter__
    for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
  File "/home/jon/devel/django/django/db/models/sql/compiler.py", line 1096, in results_iter
    results = self.execute_sql(MULTI, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
  File "/home/jon/devel/django/django/db/models/sql/compiler.py", line 1144, in execute_sql
    cursor.execute(sql, params)
  File "/home/jon/devel/django/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/jon/devel/django/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/jon/devel/django/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/jon/devel/django/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/jon/devel/django/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column "aggregation_that.created_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...BY COALESCE(MAX("aggregation_this"."created_at"), "aggregati...

comment:14 by Simon Charette, 4 years ago

Jon I think you should create another ticket referring this one instead of reopening this one.

Looks like something is off with contains_aggregate handling in a24686987f21fe6b5339cc13607c13fa906f81a8.

Can you also reproduce the crash by chaining annotate calls? Could you provide the query before and after a24686987f21fe6b5339cc13607c13fa906f81a8.

comment:15 by Mariusz Felisiak, 4 years ago

Resolution: fixed
Status: newclosed

I've create a new ticket #31217 for this regression.

Note: See TracTickets for help on using tickets.
Back to Top