Opened 3 years ago

Last modified 3 years ago

#32478 closed Bug

Queryset annotation mixing aggregate and subquery doesn't GROUP BY outer column references. — at Version 5

Reported by: Igor Pejic Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords: outerref, subquery
Cc: Igor Pejic Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Igor Pejic)

After trying to migrate from 2.2. to 3.0, we experience a problem with the translation of queries in PostgreSQL which used to work in 2.2.

A breaking test is shown in: https://github.com/igorpejic/django/pull/1/files

From investigation this is related to: https://code.djangoproject.com/ticket/31094
and it seems the regression was introduced in: https://github.com/django/django/commit/fb3f034f1c63160c0ff13c609acd01c18be12f80

It seems that the combination of the "double OuterRef" with the "Case" logic is causing the problem.

Query looks like:

        books_with_same_name_as_country = Book.objects.filter(
            id__in=Subquery(
                Book.objects.filter(
                    name=OuterRef(OuterRef('country__name')),
                ).values('id')
            )
        ).values('id')[:1]
        books_breakdown = Publisher.objects.annotate(total_books=Case(
            When(
                num_awards__gte=2,
                then=Subquery(books_with_same_name_as_country, IntegerField())
            ),
            When(
                num_awards__lt=0,
                then=Count('country__publishers')
            ),
        ))

Stack trace:

======================================================================
ERROR: test_group_by_subquery_annotation_with_conditional (aggregation.tests.AggregateTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.GroupingError: subquery uses ungrouped column "aggregation_country.name" from outer query
LINE 1: ..."id" FROM "aggregation_book" U0 WHERE U0."name" = "aggregati...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/igor/django-repo/django/test/testcases.py", line 1229, in skip_wrapper
    return test_func(*args, **kwargs)
  File "/home/igor/django-repo/tests/aggregation/tests.py", line 1299, in test_group_by_subquery_annotation_with_conditional
    self.assertEqual(books_breakdown.get(id=self.p1.id).total_books, 1)
  File "/home/igor/django-repo/django/db/models/query.py", line 411, in get
    num = len(clone)
  File "/home/igor/django-repo/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/home/igor/django-repo/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/igor/django-repo/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/home/igor/django-repo/django/db/models/sql/compiler.py", line 1154, in execute_sql
    cursor.execute(sql, params)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/igor/django-repo/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: subquery uses ungrouped column "aggregation_country.name" from outer query
LINE 1: ..."id" FROM "aggregation_book" U0 WHERE U0."name" = "aggregati...
                                                             ^

Full query 2.2:

SELECT "aggregation_regress_publisher"."id",
       "aggregation_regress_publisher"."name",
       "aggregation_regress_publisher"."num_awards",
       "aggregation_regress_publisher"."country_id",
       CASE
           WHEN "aggregation_regress_publisher"."num_awards" >= 2 THEN
                  (SELECT V0."id"
                   FROM "aggregation_regress_book" V0
                   WHERE V0."id" IN
                       (SELECT U0."id"
                        FROM "aggregation_regress_book" U0
                        WHERE U0."name" = ("aggregation_regress_country"."name")
                        ORDER BY U0."name" ASC)
                   ORDER BY V0."name" ASC
                   LIMIT 1)
           WHEN "aggregation_regress_publisher"."num_awards" < 0 THEN COUNT(T3."id")
           ELSE NULL
       END AS "total_books"
FROM "aggregation_regress_publisher"
INNER JOIN "aggregation_regress_country" ON ("aggregation_regress_publisher"."country_id" = "aggregation_regress_country"."id")
LEFT OUTER JOIN "aggregation_regress_publisher" T3 ON ("aggregation_regress_country"."id" = T3."country_id")
GROUP BY "aggregation_regress_publisher"."id",
  (SELECT V0."id"
   FROM "aggregation_regress_book" V0
   WHERE V0."id" IN
       (SELECT U0."id"
        FROM "aggregation_regress_book" U0
        WHERE U0."name" = ("aggregation_regress_country"."name")
        ORDER BY U0."name" ASC)
   ORDER BY V0."name" ASC
   LIMIT 1)

Full query 3.0:

SELECT "aggregation_publisher"."id",
       "aggregation_publisher"."name",
       "aggregation_publisher"."num_awards",
       "aggregation_publisher"."duration",
       "aggregation_publisher"."country_id",
       CASE
           WHEN "aggregation_publisher"."num_awards" >= 2 THEN
                  (SELECT V0."id"
                   FROM "aggregation_book" V0
                   WHERE V0."id" IN
                       (SELECT U0."id"
                        FROM "aggregation_book" U0
                        WHERE U0."name" = "aggregation_country"."name")
                   LIMIT 1)
           WHEN "aggregation_publisher"."num_awards" < 0 THEN COUNT(T3."id")
           ELSE NULL
       END AS "total_books"
FROM "aggregation_publisher"
INNER JOIN "aggregation_country" ON ("aggregation_publisher"."country_id" = "aggregation_country"."id")
LEFT OUTER JOIN "aggregation_publisher" T3 ON ("aggregation_country"."id" = T3."country_id")
GROUP BY "aggregation_publisher"."id"

Change History (5)

comment:1 by Igor Pejic, 3 years ago

Component: UncategorizedDatabase layer (models, ORM)

comment:2 by Igor Pejic, 3 years ago

Cc: Igor Pejic added

comment:3 by Igor Pejic, 3 years ago

Type: UncategorizedBug

comment:4 by Simon Charette, 3 years ago

Owner: changed from nobody to Simon Charette
Status: newassigned
Summary: Double OuterRef in Subquery with Case broken in 3.XQueryset annotation mixing aggregate and subquery doesn't GROUP BY outer column references.
Triage Stage: UnreviewedAccepted

That looks like another case of #31094 but generalized for expressions that mix aggregates and subqueries.

Case.get_group_by_cols() defers to BaseExpression.get_group_by_cols() when it has cases and the latter immediately returns self when it contains aggregates.

Igor, could you include the SQL that uses to be generated and what the new one looks like?

comment:5 by Igor Pejic, 3 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top