Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#32478 closed Bug (fixed)

Queryset annotation mixing aggregate and subquery doesn't GROUP BY outer column references.

Reported by: Igor Pejic Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords: outerref, subquery
Cc: Igor Pejic Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Igor Pejic)

After trying to migrate from 2.2. to 3.0, we experience a problem with the translation of queries in PostgreSQL which used to work in 2.2.

A breaking test is shown in: https://github.com/igorpejic/django/pull/1/files

From investigation this is related to: https://code.djangoproject.com/ticket/31094
and it seems the regression was introduced in: https://github.com/django/django/commit/fb3f034f1c63160c0ff13c609acd01c18be12f80

It seems that the combination of the "double OuterRef" with the "Case" logic is causing the problem.

Query looks like:

        books_with_same_name_as_country = Book.objects.filter(
            id__in=Subquery(
                Book.objects.filter(
                    name=OuterRef(OuterRef('country__name')),
                ).values('id')
            )
        ).values('id')[:1]
        books_breakdown = Publisher.objects.annotate(total_books=Case(
            When(
                num_awards__gte=2,
                then=Subquery(books_with_same_name_as_country, IntegerField())
            ),
            When(
                num_awards__lt=0,
                then=Count('country__publishers')
            ),
        ))

Stack trace:

======================================================================
ERROR: test_group_by_subquery_annotation_with_conditional (aggregation.tests.AggregateTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.GroupingError: subquery uses ungrouped column "aggregation_country.name" from outer query
LINE 1: ..."id" FROM "aggregation_book" U0 WHERE U0."name" = "aggregati...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/igor/django-repo/django/test/testcases.py", line 1229, in skip_wrapper
    return test_func(*args, **kwargs)
  File "/home/igor/django-repo/tests/aggregation/tests.py", line 1299, in test_group_by_subquery_annotation_with_conditional
    self.assertEqual(books_breakdown.get(id=self.p1.id).total_books, 1)
  File "/home/igor/django-repo/django/db/models/query.py", line 411, in get
    num = len(clone)
  File "/home/igor/django-repo/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/home/igor/django-repo/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/igor/django-repo/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/home/igor/django-repo/django/db/models/sql/compiler.py", line 1154, in execute_sql
    cursor.execute(sql, params)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/igor/django-repo/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: subquery uses ungrouped column "aggregation_country.name" from outer query
LINE 1: ..."id" FROM "aggregation_book" U0 WHERE U0."name" = "aggregati...
                                                             ^

Full query 2.2:

SELECT "aggregation_regress_publisher"."id",
       "aggregation_regress_publisher"."name",
       "aggregation_regress_publisher"."num_awards",
       "aggregation_regress_publisher"."country_id",
       CASE
           WHEN "aggregation_regress_publisher"."num_awards" >= 2 THEN
                  (SELECT V0."id"
                   FROM "aggregation_regress_book" V0
                   WHERE V0."id" IN
                       (SELECT U0."id"
                        FROM "aggregation_regress_book" U0
                        WHERE U0."name" = ("aggregation_regress_country"."name")
                        ORDER BY U0."name" ASC)
                   ORDER BY V0."name" ASC
                   LIMIT 1)
           WHEN "aggregation_regress_publisher"."num_awards" < 0 THEN COUNT(T3."id")
           ELSE NULL
       END AS "total_books"
FROM "aggregation_regress_publisher"
INNER JOIN "aggregation_regress_country" ON ("aggregation_regress_publisher"."country_id" = "aggregation_regress_country"."id")
LEFT OUTER JOIN "aggregation_regress_publisher" T3 ON ("aggregation_regress_country"."id" = T3."country_id")
GROUP BY "aggregation_regress_publisher"."id",
  (SELECT V0."id"
   FROM "aggregation_regress_book" V0
   WHERE V0."id" IN
       (SELECT U0."id"
        FROM "aggregation_regress_book" U0
        WHERE U0."name" = ("aggregation_regress_country"."name")
        ORDER BY U0."name" ASC)
   ORDER BY V0."name" ASC
   LIMIT 1)

Full query 3.0:

SELECT "aggregation_publisher"."id",
       "aggregation_publisher"."name",
       "aggregation_publisher"."num_awards",
       "aggregation_publisher"."duration",
       "aggregation_publisher"."country_id",
       CASE
           WHEN "aggregation_publisher"."num_awards" >= 2 THEN
                  (SELECT V0."id"
                   FROM "aggregation_book" V0
                   WHERE V0."id" IN
                       (SELECT U0."id"
                        FROM "aggregation_book" U0
                        WHERE U0."name" = "aggregation_country"."name")
                   LIMIT 1)
           WHEN "aggregation_publisher"."num_awards" < 0 THEN COUNT(T3."id")
           ELSE NULL
       END AS "total_books"
FROM "aggregation_publisher"
INNER JOIN "aggregation_country" ON ("aggregation_publisher"."country_id" = "aggregation_country"."id")
LEFT OUTER JOIN "aggregation_publisher" T3 ON ("aggregation_country"."id" = T3."country_id")
GROUP BY "aggregation_publisher"."id"

Change History (13)

comment:1 by Igor Pejic, 3 years ago

Component: UncategorizedDatabase layer (models, ORM)

comment:2 by Igor Pejic, 3 years ago

Cc: Igor Pejic added

comment:3 by Igor Pejic, 3 years ago

Type: UncategorizedBug

comment:4 by Simon Charette, 3 years ago

Owner: changed from nobody to Simon Charette
Status: newassigned
Summary: Double OuterRef in Subquery with Case broken in 3.XQueryset annotation mixing aggregate and subquery doesn't GROUP BY outer column references.
Triage Stage: UnreviewedAccepted

That looks like another case of #31094 but generalized for expressions that mix aggregates and subqueries.

Case.get_group_by_cols() defers to BaseExpression.get_group_by_cols() when it has cases and the latter immediately returns self when it contains aggregates.

Igor, could you include the SQL that uses to be generated and what the new one looks like?

comment:5 by Igor Pejic, 3 years ago

Description: modified (diff)

comment:6 by Igor Pejic, 3 years ago

Thanks. I edited the main description with the query in 3.0 and 2.2.
The main difference is in the last argument of the group by.

comment:7 by Mariusz Felisiak, 3 years ago

I confirmed that it's a regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

comment:8 by Simon Charette, 3 years ago

Has patch: set

Igor, could you confirm your issue is resolved by https://github.com/django/django/pull/14039.

The underlying problem was that nested external column references were not recursively collected so the problem detailed in #31094 could manifest itself when mixing aggregation with a nested subqueries because Query.get_external_cols() would not account for possibly nested subqueries with external column references.

comment:9 by Igor Pejic, 3 years ago

I can confirm the fix works for our repository when using it with 3.0.12

Thank you for this fix and all other work Simon.

comment:10 by Mariusz Felisiak, 3 years ago

Triage Stage: AcceptedReady for checkin

comment:11 by Mariusz Felisiak, 3 years ago

Thanks Igor.

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

Resolution: fixed
Status: assignedclosed

In 277eea8f:

Fixed #32478 -- Included nested columns referenced by subqueries in GROUP BY on aggregations.

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Refs #31094, #31150.

Thanks Igor Pejic for the report.

comment:13 by Mariusz Felisiak <felisiak.mariusz@…>, 3 years ago

In 7a6ca01:

[3.2.x] Fixed #32478 -- Included nested columns referenced by subqueries in GROUP BY on aggregations.

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Refs #31094, #31150.

Thanks Igor Pejic for the report.

Backport of 277eea8fcced7f04f3800617f189beb349a3212e from master

Note: See TracTickets for help on using tickets.
Back to Top