Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#32478 closed Bug (fixed)

Queryset annotation mixing aggregate and subquery doesn't GROUP BY outer column references.

Reported by: Igor Pejic Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 3.0
Severity: Normal Keywords: outerref, subquery
Cc: Igor Pejic Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Igor Pejic)

After trying to migrate from 2.2. to 3.0, we experience a problem with the translation of queries in PostgreSQL which used to work in 2.2.

A breaking test is shown in: https://github.com/igorpejic/django/pull/1/files

From investigation this is related to: https://code.djangoproject.com/ticket/31094
and it seems the regression was introduced in: https://github.com/django/django/commit/fb3f034f1c63160c0ff13c609acd01c18be12f80

It seems that the combination of the "double OuterRef" with the "Case" logic is causing the problem.

Query looks like:

        books_with_same_name_as_country = Book.objects.filter(
            id__in=Subquery(
                Book.objects.filter(
                    name=OuterRef(OuterRef('country__name')),
                ).values('id')
            )
        ).values('id')[:1]
        books_breakdown = Publisher.objects.annotate(total_books=Case(
            When(
                num_awards__gte=2,
                then=Subquery(books_with_same_name_as_country, IntegerField())
            ),
            When(
                num_awards__lt=0,
                then=Count('country__publishers')
            ),
        ))

Stack trace:

======================================================================
ERROR: test_group_by_subquery_annotation_with_conditional (aggregation.tests.AggregateTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.GroupingError: subquery uses ungrouped column "aggregation_country.name" from outer query
LINE 1: ..."id" FROM "aggregation_book" U0 WHERE U0."name" = "aggregati...
                                                             ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/igor/django-repo/django/test/testcases.py", line 1229, in skip_wrapper
    return test_func(*args, **kwargs)
  File "/home/igor/django-repo/tests/aggregation/tests.py", line 1299, in test_group_by_subquery_annotation_with_conditional
    self.assertEqual(books_breakdown.get(id=self.p1.id).total_books, 1)
  File "/home/igor/django-repo/django/db/models/query.py", line 411, in get
    num = len(clone)
  File "/home/igor/django-repo/django/db/models/query.py", line 258, in __len__
    self._fetch_all()
  File "/home/igor/django-repo/django/db/models/query.py", line 1261, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/igor/django-repo/django/db/models/query.py", line 57, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/home/igor/django-repo/django/db/models/sql/compiler.py", line 1154, in execute_sql
    cursor.execute(sql, params)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 68, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 77, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
  File "/home/igor/django-repo/django/db/utils.py", line 90, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/home/igor/django-repo/django/db/backends/utils.py", line 86, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: subquery uses ungrouped column "aggregation_country.name" from outer query
LINE 1: ..."id" FROM "aggregation_book" U0 WHERE U0."name" = "aggregati...
                                                             ^

Full query 2.2:

SELECT "aggregation_regress_publisher"."id",
       "aggregation_regress_publisher"."name",
       "aggregation_regress_publisher"."num_awards",
       "aggregation_regress_publisher"."country_id",
       CASE
           WHEN "aggregation_regress_publisher"."num_awards" >= 2 THEN
                  (SELECT V0."id"
                   FROM "aggregation_regress_book" V0
                   WHERE V0."id" IN
                       (SELECT U0."id"
                        FROM "aggregation_regress_book" U0
                        WHERE U0."name" = ("aggregation_regress_country"."name")
                        ORDER BY U0."name" ASC)
                   ORDER BY V0."name" ASC
                   LIMIT 1)
           WHEN "aggregation_regress_publisher"."num_awards" < 0 THEN COUNT(T3."id")
           ELSE NULL
       END AS "total_books"
FROM "aggregation_regress_publisher"
INNER JOIN "aggregation_regress_country" ON ("aggregation_regress_publisher"."country_id" = "aggregation_regress_country"."id")
LEFT OUTER JOIN "aggregation_regress_publisher" T3 ON ("aggregation_regress_country"."id" = T3."country_id")
GROUP BY "aggregation_regress_publisher"."id",
  (SELECT V0."id"
   FROM "aggregation_regress_book" V0
   WHERE V0."id" IN
       (SELECT U0."id"
        FROM "aggregation_regress_book" U0
        WHERE U0."name" = ("aggregation_regress_country"."name")
        ORDER BY U0."name" ASC)
   ORDER BY V0."name" ASC
   LIMIT 1)

Full query 3.0:

SELECT "aggregation_publisher"."id",
       "aggregation_publisher"."name",
       "aggregation_publisher"."num_awards",
       "aggregation_publisher"."duration",
       "aggregation_publisher"."country_id",
       CASE
           WHEN "aggregation_publisher"."num_awards" >= 2 THEN
                  (SELECT V0."id"
                   FROM "aggregation_book" V0
                   WHERE V0."id" IN
                       (SELECT U0."id"
                        FROM "aggregation_book" U0
                        WHERE U0."name" = "aggregation_country"."name")
                   LIMIT 1)
           WHEN "aggregation_publisher"."num_awards" < 0 THEN COUNT(T3."id")
           ELSE NULL
       END AS "total_books"
FROM "aggregation_publisher"
INNER JOIN "aggregation_country" ON ("aggregation_publisher"."country_id" = "aggregation_country"."id")
LEFT OUTER JOIN "aggregation_publisher" T3 ON ("aggregation_country"."id" = T3."country_id")
GROUP BY "aggregation_publisher"."id"

Change History (13)

comment:1 Changed 3 years ago by Igor Pejic

Component: UncategorizedDatabase layer (models, ORM)

comment:2 Changed 3 years ago by Igor Pejic

Cc: Igor Pejic added

comment:3 Changed 3 years ago by Igor Pejic

Type: UncategorizedBug

comment:4 Changed 3 years ago by Simon Charette

Owner: changed from nobody to Simon Charette
Status: newassigned
Summary: Double OuterRef in Subquery with Case broken in 3.XQueryset annotation mixing aggregate and subquery doesn't GROUP BY outer column references.
Triage Stage: UnreviewedAccepted

That looks like another case of #31094 but generalized for expressions that mix aggregates and subqueries.

Case.get_group_by_cols() defers to BaseExpression.get_group_by_cols() when it has cases and the latter immediately returns self when it contains aggregates.

Igor, could you include the SQL that uses to be generated and what the new one looks like?

comment:5 Changed 3 years ago by Igor Pejic

Description: modified (diff)

comment:6 Changed 3 years ago by Igor Pejic

Thanks. I edited the main description with the query in 3.0 and 2.2.
The main difference is in the last argument of the group by.

comment:7 Changed 3 years ago by Mariusz Felisiak

I confirmed that it's a regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

comment:8 Changed 3 years ago by Simon Charette

Has patch: set

Igor, could you confirm your issue is resolved by https://github.com/django/django/pull/14039.

The underlying problem was that nested external column references were not recursively collected so the problem detailed in #31094 could manifest itself when mixing aggregation with a nested subqueries because Query.get_external_cols() would not account for possibly nested subqueries with external column references.

comment:9 Changed 3 years ago by Igor Pejic

I can confirm the fix works for our repository when using it with 3.0.12

Thank you for this fix and all other work Simon.

comment:10 Changed 3 years ago by Mariusz Felisiak

Triage Stage: AcceptedReady for checkin

comment:11 Changed 3 years ago by Mariusz Felisiak

Thanks Igor.

comment:12 Changed 3 years ago by Mariusz Felisiak <felisiak.mariusz@…>

Resolution: fixed
Status: assignedclosed

In 277eea8f:

Fixed #32478 -- Included nested columns referenced by subqueries in GROUP BY on aggregations.

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Refs #31094, #31150.

Thanks Igor Pejic for the report.

comment:13 Changed 3 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In 7a6ca01:

[3.2.x] Fixed #32478 -- Included nested columns referenced by subqueries in GROUP BY on aggregations.

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Refs #31094, #31150.

Thanks Igor Pejic for the report.

Backport of 277eea8fcced7f04f3800617f189beb349a3212e from master

Note: See TracTickets for help on using tickets.
Back to Top