Opened 4 years ago

Last modified 4 years ago

#32152 closed Bug

Groupby after aggregation and subquery produces subtly wrong results — at Initial Version

Reported by: Christian Klus Owned by: nobody
Component: Database layer (models, ORM) Version: 3.0
Severity: Release blocker Keywords:
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Starting in django 3.0.7, specifically after patch #31566 I noticed some of my more complex queries returning incorrect results. I think I've narrowed it down to a simpler test case:

Example query:

Book.objects.all().annotate(
    pub_year=TruncYear('pubdate')
).order_by().values('pub_year').annotate(
    total_pages=Sum('pages'),
    top_rating=Subquery(
        Book.objects.filter(
            pubdate__year=OuterRef('pub_year')
        ).order_by('rating').values('rating')[:1]
    )
).values('pub_year', 'total_pages', 'top_rating')

Generated SQL on 3.0.6:

SELECT
  django_date_trunc('year', "aggregation_regress_book"."pubdate") AS "pub_year",
  SUM("aggregation_regress_book"."pages") AS "total_pages",
  (
    SELECT U0."rating"
    FROM "aggregation_regress_book" U0
    WHERE
      django_date_extract('year', U0."pubdate") = django_date_trunc('year', "aggregation_regress_book"."pubdate")
    ORDER BY U0."rating" ASC LIMIT 1
  ) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
  django_date_trunc('year', "aggregation_regress_book"."pubdate"),
  "top_rating"

Generated SQL on current master:

SELECT
  django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL) AS "pub_year",
  SUM("aggregation_regress_book"."pages") AS "total_pages",
  (
    SELECT U0."rating"
    FROM "aggregation_regress_book" U0
    WHERE
      django_date_extract('year', U0."pubdate") = django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL)
    ORDER BY U0."rating" ASC LIMIT 1
  ) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
  django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL),
  (
    SELECT U0."rating"
    FROM "aggregation_regress_book" U0
    WHERE
      django_date_extract('year', U0."pubdate") = django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL)
    ORDER BY U0."rating" ASC LIMIT 1
  ),
  "aggregation_regress_book"."pubdate"

I see two separate issues here:

  • "aggregation_regress_book"."pubdate" is being added to the group by clause incorrectly (this issue probably predates the patch mentioned above)
  • Even though the subquery is in the select statement, the alias is not being used and instead the subquery is reevaluated. This nearly doubles the cost of one of my queries that is experiencing this problem.

I don't know much about the ORM internals, but here is my naive patch for the second issue:

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index ee98984826..6ea287d6cb 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -2220,7 +2220,7 @@ class Query(BaseExpression):
             # the selected fields anymore.
             group_by = []
             for expr in self.group_by:
-                if isinstance(expr, Ref) and expr.refs not in field_names:
+                if isinstance(expr, Ref) and expr.refs not in field_names + annotation_names:
                     expr = self.annotations[expr.refs]
                 group_by.append(expr)
             self.group_by = tuple(group_by)

I'd appreciate anyone with deeper knowlege of the ORM to chime in and let me know if I'm on the right track. Tests are passing locally.

The resulting query on master:

SELECT
  django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL) AS "pub_year",
  SUM("aggregation_regress_book"."pages") AS "total_pages",
  (
    SELECT U0."rating"
    FROM "aggregation_regress_book" U0
    WHERE
      django_date_extract('year', U0."pubdate") = django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL)
    ORDER BY U0."rating" ASC LIMIT 1
  ) AS "top_rating"
FROM "aggregation_regress_book"
GROUP BY
  django_date_trunc('year', "aggregation_regress_book"."pubdate", NULL, NULL),
  "top_rating"

Change History (0)

Note: See TracTickets for help on using tickets.
Back to Top