Opened 10 months ago

Closed 9 months ago

Last modified 9 months ago

#34750 closed Bug (fixed)

Fixed QuerySet.count() on querysets grouped by unused multi-valued annotations.

Reported by: Toan Vuong Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.2
Severity: Release blocker Keywords:
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I believe this is related to the work in https://code.djangoproject.com/ticket/28477 (and follow-up issues/changes due to that ticket).

Tested on:
Django 4.2.2
OS X 13.4.1
Python 3.9.16

For the Oracle backend:
cx-Oracle 8.3.0 with instantclient 19.8

For the Postgres backend:
psycopg/psycopg-binary 3.1.9

Attached is a sample project, and the relevant query is below:

def populate_data():
    q1 = Question(question_text='t1')
    q1.save()

    for i in range(10):
        c = Choice(question=q1, choice_text='c1', votes=i)
        c.save()

# Need to populate the data only once, so comment it out on subsequent runs
populate_data()

qs = (
        Question.objects.values('id', 'question_text')
            .annotate(mysum=Sum('choice__votes'))
            .annotate(choice__votes_threshold=Case(
                When(choice__votes__gt=1, then=Value(1000)),
                default=Value(-1)))
)
print(qs.count() == len(qs))

When issuing the count query (qs.count()), Django generates this:

SELECT COUNT(*) FROM (SELECT "polls_question"."id" AS "col1", "polls_question"."question_text" AS "col2" FROM "polls_question" LEFT OUTER JOIN "polls_choice" ON ("polls_question"."id" = "polls_choice"."question_id") GROUP BY 1

I've chased it down to this optimization. This count would return 1 because it's just a simple join.

However, the actual query is this:

SELECT "polls_question"."id", "polls_question"."question_text", SUM("polls_choice"."votes") AS "mysum", CASE WHEN "polls_choice"."votes" > 1 THEN 1000 ELSE  -1 END AS "choice__votes_threshold" FROM "polls_question" LEFT OUTER JOIN "polls_choice" ON ("polls_question"."id" = "polls_choice"."question_id") GROUP BY "polls_question"."id", 4

Due to the group by and the varying choice__votes_threshold, there should be 2 rows in the result set.

This _did_ used to work in 3.2.18, and I didn't dig too much into it but I suspect the optimization was introduced after that version, hence why it worked.

Attachments (1)

mysite.tar.gz (10.4 KB ) - added by Toan Vuong 10 months ago.
Sample project

Download all attachments as: .zip

Change History (9)

by Toan Vuong, 10 months ago

Attachment: mysite.tar.gz added

Sample project

comment:1 by Mariusz Felisiak, 9 months ago

Cc: Simon Charette added
Component: UncategorizedDatabase layer (models, ORM)
Severity: NormalRelease blocker
Summary: Group by removal optimization generates in correct count queryFixed QuerySet.count() on querysets grouped by unused multi-valued annotations.
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Thanks for the report!

Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.

comment:2 by Simon Charette, 9 months ago

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:3 by Simon Charette, 9 months ago

I don't think I'll be able to make the cut for 4.2.4 but I have an idea of our to address this issue so I self assigned.

in reply to:  3 comment:4 by Mariusz Felisiak, 9 months ago

Replying to Simon Charette:

I don't think I'll be able to make the cut for 4.2.4 but I have an idea of our to address this issue so I self assigned.

We can postpone 4.2.4 for a few days.

comment:5 by Mariusz Felisiak, 9 months ago

I've prepared a regression test:

  • tests/aggregation/tests.py

    diff --git a/tests/aggregation/tests.py b/tests/aggregation/tests.py
    index 9f2a7c8841..8bdeaca97d 100644
    a b class AggregateAnnotationPruningTests(TestCase):  
    21652165        self.assertEqual(sql.count("select"), 2, "Subquery wrapping required")
    21662166        self.assertNotIn("authors_count", sql)
    21672167
     2168    def test_unused_aliased_aggregate_and_annotation_reverse_fk(self):
     2169        Book.objects.create(
     2170            name="b3",
     2171            publisher=self.p2,
     2172            pages=1000,
     2173            rating=4.2,
     2174            price=50,
     2175            contact=self.a2,
     2176            pubdate=datetime.date.today(),
     2177        )
     2178        qs = Publisher.objects.annotate(
     2179            total_pages=Sum("book__pages"),
     2180            good_book=Case(
     2181                When(book__rating__gt=4.0, then=Value(True)),
     2182                default=Value(False),
     2183            ),
     2184        )
     2185        self.assertEqual(qs.count(), 3)
     2186
    21682187    def test_non_aggregate_annotation_pruned(self):
    21692188        with CaptureQueriesContext(connection) as ctx:
    21702189            Book.objects.annotate(

comment:6 by Mariusz Felisiak, 9 months ago

Has patch: set

comment:7 by GitHub <noreply@…>, 9 months ago

Resolution: fixed
Status: assignedclosed

In c9b9a52:

Fixed #34750 -- Fixed QuerySet.count() when grouping by unused multi-valued annotations.

Thanks Toan Vuong for the report.
Thanks Simon Charette for the review.

Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 9 months ago

In 8808d9d:

[4.2.x] Fixed #34750 -- Fixed QuerySet.count() when grouping by unused multi-valued annotations.

Thanks Toan Vuong for the report.
Thanks Simon Charette for the review.

Regression in 59bea9efd2768102fc9d3aedda469502c218e9b7.
Backport of c9b9a52edc66be117c6e5b5214fa788a4d5db7a8 from main

Note: See TracTickets for help on using tickets.
Back to Top