Opened 11 months ago

Closed 11 months ago

Last modified 11 months ago

#34748 closed Bug (fixed)

__in lookup crashes with a subquery containing an unused annotation that uses explicit grouping.

Reported by: Toan Vuong Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.2
Severity: Release blocker Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Attached is a small Django project to demonstrate the issue.

Tested on:
Django 4.2.2
OS X 13.4.1
Python 3.9.16

For the Oracle backend:
cx-Oracle 8.3.0 with instantclient 19.8

For the Postgres backend:
psycopg/psycopg-binary 3.1.9

The sample query is in test.py and corresponding models are included in the attached tarball. A snippet of the query is below:

inner_qs = (
    Comment.objects.annotate(question_id=F('choice__question__id'))
        .values('question_id').annotate(cnt=Count('*'))
        .values_list('question_id')
)
outer_qs = Question.objects.filter(id__in=inner_qs).all()
print(outer_qs)


In postgres, this generates a query like so (Also tested and same issue on Oracle):

SELECT "polls_question"."id", "polls_question"."question_text", "polls_question"."pub_date" FROM "polls_question" WHERE "polls_question"."id" IN (SELECT U1."question_id" AS "question_id" FROM "polls_comment" U0 INNER JOIN "polls_choice" U1 ON (U0."choice_id" = U1."id") GROUP BY "polls_choice"."question_id") LIMIT 21

The problem is the GROUP BY clause is not referencing the alias U1. Instead, it's referencing the table name. This doesn't seem to happen on 3.x, and I believe it's because in 4.x, for this scenario, the group by clause is represented as a reference (Ref) to the column in the select subquery. Ref implement a no-op on relabeled_clone, relying on something else to modify the Col. But this doesn't seem correct because relabeled_clone is not an in-place change -- it returns a new object. Perhaps it should just do super().relabeled_clone() instead of a no-op?

I understand this example is a little bit convoluted, since the final values_list overrides the previous values() and count aggregation calls, making those useless, but I think this still seems like a bug. I also don't have an example, but suspect Ref::resolve_expression probably has the same issue where the column which Ref references may be resolved correctly, but the referenced stored by Ref won't get resolved.

Attachments (1)

mysite.tar.gz (9.5 KB ) - added by Toan Vuong 11 months ago.
Sample Django project

Download all attachments as: .zip

Change History (9)

by Toan Vuong, 11 months ago

Attachment: mysite.tar.gz added

Sample Django project

comment:1 by Simon Charette, 11 months ago

Given Ref.relabeled_clone and .get_group_by_cols are not covered by tests I suspect this might actually be an edge case we never tested for.

Could you possibly try to bisect which commit introduced the regression if it's a regression in 4.2 it would qualify as a release blocker and we could possibly squeeze a corrective for it before 4.2.4 is released on August 1st.

comment:2 by Mariusz Felisiak, 11 months ago

Cc: Simon Charette added
Component: UncategorizedDatabase layer (models, ORM)
Severity: NormalRelease blocker
Summary: Ref expressions should implement relabeled_clone (and probably resolve_expression, too?)__in lookup crashes with a subquery containing an unused annotation that uses explicit grouping.
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Thanks for the report! Regression in dd68af62b2b27ece50d434f6a351877212e15c3f.

I was also able to reproduce this issue without the second values_list():

>>> inner_qs = (
    Comment.objects.annotate(question_id=F('choice__question__id'))
    .values('question_id')
    .annotate(cnt=Count('*'))
)
>>> outer_qs = Question.objects.filter(id__in=inner_qs).all()
>>> print(outer_qs.query)
SELECT "ticket_34738_question"."id", "ticket_34738_question"."question_text", "ticket_34738_question"."pub_date" FROM "ticket_34738_question" WHERE "ticket_34738_question"."id" IN (SELECT U1."question_id" AS "question_id", COUNT(*) AS "cnt" FROM "ticket_34738_comment" U0 INNER JOIN "ticket_34738_choice" U1 ON (U0."choice_id" = U1."id") GROUP BY "ticket_34738_choice"."question_id")

comment:3 by Simon Charette, 11 months ago

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:4 by Simon Charette, 11 months ago

Has patch: set

comment:5 by Toan Vuong, 11 months ago

Thank you!

comment:6 by Mariusz Felisiak, 11 months ago

Triage Stage: AcceptedReady for checkin

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 11 months ago

Resolution: fixed
Status: assignedclosed

In 4087367:

Fixed #34748 -- Fixed queryset crash when grouping by a reference in a subquery.

Regression in dd68af62b2b27ece50d434f6a351877212e15c3f.

Thanks Toan Vuong for the report.

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 11 months ago

In 739da731:

[4.2.x] Fixed #34748 -- Fixed queryset crash when grouping by a reference in a subquery.

Regression in dd68af62b2b27ece50d434f6a351877212e15c3f.

Thanks Toan Vuong for the report.

Backport of 4087367ba869be9cf305dac39a8887d4aa4041d2 from main

Note: See TracTickets for help on using tickets.
Back to Top