Opened 20 months ago

Closed 20 months ago

Last modified 20 months ago

#34748 closed Bug (fixed)

__in lookup crashes with a subquery containing an unused annotation that uses explicit grouping.

Reported by: Toan Vuong Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.2
Severity: Release blocker Keywords:
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Attached is a small Django project to demonstrate the issue.

Tested on:
Django 4.2.2
OS X 13.4.1
Python 3.9.16

For the Oracle backend:
cx-Oracle 8.3.0 with instantclient 19.8

For the Postgres backend:
psycopg/psycopg-binary 3.1.9

The sample query is in test.py and corresponding models are included in the attached tarball. A snippet of the query is below:

inner_qs = (
    Comment.objects.annotate(question_id=F('choice__question__id'))
        .values('question_id').annotate(cnt=Count('*'))
        .values_list('question_id')
)
outer_qs = Question.objects.filter(id__in=inner_qs).all()
print(outer_qs)


In postgres, this generates a query like so (Also tested and same issue on Oracle):

SELECT "polls_question"."id", "polls_question"."question_text", "polls_question"."pub_date" FROM "polls_question" WHERE "polls_question"."id" IN (SELECT U1."question_id" AS "question_id" FROM "polls_comment" U0 INNER JOIN "polls_choice" U1 ON (U0."choice_id" = U1."id") GROUP BY "polls_choice"."question_id") LIMIT 21

The problem is the GROUP BY clause is not referencing the alias U1. Instead, it's referencing the table name. This doesn't seem to happen on 3.x, and I believe it's because in 4.x, for this scenario, the group by clause is represented as a reference (Ref) to the column in the select subquery. Ref implement a no-op on relabeled_clone, relying on something else to modify the Col. But this doesn't seem correct because relabeled_clone is not an in-place change -- it returns a new object. Perhaps it should just do super().relabeled_clone() instead of a no-op?

I understand this example is a little bit convoluted, since the final values_list overrides the previous values() and count aggregation calls, making those useless, but I think this still seems like a bug. I also don't have an example, but suspect Ref::resolve_expression probably has the same issue where the column which Ref references may be resolved correctly, but the referenced stored by Ref won't get resolved.

Change History (9)

by Toan Vuong, 20 months ago

Attachment: mysite.tar.gz added

Sample Django project

comment:1 by Simon Charette, 20 months ago

Given Ref.relabeled_clone and .get_group_by_cols are not covered by tests I suspect this might actually be an edge case we never tested for.

Could you possibly try to bisect which commit introduced the regression if it's a regression in 4.2 it would qualify as a release blocker and we could possibly squeeze a corrective for it before 4.2.4 is released on August 1st.

comment:2 by Mariusz Felisiak, 20 months ago

Cc: Simon Charette added
Component: UncategorizedDatabase layer (models, ORM)
Severity: NormalRelease blocker
Summary: Ref expressions should implement relabeled_clone (and probably resolve_expression, too?)__in lookup crashes with a subquery containing an unused annotation that uses explicit grouping.
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

Thanks for the report! Regression in dd68af62b2b27ece50d434f6a351877212e15c3f.

I was also able to reproduce this issue without the second values_list():

>>> inner_qs = (
    Comment.objects.annotate(question_id=F('choice__question__id'))
    .values('question_id')
    .annotate(cnt=Count('*'))
)
>>> outer_qs = Question.objects.filter(id__in=inner_qs).all()
>>> print(outer_qs.query)
SELECT "ticket_34738_question"."id", "ticket_34738_question"."question_text", "ticket_34738_question"."pub_date" FROM "ticket_34738_question" WHERE "ticket_34738_question"."id" IN (SELECT U1."question_id" AS "question_id", COUNT(*) AS "cnt" FROM "ticket_34738_comment" U0 INNER JOIN "ticket_34738_choice" U1 ON (U0."choice_id" = U1."id") GROUP BY "ticket_34738_choice"."question_id")

comment:3 by Simon Charette, 20 months ago

Owner: changed from nobody to Simon Charette
Status: newassigned

comment:4 by Simon Charette, 20 months ago

Has patch: set

comment:5 by Toan Vuong, 20 months ago

Thank you!

comment:6 by Mariusz Felisiak, 20 months ago

Triage Stage: AcceptedReady for checkin

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 20 months ago

Resolution: fixed
Status: assignedclosed

In 4087367:

Fixed #34748 -- Fixed queryset crash when grouping by a reference in a subquery.

Regression in dd68af62b2b27ece50d434f6a351877212e15c3f.

Thanks Toan Vuong for the report.

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 20 months ago

In 739da731:

[4.2.x] Fixed #34748 -- Fixed queryset crash when grouping by a reference in a subquery.

Regression in dd68af62b2b27ece50d434f6a351877212e15c3f.

Thanks Toan Vuong for the report.

Backport of 4087367ba869be9cf305dac39a8887d4aa4041d2 from main

Note: See TracTickets for help on using tickets.
Back to Top