Opened 5 years ago

Closed 5 years ago

Last modified 23 months ago

#31377 closed Bug (fixed)

Django 3.0: "GROUP BY" clauses error with tricky field annotation

Reported by: Holovashchenko Vadym Owned by: Hasan Ramezani
Component: Database layer (models, ORM) Version: 3.0
Severity: Release blocker Keywords: group by, postgres, annotate
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Let's pretend that we have next model structure with next model's relations:

class A(models.Model):
    bs = models.ManyToManyField('B',
                                related_name="a",
                                through="AB")


class B(models.Model):
    pass


class AB(models.Model):
    a = models.ForeignKey(A, on_delete=models.CASCADE, related_name="ab_a")
    b = models.ForeignKey(B, on_delete=models.CASCADE, related_name="ab_b")

    status = models.IntegerField()


class C(models.Model):
    a = models.ForeignKey(
        A,
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        related_name="c",
        verbose_name=_("a")
    )
    status = models.IntegerField()

Let's try to evaluate next query

ab_query = AB.objects.filter(a=OuterRef("pk"), b=1)
filter_conditions = Q(pk=1) | Q(ab_a__b=1)

query = A.objects.\
    filter(filter_conditions).\
    annotate(
        status=Subquery(ab_query.values("status")),
        c_count=Count("c"),
)

answer = query.values("status").annotate(total_count=Count("status"))
print(answer.query)
print(answer)

On Django 3.0.4 we have an error

django.db.utils.ProgrammingError: column reference "status" is ambiguous

and query is next:

SELECT (SELECT U0."status" FROM "test_app_ab" U0 WHERE (U0."a_id" = "test_app_a"."id" AND U0."b_id" = 1)) AS "status", COUNT((SELECT U0."status" FROM "test_app_ab" U0 WHERE (U0."a_id" = "test_app_a"."id" AND U0."b_id" = 1))) AS "total_count" FROM "test_app_a" LEFT OUTER JOIN "test_app_ab" ON ("test_app_a"."id" = "test_app_ab"."a_id") LEFT OUTER JOIN "test_app_c" ON ("test_app_a"."id" = "test_app_c"."a_id") WHERE ("test_app_a"."id" = 1 OR "test_app_ab"."b_id" = 1) GROUP BY "status"

However, Django 2.2.11 processed this query properly with the next query:

SELECT (SELECT U0."status" FROM "test_app_ab" U0 WHERE (U0."a_id" = ("test_app_a"."id") AND U0."b_id" = 1)) AS "status", COUNT((SELECT U0."status" FROM "test_app_ab" U0 WHERE (U0."a_id" = ("test_app_a"."id") AND U0."b_id" = 1))) AS "total_count" FROM "test_app_a" LEFT OUTER JOIN "test_app_ab" ON ("test_app_a"."id" = "test_app_ab"."a_id") LEFT OUTER JOIN "test_app_c" ON ("test_app_a"."id" = "test_app_c"."a_id") WHERE ("test_app_a"."id" = 1 OR "test_app_ab"."b_id" = 1) GROUP BY (SELECT U0."status" FROM "test_app_ab" U0 WHERE (U0."a_id" = ("test_app_a"."id") AND U0."b_id" = 1))

so, the difference in "GROUP BY" clauses
(as DB provider uses "django.db.backends.postgresql", postgresql 11)

Change History (9)

comment:1 by Simon Charette, 5 years ago

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

This is due to a collision of AB.status and the status annotation.

The easiest way to solve this issue is to disable group by alias when a collision is detected with involved table columns. This can be easily worked around by avoiding to use an annotation name that conflicts with involved table column names.

comment:2 by Hasan Ramezani, 5 years ago

Owner: changed from nobody to Hasan Ramezani
Status: newassigned

@Simon I think we have the check for collision in annotation alias and model fields .
How can we find the involved tables columns?
Thanks

comment:3 by Simon Charette, 5 years ago

Hasan this is another kind of collision, these fields are not selected and part of join tables so they won't be part of names.

We can't change the behavior at the annotate() level as it would be backward incompatible and require extra checks every time an additional table is joined.

What needs to be adjust is sql.Query.set_group_by to set alias=None if alias is not None and alias in {... set of all column names of tables in alias_map ...}.

https://github.com/django/django/blob/fc0fa72ff4cdbf5861a366e31cb8bbacd44da22d/django/db/models/sql/query.py#L1943-L1945

Version 0, edited 5 years ago by Simon Charette (next)

comment:4 by Hasan Ramezani, 5 years ago

Has patch: set

Simon, I've created a patch to fix this problem.
I think the collision, in this case, is between C.status and the status annotation. if we remove c_count=Count("c") from the query we won't have the error. So, I created a Book1 test model in my patch to simulate the query.

PR

Last edited 5 years ago by Mariusz Felisiak (previous) (diff)

comment:5 by Simon Charette, 5 years ago

Related to #28078 and #28072.

comment:6 by Mariusz Felisiak, 5 years ago

Triage Stage: AcceptedReady for checkin

comment:7 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

Resolution: fixed
Status: assignedclosed

In 10866a10:

Fixed #31377 -- Disabled grouping by aliases on QuerySet.values()/values_list() when they collide with field names.

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Thanks Holovashchenko Vadym for the report.

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 5 years ago

In 72652bcb:

[3.0.x] Fixed #31377 -- Disabled grouping by aliases on QuerySet.values()/values_list() when they collide with field names.

Regression in fb3f034f1c63160c0ff13c609acd01c18be12f80.

Thanks Holovashchenko Vadym for the report.

Backport of 10866a10fe9f0ad3ffdf6f93823aaf4716e6f27c from master

comment:9 by Mariusz Felisiak <felisiak.mariusz@…>, 23 months ago

In dd68af62:

Fixed #34176 -- Fixed grouping by ambiguous aliases.

Regression in b7b28c7c189615543218e81319473888bc46d831.

Refs #31377.

Thanks Shai Berger for the report and reviews.

test_aggregation_subquery_annotation_values_collision() has been
updated as queries that are explicitly grouped by a subquery should
always be grouped by it and not its outer columns even if its alias
collides with referenced table columns. This was not possible to
accomplish at the time 10866a10 landed because we didn't have compiler
level handling of colliding aliases.

Note: See TracTickets for help on using tickets.
Back to Top