Opened 17 months ago

Closed 16 months ago

Last modified 14 months ago

#34176 closed Bug (fixed)

Annotation's original field-name can clash with result field name over aggregation

Reported by: Shai Berger Owned by: Simon Charette
Component: Database layer (models, ORM) Version: dev
Severity: Release blocker Keywords:
Cc: Simon Charette, David Sanders, Paolo Melchiorre Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is a bit of a continuation of #34123

With current Django main, if one aggregates over a model, while adding annotations of related fields; and an annotation comes from a field with the same name as a result field, it breaks.

This is easier to explain by example; consider this test, added to the aggregation_regress tests, so it uses their models; Book carries a FK to Publisher, a M2M to Author named authors, and a FK to Author (supposedly picking one of the related authors) named contact.

    def test_aggregate_and_annotate_duplicate_columns(self):
        results = Book.objects.values('isbn').annotate(
            name=F('publisher__name'),
            contact_name=F('contact__name'),
            num_authors=Count('authors'),
        )
        self.assertTrue(results)

The field isbn is not defined as unique, it was chosen in order to make sure there's an aggregation on the main table and not some sort of subquery -- and also, to get the Book model's own name field out of the way.

This blows up on Sqlite:

======================================================================
ERROR: test_aggregate_and_annotate_duplicate_columns (aggregation_regress.tests.AggregationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/django/django/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
  File "/home/django/django/django/db/backends/sqlite3/base.py", line 378, in execute
    return super().execute(query, params)
sqlite3.OperationalError: ambiguous column name: name

and on Postgres

======================================================================
ERROR: test_aggregate_and_annotate_duplicate_columns (aggregation_regress.tests.AggregationTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/django/django/django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.AmbiguousColumn: column reference "name" is ambiguous
LINE 1: ..._id") GROUP BY "aggregation_regress_book"."isbn", "name", "c...
                                                             ^

Note that if we change the name above to something else -- e.g.

            pub_name=F('publisher__name'),

then things seem to be working fine. Conversely, if we pick another related field for the name annotation:

            name=F('publisher__num_awards'),

then things stay broken.

This used to work, and now it doesn't. I've bisected this regression to b7b28c7c189615543218e81319473888bc46d831.

Thanks Matific for letting me work on this.

Change History (9)

comment:1 by Mariusz Felisiak, 17 months ago

Triage Stage: UnreviewedAccepted
Version: 4.1dev

Thanks for the report!

Reproduced at 744a1af7f943106e30d538e6ace55c2c66ccd791.

comment:2 by Simon Charette, 17 months ago

Owner: changed from nobody to Simon Charette
Status: newassigned

Thanks for the report Shai.

I stumbled upon something similar in #34123 when trying to have get_select use the values aliases instead of colN ones for aesthetic reasons.

Changes are required in SQLCompiler.get_select that are similar to the ones in sql.Query.set_group_by(allow_aliases=True).

comment:3 by Simon Charette, 17 months ago

Has patch: set

I didn't know that

Book.objects.values('isbn').annotate(
    name=F('publisher__name'),
)

was allowed but

> Book.objects.values('isbn', name=F('publisher__name'))
ValueError: The annotation 'name' conflicts with a field on the model.

wasn't which seems like a bug in #25871?

Anyway, I submitted this MR for review which avoids a complex refactor of get_select that would have necessitated reference repointing.

This means that group by reference will be disabled for annotations that conflict with a selected column name but I think that's an acceptable compromise given it should only have an incidence on complex annotations that can't be collapsed that are assigned with ambiguous names.

This a compromise we already made in #31377 and has a straightforward workaround for users that must get the group be reference for performance reasons; use an annotation name that doesn't conflict with an already selected column name.

Last edited 17 months ago by Simon Charette (previous) (diff)

comment:4 by Mariusz Felisiak, 17 months ago

Triage Stage: AcceptedReady for checkin

comment:5 by Paolo Melchiorre, 17 months ago

Cc: Paolo Melchiorre added

comment:6 by Mariusz Felisiak, 17 months ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:7 by Mariusz Felisiak, 16 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:8 by Mariusz Felisiak <felisiak.mariusz@…>, 16 months ago

Resolution: fixed
Status: assignedclosed

In dd68af62:

Fixed #34176 -- Fixed grouping by ambiguous aliases.

Regression in b7b28c7c189615543218e81319473888bc46d831.

Refs #31377.

Thanks Shai Berger for the report and reviews.

test_aggregation_subquery_annotation_values_collision() has been
updated as queries that are explicitly grouped by a subquery should
always be grouped by it and not its outer columns even if its alias
collides with referenced table columns. This was not possible to
accomplish at the time 10866a10 landed because we didn't have compiler
level handling of colliding aliases.

comment:9 by Simon Charette <charette.s@…>, 14 months ago

In f91e085:

Refs #34176 -- Adjusted group by position variables naming to follow SQL spec.

This avoids conceptual collisions with the notion of indices.

Note: See TracTickets for help on using tickets.
Back to Top