Opened 4 years ago

Closed 4 years ago

#31880 closed Bug (fixed)

QuerySet.aggregate() mixes annotated fields names.

Reported by: Thodoris Sotiropoulos Owned by: David Wobrock
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords:
Cc: David Wobrock, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have the following model (simplified for opening the issue).

class T(models.Model):
  id = models.AutoField(primary=True)
  foo = models.IntegerField()

and I perform the following query on MySQL (simplified for opening the issue)

T.objects.annotate(anon=F('foo')).aggregate(foo=Max(F('anon')), sum=Sum(F('foo')))

this produces the following SQL query

SELECT MAX(`anon`), SUM(`foo`) FROM (SELECT `table`.`foo` AS `anon` FROM `foo`) subquery

that causes the following exception

django.db.utils.OperationalError: (1054, "Unknown column 'foo' in 'field list'")

I would expect django to generate

SELECT MAX(`anon`), SUM(`anon`) FROM (SELECT `table`.`foo` AS `anon` FROM `table`) subquery

Change History (11)

comment:1 by Mariusz Felisiak, 4 years ago

Summary: Incorrect handling of aggregate's keyword nameQuerySet.aggregate() mixes annotated fields names.
Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

I would expect django to generate

SELECT MAX(`anon`), SUM(`anon`) FROM (SELECT `table`.`foo` AS `anon` FROM `table`) subquery

I would rather expect:

SELECT MAX(`anon`), SUM(`foo`) FROM (SELECT `table`.`foo`, `table`.`foo` AS `anon` FROM `table`) subquery


Nevertheless we should raise a descriptive error or add a column to the subquery. Is there any reason to not use Sum(F('anon'))?

comment:2 by David Wobrock, 4 years ago

Cc: David Wobrock added
Has patch: set
Owner: changed from nobody to David Wobrock
Status: newassigned

Hi,

I tried to tackle this ticket by fixing the actual bug.

The PR

It's surely not that an important bug and one could just use a different alias for the aggregation, but I wanted to get to the bottom.

Aggregations are actually also registering annotations, meaning that the annotation alias is overridden, and the subquery is not aware of the field anymore.
I thought of three ways to solve this:

  • raise an error when an aggregation is overriding the alias of an annotation (basically, forbid the buggy case) -- easy to implement, but I think it's a pity to forbid something that could work.
  • implement the smallest change that will allow this case -- it's the approach implemented in the linked patch.
  • handle differently aggregation annotations and simple annotations -- a large change, which would require quite a rewrite of parts of the aggregation logic and would be a lot of work for such a small edge case I guess.

I hope it makes sense to you and I'm open to changing to any other and better approach :)
David

comment:3 by Jacob Walls, 4 years ago

Duplicate of #31679

comment:4 by Mariusz Felisiak, 4 years ago

Jacob, it's really similar but it's not a duplicate, IMO.

comment:5 by Jacob Walls, 4 years ago

Yeah I can see that better now, sorry. I also checked out the patch and see that it doesn't address #31679 as far as I can tell, so better to have two tickets.

comment:6 by Mariusz Felisiak, 4 years ago

Cc: Simon Charette added

I would expect that the first kwarg (e.g. age_alias=Sum(F('age')) overrides the previous annotation (e.g. age_alias=F('age')) and a FieldError "Cannot compute Avg('age_alias'): 'age_alias' is an aggregate" is raised, like with annotations:

  • Author.objects.annotate(age_alias=F('age')).annotate(age_alias=Sum(F('age'))).annotate(avg_age=Avg(F('age_alias'))),
  • Author.objects.annotate(age_alias=F('age')).annotate(age_alias=Sum(F('age')), avg_age=Avg(F('age_alias'))).

IMO it's not obvious how such querysets should behave. I think we should raise an error if an aggregate() overrides and refs to the same alias.

comment:7 by Mariusz Felisiak, 4 years ago

Has patch: unset

To sum up, I think supporting this edge case isn't worth adding extra complexity, we should raise an error. You can always use different aliases.

comment:8 by David Wobrock, 4 years ago

Has patch: set

Hi, thanks for taking the time diving into the issue.

I agree that the expected behaviour is not obvious.
My reflection why it could be supported was that:

  • it is possible since the aliases are used in two different queries (the outer and the inner) in the case of an aggregation
  • You can always use different aliases. it's true, but with a system that automatically generates QuerySets it might not be easy to "just use different aliases".

However, let's go with an error, I agree that it adds complexity for not a lot of added value. There is a fair chance that the first patch missed edge cases.
I pushed a new patch: https://github.com/django/django/pull/13431

Happy to hear what you think about this one :)

comment:9 by Simon Charette, 4 years ago

I think raising an explicit error at the ORM level is the right call for now. Nothing prevents us from eventually fixing this if we can align on an expected behaviour in the future but users will get a clearer error message in the mean time.

comment:10 by Mariusz Felisiak, 4 years ago

Triage Stage: AcceptedReady for checkin

comment:11 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

Resolution: fixed
Status: assignedclosed

In 058747b:

Fixed #31880 -- Made QuerySet.aggregate() raise FieldError when aggregating over aggregation aliases.

Note: See TracTickets for help on using tickets.
Back to Top