Opened 9 years ago

Closed 7 years ago

Last modified 4 months ago

#25307 closed Bug (fixed)

Cannot use .annotate with conditional expressions

Reported by: Jared Proffitt Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: josh.smeaton@…, matt@…, Simon Charette, newport.travis@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I am trying to create a query that first annotates a Count, then aggregates some Sums together using Case objects. If I do the query without the annotate, everything is fine. If I do the query without any Case objects in the aggregate, everything is fine. But when they are both present I get this error:

 'WhereNode' object has no attribute 'get_source_expressions'

My query is something like this:

counts = models.Package.objects.annotate(
    used_count=Count('projects')
).distinct().aggregate(
    no_parent=Sum(Case(
        When(parent_id__isnull=True, then=1),
        default=0,
        output_field=IntegerField()
    ))
)

I would like to eventually add the used_count value in the aggregate, but this itself won't even run.

Change History (23)

comment:1 by Tim Graham, 9 years ago

Any chance you could provide a regression test for Django's test suite? It's difficult to reproduce the issue without your models.

comment:2 by B Martsberger, 9 years ago

I have the same issue, here is enough code to reproduce the issue.

# In models.py
from django.db import models
class DateModel(models.Model):
    date1 = models.DateTimeField()
    date2 = models.DateTimeField()

# elsewhere
from django.db.models import Q, Case, When
from django.db.models.functions import Coalesce
from datetime import date

annotate = {'date': Coalesce('date1', 'date2')}
dms = DateModel.objects.annotate(**annotation)
aggregation = {
    'count_recent': Count(Case(When(Q(date__gte=date(year=2015, month=1, day=1)))))
}
dms = dms.aggregate(**aggregation)

AttributeError: 'WhereNode' object has no attribute 'get_source_expressions'

comment:3 by Josh Smeaton, 9 years ago

Cc: josh.smeaton@… added
Triage Stage: UnreviewedAccepted
Version: 1.8master

comment:4 by Josh Smeaton, 9 years ago

Looks related to #25316

comment:5 by Matt C, 8 years ago

Please correct me if I'm wrong, but it seems as if the issue lies with the django.db.models.expressions.When class, in that the condition attribute is supposed to be a django.db.models.Q object and yet the get_source_epxressions() and set_source_expressions() methods are treating the condition attribute as an django.db.models.expressions.Expression, when it most certainly is not.

This violates the API and hence client-code of the API (such as aggregation code in this case) will fail unexpectedly.

E.g.

In the example provided by @martsberger, the count_recent expression gets down to django.db.models.sql.query.Query.rewrite_cols(). That method recursively travels into the nested expressions until it hits the Q object returned by the When expression's get_source_expressions() and that's where it fails.
The problem is only triggered (during aggregation) by column rewriting, when this line evaluates to True in django.db.models.sql.query.Query.get_aggregation():

if (isinstance(self.group_by, list) or has_limit or has_existing_annotations or self.distinct):

As I said initially, I believe the root cause is the violation of the Expression API by the When subclass.

comment:6 by Matt C, 8 years ago

Cc: matt@… added

comment:7 by Matt C, 8 years ago

I'm not sure when/how, but the condition attribute of the When Expression gets converted to a django.db.models.sql.where.WhereNode, so it's actually a WhereNode object that is having .get_source_expressions() called on it.

Regardless, django.db.models.expressions.When's get_source_expressions() and set_source_expressions() methods need to bridge the gap between Expressions and django.utils.tree.Nodes, in order to properly re-write the column names to the aliases from the subquery.

I can't see a quick/easy way to do this, but as I'm not experienced in the ORM code, I could be wrong.

comment:8 by Anssi Kääriäinen, 8 years ago

We have a couple of things that *resolve* to an expression. For example, Q() and F() objects are not expressions, but will resolve to an expression when added to a query.

If WhereNode doesn't respect the expression API, then the fix is to make WhereNodes full expressions.

comment:9 by Matt C, 8 years ago

To clarify, I think it's the django.db.models.expressions.When class (which is a subclass of Expression) that is not respecting the API, with regards to its get_source_expressions() and set_source_expressions() methods.

comment:10 by Simon Charette, 8 years ago

Cc: Simon Charette added

Also just hit this when upgrading an old project using django-aggregate-if on Django 1.7 to conditional aggregation on Django 1.8.

comment:11 by Simon Charette, 8 years ago

From a short investigation it looks like the issue is actually caused by a combination of using distinct() and aggregate() of conditional expression.

Can the people affected confirm that removing the distinct() call from their query silence the exception?

Last edited 8 years ago by Simon Charette (previous) (diff)

comment:12 by Jared Proffitt, 8 years ago

That does make it break as well, but not simply just the use of distinct. If you use EITHER .distinct() OR .annotate() along with an .aggregate() that contains a conditional expression. (not sure exactly what part of the condition expression, but it must be either because of the Case() or the When().

I have made a completely useless example using the polls app in the django tutorial. I copied the polls models exactly, and created this query, which will cause the error to happen:

counts = models.Choice.objects.annotate(
	used_count=Count('question')
).distinct().aggregate(
	yes=Sum(Case(
		When(choice_text='yes', then=F('votes')),
		default=0,
		output_field=IntegerField()
	)),
	no=Sum(Case(
		When(choice_text='no', then=F('votes')),
		default=0,
		output_field=IntegerField()
	)),
)	

(Obviously, then used_count will always be 1, and its pretty pointless in this example, but you get the idea).

So if you remove the either the .annotate() OR the .distinct(), the query still breaks the same way. The presence of either one causes it to break. But if you remove BOTH, the query runs fine.

I do not know anything about the internals of the Query api, or I would help fix the problem. But it seems there is a problem when combining certain queryset methods like annotate or distinct with aggregates that have condition expressions.

I know it has to do with condition expressions, because doing the query like the following works just fine:

counts = models.Choice.objects.annotate(
	used_count=Count('question')
).distinct().aggregate(
	yes=Sum('votes'),
)

comment:14 by Travis Newport, 8 years ago

Cc: newport.travis@… added

As mentioned on the pull request, this issue seems to exist for any of the 4 cases here:

if (isinstance(self.group_by, list) or has_limit or has_existing_annotations or
        self.distinct):

I was able to write unit tests for all but the isinstance(self.group_by, list) case, if someone is able to provide a test for that case I can add it to my changes.
Edit: Unit test updates on my pull request should now cover all cases, my pull request can be closed in favor of #6267

Last edited 8 years ago by Travis Newport (previous) (diff)

comment:15 by Tim Graham, 8 years ago

Patch needs improvement: set

There's a failing test on the PR.

comment:16 by Anssi Kääriäinen, 8 years ago

Patch needs improvement: unset

comment:17 by Tim Graham, 8 years ago

Patch needs improvement: set

Two test failures for Oracle remain.

comment:18 by Josh Smeaton, 7 years ago

Triage Stage: AcceptedReady for checkin

comment:19 by Tim Graham <timograham@…>, 7 years ago

Resolution: fixed
Status: newclosed

In 1df89a60:

Fixed #25307 -- Fixed QuerySet.annotate() crash with conditional expressions.

Thanks Travis Newport for the tests and Josh Smeaton for contributing
to the patch.

comment:20 by Mariusz Felisiak <felisiak.mariusz@…>, 16 months ago

In b181cae2:

Refs #25307 -- Replaced SQLQuery.rewrite_cols() by replace_expressions().

The latter offers a more generic interface that doesn't require
specialized expression types handling.

comment:21 by Mariusz Felisiak <felisiak.mariusz@…>, 4 months ago

In 7530cf39:

Fixed #34975 -- Fixed crash of conditional aggregate() over aggregations.

Adjustments made to solve_lookup_type to defer the resolving of
references for summarized aggregates failed to account for similar
requirements for lookup values which can also reference annotations
through Aggregate.filter.

Regression in b181cae2e3697b2e53b5b67ac67e59f3b05a6f0d.

Refs #25307.

Thanks Sergey Nesterenko for the report.

comment:22 by Mariusz Felisiak <felisiak.mariusz@…>, 4 months ago

In 49f1ced8:

[5.0.x] Fixed #34975 -- Fixed crash of conditional aggregate() over aggregations.

Adjustments made to solve_lookup_type to defer the resolving of
references for summarized aggregates failed to account for similar
requirements for lookup values which can also reference annotations
through Aggregate.filter.

Regression in b181cae2e3697b2e53b5b67ac67e59f3b05a6f0d.

Refs #25307.

Thanks Sergey Nesterenko for the report.

Backport of 7530cf3900ab98104edcde69e8a2a415e82b345a from main

comment:23 by Mariusz Felisiak <felisiak.mariusz@…>, 4 months ago

In acf4cee:

[4.2.x] Fixed #34975 -- Fixed crash of conditional aggregate() over aggregations.

Adjustments made to solve_lookup_type to defer the resolving of
references for summarized aggregates failed to account for similar
requirements for lookup values which can also reference annotations
through Aggregate.filter.

Regression in b181cae2e3697b2e53b5b67ac67e59f3b05a6f0d.

Refs #25307.

Thanks Sergey Nesterenko for the report.

Backport of 7530cf3900ab98104edcde69e8a2a415e82b345a from main

Note: See TracTickets for help on using tickets.
Back to Top