Opened 8 years ago

Closed 7 years ago

Last modified 11 months ago

#25307 closed Bug (fixed)

Cannot use .annotate with conditional expressions

Reported by: Jared Proffitt Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: josh.smeaton@…, matt@…, Simon Charette, newport.travis@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

I am trying to create a query that first annotates a Count, then aggregates some Sums together using Case objects. If I do the query without the annotate, everything is fine. If I do the query without any Case objects in the aggregate, everything is fine. But when they are both present I get this error:

 'WhereNode' object has no attribute 'get_source_expressions'

My query is something like this:

counts = models.Package.objects.annotate(
    used_count=Count('projects')
).distinct().aggregate(
    no_parent=Sum(Case(
        When(parent_id__isnull=True, then=1),
        default=0,
        output_field=IntegerField()
    ))
)

I would like to eventually add the used_count value in the aggregate, but this itself won't even run.

Change History (20)

comment:1 Changed 8 years ago by Tim Graham

Any chance you could provide a regression test for Django's test suite? It's difficult to reproduce the issue without your models.

comment:2 Changed 8 years ago by B Martsberger

I have the same issue, here is enough code to reproduce the issue.

# In models.py
from django.db import models
class DateModel(models.Model):
    date1 = models.DateTimeField()
    date2 = models.DateTimeField()

# elsewhere
from django.db.models import Q, Case, When
from django.db.models.functions import Coalesce
from datetime import date

annotate = {'date': Coalesce('date1', 'date2')}
dms = DateModel.objects.annotate(**annotation)
aggregation = {
    'count_recent': Count(Case(When(Q(date__gte=date(year=2015, month=1, day=1)))))
}
dms = dms.aggregate(**aggregation)

AttributeError: 'WhereNode' object has no attribute 'get_source_expressions'

comment:3 Changed 8 years ago by Josh Smeaton

Cc: josh.smeaton@… added
Triage Stage: UnreviewedAccepted
Version: 1.8master

comment:4 Changed 8 years ago by Josh Smeaton

Looks related to #25316

comment:5 Changed 8 years ago by Matt C

Please correct me if I'm wrong, but it seems as if the issue lies with the django.db.models.expressions.When class, in that the condition attribute is supposed to be a django.db.models.Q object and yet the get_source_epxressions() and set_source_expressions() methods are treating the condition attribute as an django.db.models.expressions.Expression, when it most certainly is not.

This violates the API and hence client-code of the API (such as aggregation code in this case) will fail unexpectedly.

E.g.

In the example provided by @martsberger, the count_recent expression gets down to django.db.models.sql.query.Query.rewrite_cols(). That method recursively travels into the nested expressions until it hits the Q object returned by the When expression's get_source_expressions() and that's where it fails.
The problem is only triggered (during aggregation) by column rewriting, when this line evaluates to True in django.db.models.sql.query.Query.get_aggregation():

if (isinstance(self.group_by, list) or has_limit or has_existing_annotations or self.distinct):

As I said initially, I believe the root cause is the violation of the Expression API by the When subclass.

comment:6 Changed 8 years ago by Matt C

Cc: matt@… added

comment:7 Changed 8 years ago by Matt C

I'm not sure when/how, but the condition attribute of the When Expression gets converted to a django.db.models.sql.where.WhereNode, so it's actually a WhereNode object that is having .get_source_expressions() called on it.

Regardless, django.db.models.expressions.When's get_source_expressions() and set_source_expressions() methods need to bridge the gap between Expressions and django.utils.tree.Nodes, in order to properly re-write the column names to the aliases from the subquery.

I can't see a quick/easy way to do this, but as I'm not experienced in the ORM code, I could be wrong.

comment:8 Changed 8 years ago by Anssi Kääriäinen

We have a couple of things that *resolve* to an expression. For example, Q() and F() objects are not expressions, but will resolve to an expression when added to a query.

If WhereNode doesn't respect the expression API, then the fix is to make WhereNodes full expressions.

comment:9 Changed 8 years ago by Matt C

To clarify, I think it's the django.db.models.expressions.When class (which is a subclass of Expression) that is not respecting the API, with regards to its get_source_expressions() and set_source_expressions() methods.

comment:10 Changed 8 years ago by Simon Charette

Cc: Simon Charette added

Also just hit this when upgrading an old project using django-aggregate-if on Django 1.7 to conditional aggregation on Django 1.8.

comment:11 Changed 8 years ago by Simon Charette

From a short investigation it looks like the issue is actually caused by a distinct() and aggregate() of conditional expression combination.

Can the people affected confirm that removing the distinct() call from their query silence the exception?

Version 0, edited 8 years ago by Simon Charette (next)

comment:12 Changed 8 years ago by Jared Proffitt

That does make it break as well, but not simply just the use of distinct. If you use EITHER .distinct() OR .annotate() along with an .aggregate() that contains a conditional expression. (not sure exactly what part of the condition expression, but it must be either because of the Case() or the When().

I have made a completely useless example using the polls app in the django tutorial. I copied the polls models exactly, and created this query, which will cause the error to happen:

counts = models.Choice.objects.annotate(
	used_count=Count('question')
).distinct().aggregate(
	yes=Sum(Case(
		When(choice_text='yes', then=F('votes')),
		default=0,
		output_field=IntegerField()
	)),
	no=Sum(Case(
		When(choice_text='no', then=F('votes')),
		default=0,
		output_field=IntegerField()
	)),
)	

(Obviously, then used_count will always be 1, and its pretty pointless in this example, but you get the idea).

So if you remove the either the .annotate() OR the .distinct(), the query still breaks the same way. The presence of either one causes it to break. But if you remove BOTH, the query runs fine.

I do not know anything about the internals of the Query api, or I would help fix the problem. But it seems there is a problem when combining certain queryset methods like annotate or distinct with aggregates that have condition expressions.

I know it has to do with condition expressions, because doing the query like the following works just fine:

counts = models.Choice.objects.annotate(
	used_count=Count('question')
).distinct().aggregate(
	yes=Sum('votes'),
)

comment:13 Changed 8 years ago by Simon Charette

Has patch: set

comment:14 Changed 8 years ago by Travis Newport

Cc: newport.travis@… added

As mentioned on the pull request, this issue seems to exist for any of the 4 cases here:

if (isinstance(self.group_by, list) or has_limit or has_existing_annotations or
        self.distinct):

I was able to write unit tests for all but the isinstance(self.group_by, list) case, if someone is able to provide a test for that case I can add it to my changes.
Edit: Unit test updates on my pull request should now cover all cases, my pull request can be closed in favor of #6267

Last edited 8 years ago by Travis Newport (previous) (diff)

comment:15 Changed 8 years ago by Tim Graham

Patch needs improvement: set

There's a failing test on the PR.

comment:16 Changed 7 years ago by Anssi Kääriäinen

Patch needs improvement: unset

comment:17 Changed 7 years ago by Tim Graham

Patch needs improvement: set

Two test failures for Oracle remain.

comment:18 Changed 7 years ago by Josh Smeaton

Triage Stage: AcceptedReady for checkin

comment:19 Changed 7 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In 1df89a60:

Fixed #25307 -- Fixed QuerySet.annotate() crash with conditional expressions.

Thanks Travis Newport for the tests and Josh Smeaton for contributing
to the patch.

comment:20 Changed 11 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In b181cae2:

Refs #25307 -- Replaced SQLQuery.rewrite_cols() by replace_expressions().

The latter offers a more generic interface that doesn't require
specialized expression types handling.

Note: See TracTickets for help on using tickets.
Back to Top