#25307 closed Bug (fixed)
Cannot use .annotate with conditional expressions
| Reported by: | Jared Proffitt | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | josh.smeaton@…, matt@…, Simon Charette, newport.travis@… | Triage Stage: | Ready for checkin |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description
I am trying to create a query that first annotates a Count, then aggregates some Sums together using Case objects. If I do the query without the annotate, everything is fine. If I do the query without any Case objects in the aggregate, everything is fine. But when they are both present I get this error:
'WhereNode' object has no attribute 'get_source_expressions'
My query is something like this:
counts = models.Package.objects.annotate(
used_count=Count('projects')
).distinct().aggregate(
no_parent=Sum(Case(
When(parent_id__isnull=True, then=1),
default=0,
output_field=IntegerField()
))
)
I would like to eventually add the used_count value in the aggregate, but this itself won't even run.
Change History (23)
comment:1 by , 10 years ago
comment:2 by , 10 years ago
I have the same issue, here is enough code to reproduce the issue.
# In models.py
from django.db import models
class DateModel(models.Model):
date1 = models.DateTimeField()
date2 = models.DateTimeField()
# elsewhere
from django.db.models import Q, Case, When
from django.db.models.functions import Coalesce
from datetime import date
annotate = {'date': Coalesce('date1', 'date2')}
dms = DateModel.objects.annotate(**annotation)
aggregation = {
'count_recent': Count(Case(When(Q(date__gte=date(year=2015, month=1, day=1)))))
}
dms = dms.aggregate(**aggregation)
AttributeError: 'WhereNode' object has no attribute 'get_source_expressions'
comment:3 by , 10 years ago
| Cc: | added |
|---|---|
| Triage Stage: | Unreviewed → Accepted |
| Version: | 1.8 → master |
comment:5 by , 10 years ago
Please correct me if I'm wrong, but it seems as if the issue lies with the django.db.models.expressions.When class, in that the condition attribute is supposed to be a django.db.models.Q object and yet the get_source_epxressions() and set_source_expressions() methods are treating the condition attribute as an django.db.models.expressions.Expression, when it most certainly is not.
This violates the API and hence client-code of the API (such as aggregation code in this case) will fail unexpectedly.
E.g.
In the example provided by @martsberger, the count_recent expression gets down to django.db.models.sql.query.Query.rewrite_cols(). That method recursively travels into the nested expressions until it hits the Q object returned by the When expression's get_source_expressions() and that's where it fails.
The problem is only triggered (during aggregation) by column rewriting, when this line evaluates to True in django.db.models.sql.query.Query.get_aggregation():
if (isinstance(self.group_by, list) or has_limit or has_existing_annotations or self.distinct):
As I said initially, I believe the root cause is the violation of the Expression API by the When subclass.
comment:6 by , 10 years ago
| Cc: | added |
|---|
comment:7 by , 10 years ago
I'm not sure when/how, but the condition attribute of the When Expression gets converted to a django.db.models.sql.where.WhereNode, so it's actually a WhereNode object that is having .get_source_expressions() called on it.
Regardless, django.db.models.expressions.When's get_source_expressions() and set_source_expressions() methods need to bridge the gap between Expressions and django.utils.tree.Nodes, in order to properly re-write the column names to the aliases from the subquery.
I can't see a quick/easy way to do this, but as I'm not experienced in the ORM code, I could be wrong.
comment:8 by , 10 years ago
We have a couple of things that *resolve* to an expression. For example, Q() and F() objects are not expressions, but will resolve to an expression when added to a query.
If WhereNode doesn't respect the expression API, then the fix is to make WhereNodes full expressions.
comment:9 by , 10 years ago
To clarify, I think it's the django.db.models.expressions.When class (which is a subclass of Expression) that is not respecting the API, with regards to its get_source_expressions() and set_source_expressions() methods.
comment:10 by , 10 years ago
| Cc: | added |
|---|
Also just hit this when upgrading an old project using django-aggregate-if on Django 1.7 to conditional aggregation on Django 1.8.
comment:11 by , 10 years ago
From a short investigation it looks like the issue is actually caused by a combination of using distinct() and aggregate() of conditional expression.
Can the people affected confirm that removing the distinct() call from their query silence the exception?
comment:12 by , 10 years ago
That does make it break as well, but not simply just the use of distinct. If you use EITHER .distinct() OR .annotate() along with an .aggregate() that contains a conditional expression. (not sure exactly what part of the condition expression, but it must be either because of the Case() or the When().
I have made a completely useless example using the polls app in the django tutorial. I copied the polls models exactly, and created this query, which will cause the error to happen:
counts = models.Choice.objects.annotate(
used_count=Count('question')
).distinct().aggregate(
yes=Sum(Case(
When(choice_text='yes', then=F('votes')),
default=0,
output_field=IntegerField()
)),
no=Sum(Case(
When(choice_text='no', then=F('votes')),
default=0,
output_field=IntegerField()
)),
)
(Obviously, then used_count will always be 1, and its pretty pointless in this example, but you get the idea).
So if you remove the either the .annotate() OR the .distinct(), the query still breaks the same way. The presence of either one causes it to break. But if you remove BOTH, the query runs fine.
I do not know anything about the internals of the Query api, or I would help fix the problem. But it seems there is a problem when combining certain queryset methods like annotate or distinct with aggregates that have condition expressions.
I know it has to do with condition expressions, because doing the query like the following works just fine:
counts = models.Choice.objects.annotate(
used_count=Count('question')
).distinct().aggregate(
yes=Sum('votes'),
)
comment:14 by , 10 years ago
| Cc: | added |
|---|
As mentioned on the pull request, this issue seems to exist for any of the 4 cases here:
if (isinstance(self.group_by, list) or has_limit or has_existing_annotations or
self.distinct):
I was able to write unit tests for all but the isinstance(self.group_by, list) case, if someone is able to provide a test for that case I can add it to my changes.
Edit: Unit test updates on my pull request should now cover all cases, my pull request can be closed in favor of #6267
comment:16 by , 10 years ago
| Patch needs improvement: | unset |
|---|
comment:18 by , 9 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
Any chance you could provide a regression test for Django's test suite? It's difficult to reproduce the issue without your models.