#30188 closed Bug (fixed)
Aggregate annotation, Case() - When(): AssertionError No exception message supplied
Reported by: | Lukas Klement | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 2.1 |
Severity: | Normal | Keywords: | query, aggregate, case/when |
Cc: | Can Sarıgöl, Simon Charette | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Aggregating annotations works for simple Sum, Count, etc. operations, but fails when the Sum() contains a Case() When() operation.
To reproduce the issue, a simplified scenario:
Suppose you have recipes (e.g. Pasta dough, Tomato Sauce, etc.), which are linked to dishes (e.g. Pasta with tomato sauce) and ingredients (e.g. Flour, Water, Tomatoes). Ingredients are shared across all users, but each user can add a different purchase price to an ingredient (-> cost). We want to sum up the cost of a dish (i.e. summing up the cost of all ingredients across all recipes of a dish).
models.py
class Ingredient(models.Model): pass class IngredientDetail(models.Model): ingredient = models.ForeignKey(Ingredient, on_delete=models.CASCADE, related_name='ingredient_details') user = models.ForeignKey(User, limit_choices_to={'role': 0}, on_delete=models.CASCADE) cost = models.DecimalField(blank=True, null=True, decimal_places=6, max_digits=13) class Recipe(models.Model): dishes = models.ManyToManyField(Dish, related_name='dishes_recipes', through='RecipeDish') ingredients = models.ManyToManyField(Ingredient, related_name='ingredients_recipes', through='RecipeIngredient') class RecipeDish(models.Model): recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE, related_name='recipedish_recipe') dish = models.ForeignKey(Dish, on_delete=models.CASCADE, related_name='recipedish_dish') class RecipeIngredient(models.Model): recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE, related_name='recipeingredient_recipe') ingredient = models.ForeignKey(Ingredient, on_delete=models.CASCADE, related_name='recipeingredient_ingredient') quantity = models.DecimalField(blank=True, null=True, decimal_places=2, max_digits=6)
Different users can attach different IngredientDetail objects to a shared Ingredient objects (one-to-many relationship). The objective is to calculate how many ingredients added to a recipe (many-to-many relationship through RecipeIngredient) don't have a cost value set. To achieve this without using .extra(), we annotate the appropriate cost field to IngredientNutrition, then sum up the missing values using a Sum(), Case() and When() aggregation.
def aggregate_nutrition_data(recipes, user): annotations['cost_value'] = Subquery( IngredientDetail.objects.filter(Q(ingredient_id=OuterRef('ingredient_id')) & Q(user=user)).values('cost')[:1] ) ingredients = RecipeIngredient.objects.filter(recipe__in=recipes).annotate(**annotations) aggregators['cost_missing'] = Coalesce(Sum( Case( When(**{'cost_value': None}, then=Value(1)), default=Value(0), output_field=IntegerField() ) ), Value(0)) data = ingredients.aggregate(**aggregators) return data
Expected behaviour: a dictionary is returned: {'cost_missing': <value>}, containing as a value the number of missing cost values.Actual behaviour: AssertionError No exception message supplied is returned on the line data = ingredients.aggregate(aggregators) (see abbreviated stacktrace below).
When the aggregated field name is set to a field that exists on the model of the queryset (i.e. is not annotated), the aggregation works. For example: instead of using cost_value=None in the When() of the aggregators, using quantity=None works.
Similarly, doing an aggregation over an annotated field without using Case() and When() works. For example:
aggregators['cost'] = Coalesce( Sum( F('cost_value') * F('quantity') / F('cost_quantity') output_field=DecimalField() ), Value(0)
Stacktrace:
<Line with the .aggregate(...)> return query.get_aggregation(self.db, kwargs) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in get_aggregation expression, col_cnt = inner_query.rewrite_cols(expression, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols annotation.set_source_expressions(new_exprs) ... /Users/{{path}}/lib/python3.6/site-packages/django/db/models/expressions.py in set_source_expressions assert not exprs
The database is a PostgreSQL database, the Django version is 2.1.7, the Python version is 3.6.4. The problem also exists under Django version 2.2b1
Attachments (1)
Change History (30)
comment:1 by , 6 years ago
Description: | modified (diff) |
---|
comment:2 by , 6 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:3 by , 6 years ago
Description: | modified (diff) |
---|---|
Resolution: | needsinfo |
Status: | closed → new |
comment:4 by , 6 years ago
Hi Simon, thank you for your response.
I have re-opened the ticket and added a simplified example to reproduce the issue. The issue is current in Django 2.1 and can be reproduced under Django version 2.2b1 as well.
comment:5 by , 6 years ago
Description: | modified (diff) |
---|
comment:6 by , 6 years ago
Description: | modified (diff) |
---|
comment:7 by , 6 years ago
Description: | modified (diff) |
---|
comment:8 by , 6 years ago
Hi Lukas, Any chance you could put this into a sample project and attach that (or a test case in a PR) so we can run this example and experiment? (It would save a cycle and speed review if you could.)
Thanks.
comment:9 by , 6 years ago
Triage Stage: | Unreviewed → Accepted |
---|
Sample project attached. Even if the query isn't valid, a more helpful error message would be useful.
by , 6 years ago
Attachment: | t30188.zip added |
---|
comment:11 by , 6 years ago
Hi Tim, thank you for attaching an example project - why is the query not valid? Why is it that annotated fields derived from annotated fields that used Case() / When() cannot be aggregated?
comment:12 by , 6 years ago
I didn't look at this in any depth. I don't know whether or not the query is valid. I only meant to say that if it's not valid, there should be a better message. The proposed wording on the PR (if the query is indeed invalid) ... 'Related Field got invalid lookup: salary' ... doesn't seem to provide much explanation.
comment:13 by , 6 years ago
Cc: | added |
---|
Hi, actually I thought this sample is the same with below so the same exception message might be ok. Besides, I will check the query.
ingredients = RecipeIngredient.objects.aggregate( cost_missing = Coalesce(Sum( Case( When(ingredient__cost__isnull=False, then=Value(1)), default=Value(0), output_field=IntegerField()) ), Value(0)) )
comment:14 by , 6 years ago
I agree with Lukas that we should investigate why the ORM crashes here as it's a completely valid query from my perspective.
The fundamental issue here is that Subquery
breaks the Expression
API by having a get_source_expressions
method that returns N > 0
expressions while having a set_source_expressions
that accepts none.
I happen to have a local branch I've been playing with locally to make Subquery
more a of dumb wrapper around sql.Query
that happens to restore this contract. I haven't tried it out against this exact set of models but the fact the caller hosts a FIXME makes me feel like there might be something else going on.
comment:15 by , 6 years ago
Query like this (removed user and recipes):
SELECT COALESCE( SUM(CASE WHEN (SELECT U0."cost" FROM "t30188_ingredientdetail" U0 WHERE U0."ingredient_id" = (subquery."ingredient_id") LIMIT 1) IS NULL THEN 1 ELSE 1 END), 1) FROM (SELECT "t30188_recipeingredient"."id" AS Col1, (SELECT U0."cost" FROM "t30188_ingredientdetail" U0 WHERE U0."ingredient_id" = ("t30188_recipeingredient"."ingredient_id") LIMIT 1) AS "cost_value", "U0"."ingredient_id" AS "__col1" FROM "t30188_recipeingredient" GROUP BY "t30188_recipeingredient"."id", (SELECT U0."cost" FROM "t30188_ingredientdetail" U0 WHERE U0."ingredient_id" = ("t30188_recipeingredient"."ingredient_id") LIMIT 1), "U0"."ingredient_id") subquery
the problem is "U0"."ingredient_id" AS "__col1"
, It should have been "t30188_recipeingredient"."ingredient_id"
. to solve this, I will also check your point @Simon Charette thanks.
comment:16 by , 6 years ago
Cc: | added |
---|
comment:17 by , 6 years ago
Patch needs improvement: | set |
---|
comment:18 by , 6 years ago
The assertion error is addressed by https://github.com/django/django/pull/11062 but it unveiled another issue regarding aggregation over annotations.
I managed to get it to work by adjusting the hacky Query.rewrite_cols
function but then I noticed the subquery was annotated twice which might hurt performance. I guess the suggested approach at least works and the double annotation issue seems to be generalized anyway. I suggest we open a following ticket to address the rewrite_cols
extra annotation issue and proceed with the current approach for now.
comment:19 by , 6 years ago
Can we fix twice annotated problem in here? I thought that we can remove the expr if it is already in annotations (expr == old_expr
)
elif isinstance(expr, (Col, Subquery)) or (expr.contains_aggregate and not expr.is_summary): ... new_exprs.append(Ref(col_alias, expr)) if isinstance(expr, Subquery): keys = { k for k, old_expr in self.annotations.items() if k.startswith('__col') is False and expr == old_expr } for _k in keys: del self.annotations[_k]
comment:20 by , 6 years ago
Can, I guess we could somehow but I feel like we'd need a more elegant solution than that.
Note that the double annotation issue doesn't only happen for subqueries, you might have noticed that the company_id
which is first annotated as Col0
gets annotated as __col0
as well. This makes me believe we might need to do it for Col
instances as well?
comment:21 by , 6 years ago
I just pushed another commit that addresses the double aliasing issue. I credited you in the commit message Can.
Hey Lukas,
Given the assertion error it looks like there's something wrong here but because of the limited data you provided it's almost impossible for volunteers to reproduce and confirm Django is actually at fault.
Please re-open this ticket if you can provide a reduced test case for your problem. This should include a set of minimal models calls to build the queryset that crashes on
.aggregate
. It would also be great if you could test against the current Django 2.2 beta (2.2b1) and confirm whether or not it's fixed there.Thank you for your understanding.