Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#30188 closed Bug (fixed)

Aggregate annotation, Case() - When(): AssertionError No exception message supplied

Reported by: Lukas Klement Owned by: nobody
Component: Database layer (models, ORM) Version: 2.1
Severity: Normal Keywords: query, aggregate, case/when
Cc: Can Sarıgöl, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Lukas Klement)

Aggregating annotations works for simple Sum, Count, etc. operations, but fails when the Sum() contains a Case() When() operation.

To reproduce the issue, a simplified scenario:
Suppose you have recipes (e.g. Pasta dough, Tomato Sauce, etc.), which are linked to dishes (e.g. Pasta with tomato sauce) and ingredients (e.g. Flour, Water, Tomatoes). Ingredients are shared across all users, but each user can add a different purchase price to an ingredient (-> cost). We want to sum up the cost of a dish (i.e. summing up the cost of all ingredients across all recipes of a dish).

models.py

class Ingredient(models.Model):
    pass

class IngredientDetail(models.Model):
    ingredient = models.ForeignKey(Ingredient, on_delete=models.CASCADE, related_name='ingredient_details')
    user = models.ForeignKey(User, limit_choices_to={'role': 0}, on_delete=models.CASCADE)
    cost = models.DecimalField(blank=True, null=True, decimal_places=6, max_digits=13)

class Recipe(models.Model):
    dishes = models.ManyToManyField(Dish, related_name='dishes_recipes', through='RecipeDish')
    ingredients = models.ManyToManyField(Ingredient, related_name='ingredients_recipes', through='RecipeIngredient')

class RecipeDish(models.Model):
    recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE, related_name='recipedish_recipe')
    dish = models.ForeignKey(Dish, on_delete=models.CASCADE, related_name='recipedish_dish')


class RecipeIngredient(models.Model):
    recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE, related_name='recipeingredient_recipe')
    ingredient = models.ForeignKey(Ingredient, on_delete=models.CASCADE, related_name='recipeingredient_ingredient')
    quantity = models.DecimalField(blank=True, null=True, decimal_places=2, max_digits=6)

Different users can attach different IngredientDetail objects to a shared Ingredient objects (one-to-many relationship). The objective is to calculate how many ingredients added to a recipe (many-to-many relationship through RecipeIngredient) don't have a cost value set. To achieve this without using .extra(), we annotate the appropriate cost field to IngredientNutrition, then sum up the missing values using a Sum(), Case() and When() aggregation.

def aggregate_nutrition_data(recipes, user):
    annotations['cost_value'] = Subquery(
        IngredientDetail.objects.filter(Q(ingredient_id=OuterRef('ingredient_id')) & Q(user=user)).values('cost')[:1]
    )
    ingredients = RecipeIngredient.objects.filter(recipe__in=recipes).annotate(**annotations)
    aggregators['cost_missing'] = Coalesce(Sum(
        Case(
           When(**{'cost_value': None}, then=Value(1)),
           default=Value(0),
           output_field=IntegerField()
       )
    ), Value(0))
    data = ingredients.aggregate(**aggregators)

    return data

Expected behaviour: a dictionary is returned: {'cost_missing': <value>}, containing as a value the number of missing cost values.Actual behaviour: AssertionError No exception message supplied is returned on the line data = ingredients.aggregate(aggregators) (see abbreviated stacktrace below).

When the aggregated field name is set to a field that exists on the model of the queryset (i.e. is not annotated), the aggregation works. For example: instead of using cost_value=None in the When() of the aggregators, using quantity=None works.
Similarly, doing an aggregation over an annotated field without using Case() and When() works. For example:

    aggregators['cost'] = Coalesce(
        Sum(
            F('cost_value') * F('quantity') / F('cost_quantity')
            output_field=DecimalField()
        ), Value(0)

Stacktrace:

<Line with the .aggregate(...)>
        return query.get_aggregation(self.db, kwargs) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in get_aggregation
                    expression, col_cnt = inner_query.rewrite_cols(expression, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
        annotation.set_source_expressions(new_exprs) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/expressions.py in set_source_expressions
        assert not exprs 

The database is a PostgreSQL database, the Django version is 2.1.7, the Python version is 3.6.4. The problem also exists under Django version 2.2b1

Attachments (1)

t30188.zip (3.5 KB ) - added by Tim Graham 5 years ago.

Download all attachments as: .zip

Change History (30)

comment:1 by Lukas Klement, 5 years ago

Description: modified (diff)

comment:2 by Simon Charette, 5 years ago

Resolution: needsinfo
Status: newclosed

Hey Lukas,

Given the assertion error it looks like there's something wrong here but because of the limited data you provided it's almost impossible for volunteers to reproduce and confirm Django is actually at fault.

Please re-open this ticket if you can provide a reduced test case for your problem. This should include a set of minimal models calls to build the queryset that crashes on .aggregate. It would also be great if you could test against the current Django 2.2 beta (2.2b1) and confirm whether or not it's fixed there.

Thank you for your understanding.

comment:3 by Lukas Klement, 5 years ago

Description: modified (diff)
Resolution: needsinfo
Status: closednew

comment:4 by Lukas Klement, 5 years ago

Hi Simon, thank you for your response.

I have re-opened the ticket and added a simplified example to reproduce the issue. The issue is current in Django 2.1 and can be reproduced under Django version 2.2b1 as well.

Last edited 5 years ago by Lukas Klement (previous) (diff)

comment:5 by Lukas Klement, 5 years ago

Description: modified (diff)

comment:6 by Lukas Klement, 5 years ago

Description: modified (diff)

comment:7 by Lukas Klement, 5 years ago

Description: modified (diff)

comment:8 by Carlton Gibson, 5 years ago

Hi Lukas, Any chance you could put this into a sample project and attach that (or a test case in a PR) so we can run this example and experiment? (It would save a cycle and speed review if you could.)

Thanks.

comment:9 by Tim Graham, 5 years ago

Triage Stage: UnreviewedAccepted

Sample project attached. Even if the query isn't valid, a more helpful error message would be useful.

by Tim Graham, 5 years ago

Attachment: t30188.zip added

comment:10 by Can Sarıgöl, 5 years ago

Has patch: set

comment:11 by Lukas Klement, 5 years ago

Hi Tim, thank you for attaching an example project - why is the query not valid? Why is it that annotated fields derived from annotated fields that used Case() / When() cannot be aggregated?

comment:12 by Tim Graham, 5 years ago

I didn't look at this in any depth. I don't know whether or not the query is valid. I only meant to say that if it's not valid, there should be a better message. The proposed wording on the PR (if the query is indeed invalid) ... 'Related Field got invalid lookup: salary' ... doesn't seem to provide much explanation.

comment:13 by Can Sarıgöl, 5 years ago

Cc: Can Sarıgöl added

Hi, actually I thought this sample is the same with below so the same exception message might be ok. Besides, I will check the query.

ingredients = RecipeIngredient.objects.aggregate(
            cost_missing = Coalesce(Sum(
                Case(
                When(ingredient__cost__isnull=False, then=Value(1)),
                default=Value(0),
                output_field=IntegerField())
            ), Value(0))
        )

comment:14 by Simon Charette, 5 years ago

I agree with Lukas that we should investigate why the ORM crashes here as it's a completely valid query from my perspective.

The fundamental issue here is that Subquery breaks the Expression API by having a get_source_expressions method that returns N > 0 expressions while having a set_source_expressions that accepts none.

I happen to have a local branch I've been playing with locally to make Subquery more a of dumb wrapper around sql.Query that happens to restore this contract. I haven't tried it out against this exact set of models but the fact the caller hosts a FIXME makes me feel like there might be something else going on.

comment:15 by Can Sarıgöl, 5 years ago

Query like this (removed user and recipes):

SELECT COALESCE(
		SUM(CASE WHEN (SELECT U0."cost" 
					   FROM "t30188_ingredientdetail" U0 
					   WHERE U0."ingredient_id" = (subquery."ingredient_id")  LIMIT 1) IS NULL THEN 1 ELSE 1 END), 1) 
FROM (SELECT "t30188_recipeingredient"."id" AS Col1, 
	  (SELECT U0."cost" 
	   FROM "t30188_ingredientdetail" U0 
	   WHERE U0."ingredient_id" = ("t30188_recipeingredient"."ingredient_id")  LIMIT 1) AS "cost_value", 
	  "U0"."ingredient_id" AS "__col1" 
	  FROM "t30188_recipeingredient" 
	  GROUP BY "t30188_recipeingredient"."id", 
	  (SELECT U0."cost" 
	   FROM "t30188_ingredientdetail" U0 
	   WHERE U0."ingredient_id" = ("t30188_recipeingredient"."ingredient_id")  LIMIT 1), 
	  "U0"."ingredient_id") subquery

the problem is "U0"."ingredient_id" AS "__col1", It should have been "t30188_recipeingredient"."ingredient_id". to solve this, I will also check your point @Simon Charette thanks.

Last edited 5 years ago by Can Sarıgöl (previous) (diff)

comment:16 by Simon Charette, 5 years ago

Cc: Simon Charette added

comment:17 by Tim Graham, 5 years ago

Patch needs improvement: set

comment:18 by Simon Charette, 5 years ago

The assertion error is addressed by https://github.com/django/django/pull/11062 but it unveiled another issue regarding aggregation over annotations.

I managed to get it to work by adjusting the hacky Query.rewrite_cols function but then I noticed the subquery was annotated twice which might hurt performance. I guess the suggested approach at least works and the double annotation issue seems to be generalized anyway. I suggest we open a following ticket to address the rewrite_cols extra annotation issue and proceed with the current approach for now.

comment:19 by Can Sarıgöl, 5 years ago

Can we fix twice annotated problem in here? I thought that we can remove the expr if it is already in annotations (expr == old_expr)

elif isinstance(expr, (Col, Subquery)) or (expr.contains_aggregate and not expr.is_summary):
 ...
 new_exprs.append(Ref(col_alias, expr))
 if isinstance(expr, Subquery):
     keys = {
         k for k, old_expr in self.annotations.items()
         if k.startswith('__col') is False and expr == old_expr
      }
     for _k in keys:
         del self.annotations[_k]


comment:20 by Simon Charette, 5 years ago

Can, I guess we could somehow but I feel like we'd need a more elegant solution than that.

Note that the double annotation issue doesn't only happen for subqueries, you might have noticed that the company_id which is first annotated as Col0 gets annotated as __col0 as well. This makes me believe we might need to do it for Col instances as well?

comment:21 by Simon Charette, 5 years ago

I just pushed another commit that addresses the double aliasing issue. I credited you in the commit message Can.

comment:22 by Can Sarıgöl, 5 years ago

Thanks for that. This is a better solution.

comment:23 by Tim Graham <timograham@…>, 5 years ago

In 35431298:

Refs #27149 -- Moved subquery expression resolving to Query.

This makes Subquery a thin wrapper over Query and makes sure it respects
the Expression source expression API by accepting the same number of
expressions as it returns. Refs #30188.

It also makes OuterRef usable in Query without Subquery wrapping. This
should allow Query's internals to more easily perform subquery push downs
during split_exclude(). Refs #21703.

comment:24 by Tim Graham <timograham@…>, 5 years ago

Resolution: fixed
Status: newclosed

In bdc07f1:

Fixed #30188 -- Fixed a crash when aggregating over a subquery annotation.

comment:25 by Tim Graham <timograham@…>, 5 years ago

In d1e9c25:

Refs #30188 -- Prevented double annotation of subquery when aggregated over.

Thanks Can Sarıgöl for the suggested trimming approach.

comment:26 by Tim Graham <timograham@…>, 5 years ago

In 3f32154:

Refs #30188 -- Avoided GROUP BY when aggregating over non-aggregates.

comment:27 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In adfbf653:

Fixed #31568 -- Fixed alias reference when aggregating over multiple subqueries.

691def10a0197d83d2d108bd9043b0916d0f09b4 made all Subquery() instances
equal to each other which broke aggregation subquery pushdown which
relied on object equality to determine which alias it should select.

Subquery.eq() will be fixed in an another commit but
Query.rewrite_cols() should haved used object identity from the start.

Refs #30727, #30188.

Thanks Makina Corpus for the report.

comment:28 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 3913acdb:

[3.1.x] Fixed #31568 -- Fixed alias reference when aggregating over multiple subqueries.

691def10a0197d83d2d108bd9043b0916d0f09b4 made all Subquery() instances
equal to each other which broke aggregation subquery pushdown which
relied on object equality to determine which alias it should select.

Subquery.eq() will be fixed in an another commit but
Query.rewrite_cols() should haved used object identity from the start.

Refs #30727, #30188.

Thanks Makina Corpus for the report.

Backport of adfbf653dc1c1d0e0dacc4ed46602d22ba28b004 from master

comment:29 by Mariusz Felisiak <felisiak.mariusz@…>, 4 years ago

In 49bbf657:

[3.0.x] Fixed #31568 -- Fixed alias reference when aggregating over multiple subqueries.

691def10a0197d83d2d108bd9043b0916d0f09b4 made all Subquery() instances
equal to each other which broke aggregation subquery pushdown which
relied on object equality to determine which alias it should select.

Subquery.eq() will be fixed in an another commit but
Query.rewrite_cols() should haved used object identity from the start.

Refs #30727, #30188.

Thanks Makina Corpus for the report.

Backport of adfbf653dc1c1d0e0dacc4ed46602d22ba28b004 from master

Note: See TracTickets for help on using tickets.
Back to Top