Opened 4 years ago

Closed 4 years ago

Last modified 3 years ago

#30188 closed Bug (fixed)

Aggregate annotation, Case() - When(): AssertionError No exception message supplied

Reported by: Lukas Klement Owned by: nobody
Component: Database layer (models, ORM) Version: 2.1
Severity: Normal Keywords: query, aggregate, case/when
Cc: Can Sarıgöl, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description (last modified by Lukas Klement)

Aggregating annotations works for simple Sum, Count, etc. operations, but fails when the Sum() contains a Case() When() operation.

To reproduce the issue, a simplified scenario:
Suppose you have recipes (e.g. Pasta dough, Tomato Sauce, etc.), which are linked to dishes (e.g. Pasta with tomato sauce) and ingredients (e.g. Flour, Water, Tomatoes). Ingredients are shared across all users, but each user can add a different purchase price to an ingredient (-> cost). We want to sum up the cost of a dish (i.e. summing up the cost of all ingredients across all recipes of a dish).

models.py

class Ingredient(models.Model):
    pass

class IngredientDetail(models.Model):
    ingredient = models.ForeignKey(Ingredient, on_delete=models.CASCADE, related_name='ingredient_details')
    user = models.ForeignKey(User, limit_choices_to={'role': 0}, on_delete=models.CASCADE)
    cost = models.DecimalField(blank=True, null=True, decimal_places=6, max_digits=13)

class Recipe(models.Model):
    dishes = models.ManyToManyField(Dish, related_name='dishes_recipes', through='RecipeDish')
    ingredients = models.ManyToManyField(Ingredient, related_name='ingredients_recipes', through='RecipeIngredient')

class RecipeDish(models.Model):
    recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE, related_name='recipedish_recipe')
    dish = models.ForeignKey(Dish, on_delete=models.CASCADE, related_name='recipedish_dish')


class RecipeIngredient(models.Model):
    recipe = models.ForeignKey(Recipe, on_delete=models.CASCADE, related_name='recipeingredient_recipe')
    ingredient = models.ForeignKey(Ingredient, on_delete=models.CASCADE, related_name='recipeingredient_ingredient')
    quantity = models.DecimalField(blank=True, null=True, decimal_places=2, max_digits=6)

Different users can attach different IngredientDetail objects to a shared Ingredient objects (one-to-many relationship). The objective is to calculate how many ingredients added to a recipe (many-to-many relationship through RecipeIngredient) don't have a cost value set. To achieve this without using .extra(), we annotate the appropriate cost field to IngredientNutrition, then sum up the missing values using a Sum(), Case() and When() aggregation.

def aggregate_nutrition_data(recipes, user):
    annotations['cost_value'] = Subquery(
        IngredientDetail.objects.filter(Q(ingredient_id=OuterRef('ingredient_id')) & Q(user=user)).values('cost')[:1]
    )
    ingredients = RecipeIngredient.objects.filter(recipe__in=recipes).annotate(**annotations)
    aggregators['cost_missing'] = Coalesce(Sum(
        Case(
           When(**{'cost_value': None}, then=Value(1)),
           default=Value(0),
           output_field=IntegerField()
       )
    ), Value(0))
    data = ingredients.aggregate(**aggregators)

    return data

Expected behaviour: a dictionary is returned: {'cost_missing': <value>}, containing as a value the number of missing cost values.Actual behaviour: AssertionError No exception message supplied is returned on the line data = ingredients.aggregate(aggregators) (see abbreviated stacktrace below).

When the aggregated field name is set to a field that exists on the model of the queryset (i.e. is not annotated), the aggregation works. For example: instead of using cost_value=None in the When() of the aggregators, using quantity=None works.
Similarly, doing an aggregation over an annotated field without using Case() and When() works. For example:

    aggregators['cost'] = Coalesce(
        Sum(
            F('cost_value') * F('quantity') / F('cost_quantity')
            output_field=DecimalField()
        ), Value(0)

Stacktrace:

<Line with the .aggregate(...)>
        return query.get_aggregation(self.db, kwargs) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in get_aggregation
                    expression, col_cnt = inner_query.rewrite_cols(expression, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
                new_expr, col_cnt = self.rewrite_cols(expr, col_cnt) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/sql/query.py in rewrite_cols
        annotation.set_source_expressions(new_exprs) ...
/Users/{{path}}/lib/python3.6/site-packages/django/db/models/expressions.py in set_source_expressions
        assert not exprs 

The database is a PostgreSQL database, the Django version is 2.1.7, the Python version is 3.6.4. The problem also exists under Django version 2.2b1

Attachments (1)

t30188.zip (3.5 KB) - added by Tim Graham 4 years ago.

Download all attachments as: .zip

Change History (30)

comment:1 Changed 4 years ago by Lukas Klement

Description: modified (diff)

comment:2 Changed 4 years ago by Simon Charette

Resolution: needsinfo
Status: newclosed

Hey Lukas,

Given the assertion error it looks like there's something wrong here but because of the limited data you provided it's almost impossible for volunteers to reproduce and confirm Django is actually at fault.

Please re-open this ticket if you can provide a reduced test case for your problem. This should include a set of minimal models calls to build the queryset that crashes on .aggregate. It would also be great if you could test against the current Django 2.2 beta (2.2b1) and confirm whether or not it's fixed there.

Thank you for your understanding.

comment:3 Changed 4 years ago by Lukas Klement

Description: modified (diff)
Resolution: needsinfo
Status: closednew

comment:4 Changed 4 years ago by Lukas Klement

Hi Simon, thank you for your response.

I have re-opened the ticket and added a simplified example to reproduce the issue. The issue is current in Django 2.1 and can be reproduced under Django version 2.2b1 as well.

Last edited 4 years ago by Lukas Klement (previous) (diff)

comment:5 Changed 4 years ago by Lukas Klement

Description: modified (diff)

comment:6 Changed 4 years ago by Lukas Klement

Description: modified (diff)

comment:7 Changed 4 years ago by Lukas Klement

Description: modified (diff)

comment:8 Changed 4 years ago by Carlton Gibson

Hi Lukas, Any chance you could put this into a sample project and attach that (or a test case in a PR) so we can run this example and experiment? (It would save a cycle and speed review if you could.)

Thanks.

comment:9 Changed 4 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

Sample project attached. Even if the query isn't valid, a more helpful error message would be useful.

Changed 4 years ago by Tim Graham

Attachment: t30188.zip added

comment:10 Changed 4 years ago by Can Sarıgöl

Has patch: set

comment:11 Changed 4 years ago by Lukas Klement

Hi Tim, thank you for attaching an example project - why is the query not valid? Why is it that annotated fields derived from annotated fields that used Case() / When() cannot be aggregated?

comment:12 Changed 4 years ago by Tim Graham

I didn't look at this in any depth. I don't know whether or not the query is valid. I only meant to say that if it's not valid, there should be a better message. The proposed wording on the PR (if the query is indeed invalid) ... 'Related Field got invalid lookup: salary' ... doesn't seem to provide much explanation.

comment:13 Changed 4 years ago by Can Sarıgöl

Cc: Can Sarıgöl added

Hi, actually I thought this sample is the same with below so the same exception message might be ok. Besides, I will check the query.

ingredients = RecipeIngredient.objects.aggregate(
            cost_missing = Coalesce(Sum(
                Case(
                When(ingredient__cost__isnull=False, then=Value(1)),
                default=Value(0),
                output_field=IntegerField())
            ), Value(0))
        )

comment:14 Changed 4 years ago by Simon Charette

I agree with Lukas that we should investigate why the ORM crashes here as it's a completely valid query from my perspective.

The fundamental issue here is that Subquery breaks the Expression API by having a get_source_expressions method that returns N > 0 expressions while having a set_source_expressions that accepts none.

I happen to have a local branch I've been playing with locally to make Subquery more a of dumb wrapper around sql.Query that happens to restore this contract. I haven't tried it out against this exact set of models but the fact the caller hosts a FIXME makes me feel like there might be something else going on.

comment:15 Changed 4 years ago by Can Sarıgöl

Query like this (removed user and recipes):

SELECT COALESCE(
		SUM(CASE WHEN (SELECT U0."cost" 
					   FROM "t30188_ingredientdetail" U0 
					   WHERE U0."ingredient_id" = (subquery."ingredient_id")  LIMIT 1) IS NULL THEN 1 ELSE 1 END), 1) 
FROM (SELECT "t30188_recipeingredient"."id" AS Col1, 
	  (SELECT U0."cost" 
	   FROM "t30188_ingredientdetail" U0 
	   WHERE U0."ingredient_id" = ("t30188_recipeingredient"."ingredient_id")  LIMIT 1) AS "cost_value", 
	  "U0"."ingredient_id" AS "__col1" 
	  FROM "t30188_recipeingredient" 
	  GROUP BY "t30188_recipeingredient"."id", 
	  (SELECT U0."cost" 
	   FROM "t30188_ingredientdetail" U0 
	   WHERE U0."ingredient_id" = ("t30188_recipeingredient"."ingredient_id")  LIMIT 1), 
	  "U0"."ingredient_id") subquery

the problem is "U0"."ingredient_id" AS "__col1", It should have been "t30188_recipeingredient"."ingredient_id". to solve this, I will also check your point @Simon Charette thanks.

Last edited 4 years ago by Can Sarıgöl (previous) (diff)

comment:16 Changed 4 years ago by Simon Charette

Cc: Simon Charette added

comment:17 Changed 4 years ago by Tim Graham

Patch needs improvement: set

comment:18 Changed 4 years ago by Simon Charette

The assertion error is addressed by https://github.com/django/django/pull/11062 but it unveiled another issue regarding aggregation over annotations.

I managed to get it to work by adjusting the hacky Query.rewrite_cols function but then I noticed the subquery was annotated twice which might hurt performance. I guess the suggested approach at least works and the double annotation issue seems to be generalized anyway. I suggest we open a following ticket to address the rewrite_cols extra annotation issue and proceed with the current approach for now.

comment:19 Changed 4 years ago by Can Sarıgöl

Can we fix twice annotated problem in here? I thought that we can remove the expr if it is already in annotations (expr == old_expr)

elif isinstance(expr, (Col, Subquery)) or (expr.contains_aggregate and not expr.is_summary):
 ...
 new_exprs.append(Ref(col_alias, expr))
 if isinstance(expr, Subquery):
     keys = {
         k for k, old_expr in self.annotations.items()
         if k.startswith('__col') is False and expr == old_expr
      }
     for _k in keys:
         del self.annotations[_k]


comment:20 Changed 4 years ago by Simon Charette

Can, I guess we could somehow but I feel like we'd need a more elegant solution than that.

Note that the double annotation issue doesn't only happen for subqueries, you might have noticed that the company_id which is first annotated as Col0 gets annotated as __col0 as well. This makes me believe we might need to do it for Col instances as well?

comment:21 Changed 4 years ago by Simon Charette

I just pushed another commit that addresses the double aliasing issue. I credited you in the commit message Can.

comment:22 Changed 4 years ago by Can Sarıgöl

Thanks for that. This is a better solution.

comment:23 Changed 4 years ago by Tim Graham <timograham@…>

In 35431298:

Refs #27149 -- Moved subquery expression resolving to Query.

This makes Subquery a thin wrapper over Query and makes sure it respects
the Expression source expression API by accepting the same number of
expressions as it returns. Refs #30188.

It also makes OuterRef usable in Query without Subquery wrapping. This
should allow Query's internals to more easily perform subquery push downs
during split_exclude(). Refs #21703.

comment:24 Changed 4 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: newclosed

In bdc07f1:

Fixed #30188 -- Fixed a crash when aggregating over a subquery annotation.

comment:25 Changed 4 years ago by Tim Graham <timograham@…>

In d1e9c25:

Refs #30188 -- Prevented double annotation of subquery when aggregated over.

Thanks Can Sarıgöl for the suggested trimming approach.

comment:26 Changed 4 years ago by Tim Graham <timograham@…>

In 3f32154:

Refs #30188 -- Avoided GROUP BY when aggregating over non-aggregates.

comment:27 Changed 3 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In adfbf653:

Fixed #31568 -- Fixed alias reference when aggregating over multiple subqueries.

691def10a0197d83d2d108bd9043b0916d0f09b4 made all Subquery() instances
equal to each other which broke aggregation subquery pushdown which
relied on object equality to determine which alias it should select.

Subquery.eq() will be fixed in an another commit but
Query.rewrite_cols() should haved used object identity from the start.

Refs #30727, #30188.

Thanks Makina Corpus for the report.

comment:28 Changed 3 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In 3913acdb:

[3.1.x] Fixed #31568 -- Fixed alias reference when aggregating over multiple subqueries.

691def10a0197d83d2d108bd9043b0916d0f09b4 made all Subquery() instances
equal to each other which broke aggregation subquery pushdown which
relied on object equality to determine which alias it should select.

Subquery.eq() will be fixed in an another commit but
Query.rewrite_cols() should haved used object identity from the start.

Refs #30727, #30188.

Thanks Makina Corpus for the report.

Backport of adfbf653dc1c1d0e0dacc4ed46602d22ba28b004 from master

comment:29 Changed 3 years ago by Mariusz Felisiak <felisiak.mariusz@…>

In 49bbf657:

[3.0.x] Fixed #31568 -- Fixed alias reference when aggregating over multiple subqueries.

691def10a0197d83d2d108bd9043b0916d0f09b4 made all Subquery() instances
equal to each other which broke aggregation subquery pushdown which
relied on object equality to determine which alias it should select.

Subquery.eq() will be fixed in an another commit but
Query.rewrite_cols() should haved used object identity from the start.

Refs #30727, #30188.

Thanks Makina Corpus for the report.

Backport of adfbf653dc1c1d0e0dacc4ed46602d22ba28b004 from master

Note: See TracTickets for help on using tickets.
Back to Top