Opened 6 years ago

Last modified 5 weeks ago

#10929 new New feature

Support a default value for Sum (and possibly other aggregation functions)

Reported by: nolan Owned by:
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: aggregate annotate default
Cc: real.human@…, net147, cvrebert, jpk Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

By default, annotate(sum_field = Sum(...)) results in sum_field being NULL if there were no values to sum. In most cases, 0 would be a better option here, for proper sorting in a later order_by, and for comparisons using lt/gt/gte/lte.

A monkeypatch to implement default values for NULL using COALESCE is available here:
http://stackoverflow.com/questions/553038/treat-null-as-0-in-django-model

Attachments (1)

10929-aggregates-default-r17029.diff (2.2 KB) - added by mrmachine 4 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 Changed 6 years ago by russellm

  • Component changed from Database layer (models, ORM) to ORM aggregation
  • Needs documentation unset
  • Needs tests unset
  • Owner nobody deleted
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

This is a reasonable suggestion; variants in SQL syntax notwithstanding, it shouldn't be too hard to implement.

For those following the advice of Stack Overflow: There is no need to monkeypatch the sql_aggregates module - if you override the add_to_query function(), you can load the sql-specific aggregate from wherever you want.

In fact, you don't even need to have the split between generic and SQL specific aggregate. If all you want is a custom SQL aggregate that you can use in your own code, the following definition will help:

from django.db.models.sql.aggregates import Aggregate 
class MyAggregate(Aggregate): 
    sql_function = 'SOME_SQL_FUNC' 
    sql_template = 'SQL_TEMPLATE_BITS(%(function)s(%(field)s), %(default)s)'

    def __init__(self, lookup, **extra): 
        self.lookup = lookup 
        self.extra = extra 
    def _default_alias(self): 
        return '%s__%s' % (self.lookup, self.__class__.__name__.lower()) 
    default_alias = property(_default_alias) 
    def add_to_query(self, query, alias, col, source, is_summary): 
        super(MyAggregate, self).__init__(col, source, is_summary, **self.extra) 
        query.aggregate_select[alias] = self 

comment:2 Changed 4 years ago by lukeplant

  • Severity set to Normal
  • Type set to New feature

comment:3 Changed 4 years ago by ramiro

  • Easy pickings unset

#14548 was a duplicate, had some useful discussion in its comments.

Changed 4 years ago by mrmachine

comment:4 Changed 4 years ago by mrmachine

  • Cc real.human@… added
  • Has patch set
  • Keywords aggregate annotate default added
  • Needs documentation set
  • UI/UX unset
  • Version changed from 1.1-beta-1 to SVN

Just added a patch with an initial implementation for this feature. It should work for all aggregates by wrapping the default sql_template in COALESCE(%s, %%(default)s), if params has a default item. There's a basic test in there that checks the behaviour for Avg with and without a default. If there's positive feedback about the implementation, I'll have a go at adding documentation and any improvements to the implementation and tests that are suggested.

comment:5 Changed 3 years ago by net147

  • Cc net147 added

comment:6 Changed 3 years ago by lukeplant

There is a good case for giving Sum a default of zero, while leaving Max and Min with default of None. Essentially, the reason is that zero is the fixed point of the + operator, while max and min have no fixed points (excluding negative infinity and positive infinity respectively, which are problematic). Also, it makes the behaviour analogous to the Python builtins sum, min and max - sum returns 0 for an empty list, whereas min and max throw exceptions. (We don't want to throw exceptions here, for various obvious reasons, but I think we should be indicating 'undefined' in some way under the same circumstances).

If we do this, we need a note in the release notes about the backwards incompatibility - if people were relying on Sum returning None for no data instead of zero. (We can certainly classify the previous behaviour as a bug, since it was out of line with our docs, and neither expected or useful behaviour, but should still mention this). We should also have some tests that ensure that it returns a zero of the right type for different underlying field types.

This also brings up the possibility of whether the default for Sum should act like the start parameter of the sum Python builtin. I think it should. That would make implementation harder, though, I guess. Alternatively we could have a separate 'start' parameter for Sum, which might be clearer, and would make that a separate feature.

comment:7 Changed 3 years ago by cvrebert

  • Cc cvrebert added

comment:8 Changed 2 years ago by jpk

  • Cc jpk added

comment:9 Changed 2 years ago by akaariai

  • Component changed from ORM aggregation to Database layer (models, ORM)

comment:10 Changed 6 months ago by jarshwah

I would support closing this as a duplicate of #23753. Once the linked ticket is implemented, it'll be possible for users to construct their own coalesce value like:

annotate(sum_field = Coalesce(Sum(...), 0))

This leaves the default behaviour alone, but provides a higher level of customisation. It relies on the #14030 patch which will also allow adding a "start" value, as mentioned by @lukeplant above:

annotate(sum_field = Coalesce(Sum(...), 0) + Value(5))

Would everyone be happy enough with the above use?

comment:11 Changed 6 months ago by mrmachine

I'm not totally convinced (yet) that we should give up on this when Coalesce() becomes available. If we just instruct people to use Coalesce(Sum(...), default) all the time, we're basically just recreating the literal semantics of SQL itself rather than providing an easy to use abstraction of it. Sure, it would be useful and powerful to have Coalesce() as well, but we might not need to force people to use it all the time for the common case. At the very least, a decision on closing this ticket should probably wait until Coalesce() is actually implemented.

comment:12 follow-ups: Changed 5 months ago by jarshwah

@mrmachine that's not a bad argument to make. What we could support is:

Sum('field', default=0)

Which would then be returned from convert_value in the case where the value is None. The argument against that is we'd be special-casing the Sum aggregate, while leaving other aggregates to use the Coalesce method. That's not such a big downside though.

Does anyone else want to weigh in on this? Supporting a default=0 kwarg is a 4 line + docs change.

comment:13 in reply to: ↑ 12 Changed 4 months ago by devunt

Replying to jarshwah:

@mrmachine that's not a bad argument to make. What we could support is:

Sum('field', default=0)

Which would then be returned from convert_value in the case where the value is None. The argument against that is we'd be special-casing the Sum aggregate, while leaving other aggregates to use the Coalesce method. That's not such a big downside though.

Does anyone else want to weigh in on this? Supporting a default=0 kwarg is a 4 line + docs change.

+1 for this.

comment:14 in reply to: ↑ 12 Changed 5 weeks ago by LaundroMat

Replying to jarshwah:

@mrmachine that's not a bad argument to make. What we could support is:

Sum('field', default=0)

Which would then be returned from convert_value in the case where the value is None. The argument against that is we'd be special-casing the Sum aggregate, while leaving other aggregates to use the Coalesce method. That's not such a big downside though.

Does anyone else want to weigh in on this? Supporting a default=0 kwarg is a 4 line + docs change.

+1 from me too.

comment:15 in reply to: ↑ 12 Changed 5 weeks ago by digismack

Replying to jarshwah:

@mrmachine that's not a bad argument to make. What we could support is:

Sum('field', default=0)

Which would then be returned from convert_value in the case where the value is None. The argument against that is we'd be special-casing the Sum aggregate, while leaving other aggregates to use the Coalesce method. That's not such a big downside though.

Does anyone else want to weigh in on this? Supporting a default=0 kwarg is a 4 line + docs change.

I ran into a situation today where this feature would have been nice. +1 from me.

comment:16 Changed 5 weeks ago by jarshwah

If you are running into this use case, you can use Coalesce(Sum('field'), 0) in Django 1.8. At least until a decision is made on this ticket. If you want to bring this up on the ML for the other ORM people to have input, that'd be a good next step.

Note: See TracTickets for help on using tickets.
Back to Top