Opened 18 months ago

Last modified 17 months ago

#26602 new New feature

Provide a way to manage grouping with RawSQL

Reported by: Jamie Cockburn Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: QuerySet.extra
Cc: josh.smeaton@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Tobin Brown)

I wanted to annotate my objects with a running total:

class A(models.Model):
    amount = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)

qs = A.objects.annotate(total=RawSQL("SUM(amount) OVER (ORDER BY created)", []))

That works fine, and I get a running total for each object, but I cannot call count() on that queryset:

>>> qs.count()
Traceback...
ProgrammingError: window functions not allowed in GROUP BY clause

Using extra(), I can get the same annotation behaviour as well being able to call count():

>>> qs = A.objects.extra(select={'total': 'SUM(amount) OVER (ORDER BY created)'})
>>> qs.count()
8

Change History (8)

comment:1 Changed 18 months ago by Tobin Brown

Description: modified (diff)

comment:2 Changed 18 months ago by Josh Smeaton

Cc: josh.smeaton@… added
Triage Stage: UnreviewedAccepted
Version: 1.9master

This happens because RawSQL defines:

def get_group_by_cols(self):
        return [self]

It'd be helpful if there was a way to disable grouping in an easy way. For the moment, you should be able to do something like:

window = RawSQL("SUM(amount) OVER (ORDER BY created)", [])
window.get_group_by_cols = lambda: []
qs = A.objects.annotate(total=window)
qs.count()

# Or..

RawAggregate(RawSQL):
    contains_aggregate = True  # may not be necessary in newer django versions
    def get_group_by_cols(self):
        return []

qs = A.objects.annotate(total=RawAggregate("SUM(amount) OVER (ORDER BY created)", []))

Can you let me know if this workaround helps you?

In the meantime, I think it's worth figuring out how we want to treat RawSQL when it should not be doing group bys, so I'm accepting on that basis.

Two options I see. First, we can define a whole new class as I've done above - like RawAggregate. Second, we could use a kwarg to RawSQL to tweak whether or not it contains aggregate code.

RawSQL('SELECT SUM() ..', [], grouping=False)

comment:3 Changed 17 months ago by Jamie Cockburn

Yes, you're suggestions did work as solutions, thank you.

Inspired by your RawAggregate subclass, I decided I'd have a go at writing a proper window function Expression. I was mostly guessing at how the API is supposed to work, so any hints/improvements would be welcome.

class Window(Expression):
    template = '%(expression)s OVER (%(window)s)'

    def __init__(self, expression, partition_by=None, order_by=None, output_field=None):
        self.order_by = order_by
        if isinstance(order_by, six.string_types):
            if order_by.startswith('-'):
                self.order_by = OrderBy(F(self.order_by[1:]), descending=True)
            else:
                self.order_by = OrderBy(F(self.order_by))

        self.partition_by = partition_by
        if self.partition_by:
            self.partition_by = self._parse_expressions(partition_by)[0]

        super(Window, self).__init__(output_field=output_field)
        self.source_expression = self._parse_expressions(expression)[0]
        if not self.source_expression.contains_aggregate:
            raise FieldError("Window function expressions must be aggregate functions")

    def _resolve_output_field(self):
        if self._output_field is None:
            self._output_field = self.source_expression.output_field

    def resolve_expression(self, query=None, allow_joins=True, reuse=None, summarize=False, for_save=False):
        c = self.copy()
        c.source_expression = c.source_expression.resolve_expression(query, allow_joins, reuse, summarize, for_save)
        if c.partition_by:
            c.partition_by = c.partition_by.resolve_expression(query, allow_joins, reuse, summarize, for_save)
        if c.order_by:
            c.order_by = c.order_by.resolve_expression(query, allow_joins, reuse, summarize, for_save)
        c.is_summary = summarize
        c.for_save = for_save
        return c

    def as_sql(self, compiler, connection, function=None, template=None):
        connection.ops.check_expression_support(self)
        expr_sql, params = compiler.compile(self.source_expression)

        window_sql = []
        if self.partition_by:
            window_sql.append('PARTITION BY ')
            order_sql, order_params = compiler.compile(self.partition_by)
            window_sql.append(order_sql)
            params.extend(order_params)
        if self.order_by:
            window_sql.append(' ORDER BY ')
            order_sql, order_params = compiler.compile(self.order_by)
            window_sql.append(order_sql)
            params.extend(order_params)
        template = template or self.template
        return template % {'expression': expr_sql, 'window': "".join(window_sql)}, params

    def copy(self):
        copy = super(Window, self).copy()
        copy.source_expression = self.source_expression.copy()
        copy.partition_by = self.partition_by.copy() if self.partition_by else None
        copy.order_by = self.order_by.copy() if self.order_by else None
        return copy

    def get_group_by_cols(self):
        return []

Which means you can write things like this:

A.objects.annotate(total=Window(Sum('amount'), order_by='created'))

comment:4 Changed 17 months ago by Anssi Kääriäinen

I guess we could add Window annotations. The problem is that you can't actually filter on window functions, you need a subquery for that. That is:

select SUM(amount) OVER (ORDER BY created) as sum_amount
  from ...
 where sum_amount > 0

is illegal SQL. We'd need to write the query as:

select * from (select SUM(amount) OVER (ORDER BY created) as sum_amount from ...) subq
 where sum_amount > 0

I don't think we have machinery to do that easily right now.

I'm OK adding window aggregates to Django if they fail loudly and clearly when one tries to filter over them. They could be useful even if you can't actually filter on the result.

Maybe a new ticket is better for this than hijacking this ticket?

comment:5 Changed 17 months ago by Jamie Cockburn

I agree a new ticket is appropriate, as this has moved beyond my original problem.

Do you still intend to add the RawAggregate or RawSQL(..., grouping=False) solutions for this ticket?

Should I create the new ticket or is that your job?

comment:6 Changed 17 months ago by Josh Smeaton

Window functions were always something I intended to look into, so it's really cool to see that working!

This ticket will focus on providing a way to manage grouping on RawSQL. If you'd like to create a ticket to track actual Window expressions, that'd be awesome. Anyone can create new feature tickets so feel free to.

comment:7 Changed 17 months ago by Jamie Cockburn

#26608 created

comment:8 Changed 17 months ago by Tim Graham

Summary: RawSQL window functions break count()Provide a way to manage grouping with RawSQL
Type: UncategorizedNew feature
Note: See TracTickets for help on using tickets.
Back to Top