Opened 7 years ago

Closed 7 years ago

#21524 closed Uncategorized (wontfix)

Annotation and .extra with group function breaks at database level

Reported by: eu@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords: oracle
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


This is somewhat related to #14657 and #13274

For some awkward reason I have the following queryset:

qs = MyModel.objects.all()
qs = qs.annotate(
    #opened_percent=Avg('statstable__opened_percent'), # The average of the percents is not the correct ratio
    #closed_percent=Avg('statstable__closed_percent'), # value between closed and total
qs = qs.extra(
        'closed_percent': 'SUM(statstable.closed)/SUM(*100',
        'opened_percent': 'SUM(statstable.opend)/SUM(*100',

When you try to do this on Oracle, it will put 'closed_percent' and 'opened_percent' on GROUP BY list, which will raise a DatabaseError: ORA-00934: group function is not allowed here.

Since we are not supposed to mess with ".group_by" without Django's inner code, there is no way to solve this problem.

The quick and dirty solution for me right now was to edit django/db/models/sql/ at line 572 and add the following code:

                if isinstance(col, (list, tuple)):
                    result.append('%s.%s' % (qn(col[0]), qn(col[1])))
                elif hasattr(col, 'as_sql'):
                    result.append(col.as_sql(qn, self.connection))
                    if 'avg(' in col.lower() or 'sum(' in col.lower(): 
                        # This is what I added, just checking if the extra select/column has the  
                        # group functions that I need, if it does, skip it, since you 
                        # can't GROUP BY group functions
                    result.append('(%s)' % str(col))

I'm using Django 1.4 on this project (with no ETA to upgrade, unfortunally), and I see that on HEAD it has changed a bit, but remains "Unconditionally adding the extra_select items", which is the root of my problems:

Change History (5)

comment:1 Changed 7 years ago by eu@…

This "validation" also must happens at line 114:

                if not self.connection.features.allows_group_by_pk:
                    for col, col_params in ordering_group_by:
                        if col not in grouping:
                            if 'avg(' in col.lower() or 'sum(' in col.lower():

comment:2 Changed 7 years ago by Aymeric Augustin

You're clearly pushing the boundaries of what the ORM can do... Writing the query in plain SQL (look for raw SQL in the documentation) could be easier at this point — and a short term solution.

I'll let someone who knows the aggregation code better than me make a decision on this ticket.

comment:3 Changed 7 years ago by Shai Berger

The solution suggested is indeed dirty -- and bogus (I can think up valid queries it breaks, the most trivial of which adds constant 'sum()' in the extra columns).

In a similar case, I did something like

def annotate_percents(qset):
    class AnnotatedQuerySet(qset.__class__):
        def iterator(self):
           for obj in super(AnnotatedQuerySet, self).iterator():
               obj.closed_percent = obj.closed/ * 100
               obj.opened_percent = obj.opened/ * 100
               yield obj
   qset.__class__ = AnnotatedQuerySet
   return qset

Now, in the case you have with AVG you may really need to do things in the database; you can still do this efficiently by judicious use of prefetch_related.
It can allow you to do things that look like you add a new related query per object returned, but really only cost you one more database access per the whole queryset.

I hope this helps; note that even if something like your use-case ends up supported, it will probably not be in Django 1.4.x.

comment:4 Changed 7 years ago by eu@…

@Shai, thanks for your help, but this query that I do is used in 2~3 tables with almost same columns (they are statistical tables), and I need to group data by ranges of dates and then order the result by the highest and lowest percentage, hence my need for all the math being done by the database. I don't think getting all the data and then math and sort on the python would be a good solution for now.

@aaugustin, I do indeed am pushing what the ORM can do, and will totally accept this ticket be closed as "invalid" or "wontfix", even if I'm just using two queryset methods: .extra and .annotate.

comment:5 Changed 7 years ago by Anssi Kääriäinen

Resolution: wontfix
Status: newclosed

I don't think we can support this. It is impossible to detect which parts of the SQL in extra select needs to be in group by. Trying to parse the SQL doesn't work. Consider for example:

account_balance - SUM(transaction_amount)

Here account_balance must be in group by, but transaction_amount not.

The .extra() as it is currently implemented in Django just doesn't work for your use case. I hope we will have something better later on, but for now I'll have to just wontfix this issue.

Note: See TracTickets for help on using tickets.
Back to Top