For origin and some discussion see http://groups.google.com/group/django-users/browse_frm/thread/fc58fbe409a6f098/1110c688e1af58d2
I would like to see an addition to the django ORM - many people has asked for aggregation. Do you think Django should have aggregation support?
I would like to add it in a similar fashion that ValuesQuerySet is implemented (a subclass to QuerySet with minor changes to how query is constructed and results are returned) - if we add a AgreggateQuerySet that would:
- be create from simple QuerySet by a call to aggregate(group_by=(), max=(), min=(), avg=(), sum=(), count=False) , where:
- group_by is a list of fields on which to aggregate, these will be part of the result, if empty the whole query will be grouped
- If count, then COUNT(*) will be computed and included
- sum, max, min and avg will contain a list of fields on which to call SUM(), MAX(), MIN() or AVG() respectively
- will be a valid queryset
- filter() will result in limiting the result by appending conditions to the HAVING clause etc.
- aggregate() will probably add fields to individual aggregation functions.
queryset.count() would become identical to queryset.aggregate(count=True)['count']
Example:
>>> quseryset = Model.objects.all()
>>> queryset.aggregate( group_by=( 'name', 'city' ), sum=( 'pay', 'some_other_field' ), avg=( 'pay', 'age' ), count=True )
[
{
'group_by' : { 'name' : 'John Doe', 'city' : 'Prague', }
'sum' : { 'pay' : 50000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
{
'group_by' : { 'name' : 'Jane Doe', 'city' : 'Paris', }
'sum' : { 'pay' : 300000, 'some_other_field' : 10 },
'avg' : { 'pay' : 100000, 'age' : 32 },
'count' : 3,
},
]
There are a few problems I haven't thought through and would like comments:
- how to deal with grouping by foreign key? the best seems to include the whole object in the result, not just the ID. (isn't it too magical?)
- how to specify ordering?
- how to best present the results? originally I thought that fields in group_by will be on the same level as min or max, but Tim Chase pointed out that it could clash if the model has a field named min .
I will start working on this as soon as I am sure about the interface. Any comments and help would be greatly appreciated.