Django

Code

Ticket #3566 (assigned)

Opened 1 year ago

Last modified 3 weeks ago

Proposal: ORM aggregation support

Reported by: Honza Král <Honza.Kral@gmail.com> Assigned to: russellm (accepted)
Component: Database wrapper Version: SVN
Keywords: orm aggregation group by Cc: nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com, remco@diji.biz, martin.paquette@gmail.com, nicolaslara@gmail.com
Triage Stage: Someday/Maybe Has patch: 1
Needs documentation: 0 Needs tests: 0
Patch needs improvement: 1

Description

For origin and some discussion see http://groups.google.com/group/django-users/browse_frm/thread/fc58fbe409a6f098/1110c688e1af58d2

I would like to see an addition to the django ORM - many people has asked for aggregation. Do you think Django should have aggregation support?

I would like to add it in a similar fashion that ValuesQuerySet is implemented (a subclass to QuerySet with minor changes to how query is constructed and results are returned) - if we add a AgreggateQuerySet that would:

  • be create from simple QuerySet by a call to aggregate(group_by=(), max=(), min=(), avg=(), sum=(), count=False) , where:
    • group_by is a list of fields on which to aggregate, these will be part of the result, if empty the whole query will be grouped
    • If count, then COUNT(*) will be computed and included
    • sum, max, min and avg will contain a list of fields on which to call SUM(), MAX(), MIN() or AVG() respectively
  • will be a valid queryset
    • filter() will result in limiting the result by appending conditions to the HAVING clause etc.
    • aggregate() will probably add fields to individual aggregation functions.

queryset.count() would become identical to queryset.aggregate(count=True)['count']

Example:

>>> quseryset = Model.objects.all()
>>> queryset.aggregate( group_by=( 'name', 'city' ), sum=( 'pay', 'some_other_field' ), avg=( 'pay', 'age' ), count=True )
[
 {
   'group_by' : { 'name' : 'John Doe', 'city' : 'Prague', }
   'sum' : { 'pay' : 50000, 'some_other_field' : 10000 },
   'avg' : { 'pay' : 10000, 'age' : 30 },
   'count' : 5,
 },
 {
   'group_by' : { 'name' : 'Jane Doe', 'city' : 'Paris', }
   'sum' : { 'pay' : 300000, 'some_other_field' : 10 },
   'avg' : { 'pay' : 100000, 'age' : 32 },
   'count' : 3,
 },
]

There are a few problems I haven't thought through and would like comments:

  • how to deal with grouping by foreign key? the best seems to include the whole object in the result, not just the ID. (isn't it too magical?)
  • how to specify ordering?
  • how to best present the results? originally I thought that fields in group_by will be on the same level as min or max, but Tim Chase pointed out that it could clash if the model has a field named min .

I will start working on this as soon as I am sure about the interface. Any comments and help would be greatly appreciated.

Attachments

aggregate.diff (4.9 kB) - added by Nicolas Lara <nicolaslara@gmail.com> on 03/22/08 08:19:00.
Patch that adds aggregate functionality (no annotate yet)
aggregate.2.diff (8.6 kB) - added by nicolas on 04/19/08 07:04:24.

Change History

02/24/07 15:07:09 changed by django.ticket@tim.thechases.com

  • needs_better_patch changed.
  • needs_tests changed.
  • needs_docs changed.

I would add that the resulting structure should be accessible within a template. Thus, one should be able to have in one's view

context = {

'stats': model.objects.aggregate(group_by=('field0',), sum=('field1', 'field2')),
'items': model.objects.all(),

}
return render_to_response('template.html', context)

one should be able to refer to the results in a template as

{{ stats.sum.field1 }}
{{ stats.group_by.field0 }}

just as one can refer to

{% for item in items %}
{{ item.field3 }}
{% endif %}

-tkc

02/24/07 23:45:04 changed by jacob

  • summary changed from Proposal of ORM extension -- aggregationu to Proposal: ORM aggregation support.

FWIW: I really like this proposal.

02/25/07 14:08:11 changed by Marc Fargas <telenieko@telenieko.com>

  • stage changed from Unreviewed to Design decision needed.

03/05/07 19:58:08 changed by nick.lane.au@gmail.com

  • cc set to nick.lane.au@gmail.com.

03/28/07 19:38:53 changed by anonymous

  • cc changed from nick.lane.au@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com.

(follow-up: ↓ 7 ) 04/12/07 08:08:23 changed by Ciantic

"how to deal with grouping by foreign key? the best seems to include the whole object in the result, not just the ID. (isn't it too magical?)" Surely, aren't they lazy after all? So they will be not queried until you access them?

"how to specify ordering?" How about queryset.aggregate( group_by=( 'name', 'city' ), sum=( 'pay', 'some_other_field' ), avg=( 'pay', 'age' ), count=True ).order_by(group_by='name', sum='-pay', avg='pay') (- sign represents the reversed order as it represents it in normal querysets too)

"how to best present the results? originally I thought that fields in group_by will be on the same level as min or max, but Tim Chase pointed out that it could clash if the model has a field named min ." I see no collision in current result.

(in reply to: ↑ 6 ) 04/12/07 19:39:10 changed by Honza Král <Honza.Kral@gmail.com>

Replying to Ciantic:

"how to deal with grouping by foreign key? the best seems to include the whole object in the result, not just the ID. (isn't it too magical?)" Surely, aren't they lazy after all? So they will be not queried until you access them?

either it will be the whole object, or just an id, having a lazy object doesn't seem to have any effect here

"how to specify ordering?" How about queryset.aggregate( group_by=( 'name', 'city' ), sum=( 'pay', 'some_other_field' ), avg=( 'pay', 'age' ), count=True ).order_by(group_by='name', sum='-pay', avg='pay') (- sign represents the reversed order as it represents it in normal querysets too)

nice, but what field should be first in your example? name, SUM( pay ) or AVG( pay ) ??

I am sorry, I didn't have the time to advance with this, plus the queryset refactoring came around and that really needs to be done first.

(follow-up: ↓ 9 ) 04/13/07 09:27:58 changed by Ciantic

either it will be the whole object, or just an id, having a lazy object doesn't seem to have any effect here

Returning just an ID is more harmful than returning object. When you return just the ID the people who needs the object is facing frustrating problem, fetching those objects externally. I suggest we use object, and those who need ID just uses the object.id to get it (notice: in django one shouldn't change the id fieldname, I've learnt, same way they shouldn't change the primarykey type... even if it is possible, there is stuff that simply does not work after changing it).

nice, but what field should be first in your example? name, SUM( pay ) or AVG( pay ) ??

name is first, then comes sub(pay) with descending in place and finally avg(pay), same way as in normal order_by, the leftmost has biggest priority.

(in reply to: ↑ 8 ) 04/13/07 15:25:27 changed by Honza Král <Honza.Kral@gmail.com>

Replying to Ciantic:

{{{ nice, but what field should be first in your example? name, SUM( pay ) or AVG( pay ) ?? }}} name is first, then comes sub(pay) with descending in place and finally avg(pay), same way as in normal order_by, the leftmost has biggest priority.

and what if I want to sort on sum( pay ), name, sum( other_field ), that wouldn't work in your syntax, since order_by( sum='pay', group_by='name', sum='other_field' ) is a syntax error....

I would like to avoid things like order_by( 'sum(pay)', 'avg(other_field)', 'name' ), but so far its the only way I can think of...

04/16/07 00:03:46 changed by russellm

Earlier counterproposal here. Current discussion of these proposals here

05/08/07 05:31:53 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com.

06/02/07 22:46:16 changed by Brian Harring <ferringb@gmail.com>

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com.

06/04/07 07:36:39 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk.

06/18/07 21:45:17 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com.

07/07/07 22:56:30 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk.

07/25/07 11:11:01 changed by Alexander Solovyov <alexander.solovyov@gmail.com>

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com.

07/30/07 06:19:50 changed by russellm

  • owner changed from adrian to russellm.

08/17/07 11:02:20 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com.

09/01/07 11:59:58 changed by Densetsu no Ero-sennin <densetsu.no.ero.sennin@gmail.com>

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com.

09/09/07 19:18:15 changed by russellm

  • owner changed from nobody to russellm.

For the record - I'm taking ownership of this ticket, although work probably won't commence until after the queryset refactor and newforms-admin land in trunk. There are still some design issues to sort out, and I want to make sure we have everyone's attention.

09/09/07 19:18:26 changed by russellm

  • status changed from new to assigned.

09/21/07 02:43:34 changed by arthur.case@gmail.com

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com.

09/28/07 04:13:48 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com, remco@diji.biz.

12/01/07 17:18:57 changed by jacob

  • stage changed from Design decision needed to Someday/Maybe.

02/04/08 14:55:55 changed by anonymous

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com, remco@diji.biz to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com, remco@diji.biz, martin.paquette@gmail.com.

03/22/08 08:19:00 changed by Nicolas Lara <nicolaslara@gmail.com>

  • attachment aggregate.diff added.

Patch that adds aggregate functionality (no annotate yet)

03/22/08 08:26:30 changed by Nicolas Lara <nicolaslara@gmail.com>

  • cc changed from nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com, remco@diji.biz, martin.paquette@gmail.com to nick.lane.au@gmail.com, jm.bugtracking@gmail.com, dan.fairs@gmail.com, ferringb@gmail.com, lau@iola.dk, wmealing@gmail.com, frankie@grimboy.co.uk, alexander.solovyov@gmail.com, lurker86@gmail.com, densetsu.no.ero.sennin@gmail.com, arthur.case@gmail.com, remco@diji.biz, martin.paquette@gmail.com, nicolaslara@gmail.com.
  • needs_better_patch set to 1.
  • has_patch set to 1.

I added a patch that provides the aggregate functionality as proposed in http://groups.google.com/group/django-developers/browse_thread/thread/3d4039161f6d2f5e/7485e59f449f9bf4?hl=en& No anotate function yet. This is not a final implementation, just some of the functionality. The aggregate modifier already follows joins and group by functionality can be specified by using values on the QuerySet? before calling aggregate(). Aggregate is a terminal modifier. To-Do: test aggregate(), annotate(), AggregationQueryset?? (like ValuesQuerySet?, so it doesn't need to be called as the last modifier), un-aliased syntax, custom

04/19/08 07:04:24 changed by nicolas

  • attachment aggregate.2.diff added.

04/19/08 07:13:06 changed by nicolas

Finnally got some time to work on this. I added a new update to the patch. It is still very rough but implements both annotate and aggregate functionalities. Currently the modifiers take both *agrs and **kwargs and generate the standard alias in the case of *args. When applying the modifiers to a ValuesQuerySet? the field used for GROUP BY are restricted to the fields specified in 'values'. Annotate returns a new (Values)QuerySet? and aggregate a dict. Currently there is a 'bug' in which I'd like to get feedback: when aggregating on both a field of the model and a foreign field the result of the aggregation on the 'local' field is 'wrong' (not the same as if aggregating only in local fields or only in foreign fields) this is because we are doing the aggregation in columns that are repeated in the SQL (due to the nature of joins). I am not sure if this should be prohibited (doing both foreign and local aggregates) because, though it provides some functionality, generates results that are unexpected and might lead to confusion. I am not going to touch on this for now as I think it needs discusion. I will continue to work in making the HAVING clause work when filtering on aggregated fields.

04/23/08 06:40:37 changed by russellm

For the benefit of those that haven't heard - ORM aggregation support has been accepted as a 2008 Google Summer of Code project. Nicolas Lara will be doing the heavy lifting, I (Russell Keith-Magee) will be mentoring.


Add/Change #3566 (Proposal: ORM aggregation support)




Change Properties
Action