﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
25834	Better expose ORM grouping semantics	Raphael Gaschignard	nobody	"Currently, what QuerySet's `annotate` does is entirely dependent on opaque semantics that don't hold up to well when working on more complex situations.

Let's say I have a `LineItem` model (the line items of an invoice, for example) that I want to calculate some stats on, with a `price`, and a `price_currency`. Some pointy haired boss wants to know what the totals on the line items are depending on currency. Pull out my nifty `annotate` tool:

{{{
In [33]: print LineItem.objects.values('price_currency').annotate(total=Sum('price'))
[{'price_currency': u'USD', 'total': Decimal('9001')}]
}}}

So there `annotate` ends up putting things into the `GROUP BY` clause to do the sum (namely the previous `values` call)

Sometimes I just want to rename columns to make my life easier, so I'll make a `pc` field as a shortcut to price:

{{{
In[37]: print LineItem.objects.values('price_currency').annotate(pc=F('price'))
[{'price_currency': u'USD', 'pc': Decimal('0')}, {'price_currency': u'USD', 'pc': Decimal('34')}, ...etc...]
}}}

Here `annotate` adds a `SELECT` clause without touching the `GROUP BY` clause


When I end up working on a bigger query, I'll try to simplify my queryset expression after I have something that computes what I wanted, so doing things like merging `filter`s and, sometimes `annotate`:

{{{
In[39]: print LineItem.objects.values('price_currency').annotate(total=Sum('price'), pc=F('price')).query

SELECT ""item_lineitem"".""price_currency"", SUM(""item_lineitem"".""price"") AS ""total"" FROM ""item_lineitem"" GROUP BY ""item_lineitem"".""price_currency"", ""item_lineitem"".""price"" ORDER BY ""item_lineitem"".""position"" ASC
}}}

So when I annotate with a mixture of ""grouping required"" and ""adding a select"" , then everything becomes ""grouping required"", and in the previous example, suddenly I'm grouping by currency AND price.  I think this is the only valid behaviour if you want to return something, but shouldn't this just fail?

(Some crazier stuff is when you `annotate` with Sum and it simply groups by _all_ of your model's fields. I think the reason this doesn't fail is for the motivating example in the documentation (of counting the min price of the books in a store), but this seems like something that should fail)

----
 
My understanding is that Django's ORM is coming more to terms with the fact that it's basically always going to be paired with a RDBMS, so this multi-meaning annotate being the only reasonable way to use grouping is a hard sell to me. I very much want to get rid of all of our custom SQL, but I have to spend a day working on one query, all because `annotate` ends up being 2 separate functions:
  
   - ""group by previous `values` call and add aggregation function to select""
   - ""add a select clause""

Not to mention that the pairing with `values` means that if I want to group by a complex query, I have to do something like:

{{{
Model.objects.values('foo').annotate(grouping_property=complex_thing).annotate(Sum('stuff')).annotate(grouping_property=complex_thing)
}}}

in order to actually have the `grouping_property` to show up in the select (`values` doesn't work on some of my `annotate`d values for some reason... there might be a bug that I haven't successfully isolated in my code or the ORM). 


Anyways, I'm complaining about this because I would much rather have errors show up for a lot of these cases, so that I can at least fix them, but in the current API that would be impossible...

In my ideal world, you would have `group(by=columns_or_expressions, into=aggregation_things)` which would be somewhat equivalent to ""group by the columns in `by` (something like `values`), and annotate your `select` with the `into` clause)"", so when someone asks how to do a grouping you can point to that directly. And `annotate` would be relegated to only touching the `SELECT` part of a query. With backwards compatability considerations, you could probably make a separate method just for `SELECT`.

I'm not sure of the implications with regards to joins and things like `annotate(Sum('books__price'))`, but my impression is that all this functionality is in the ORM already, just all within `annotate`



"	Cleanup/optimization	closed	Database layer (models, ORM)	1.8	Normal	invalid			Unreviewed	0	0	0	0	0	0
