Opened 16 years ago

Closed 16 years ago

#6811 closed (duplicate)

Add functions, expressions and grouping to QuerySet.values()

Reported by: Justin Owned by: Justin
Component: Database layer (models, ORM) Version: queryset-refactor
Severity: Keywords: aggregation grouping
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This proposal is for adding aggregation support to QuerySet.values().

An E object is introduced that represents a query expression, along with several subclasses:

  • F: a field/column
  • Function: an SQL function
  • Aggregate: an SQL aggregation function

The basic syntax for values remains unchanged, field names or E objects are passed as position arguments. Grouping is implicit based on the types of arguments to values(). Non-aggregate fields and expressions are used in the GROUP BY clause. If no non-aggregate fields are passed, values() returns a single result. Functions objects have default names in the output based on the function name and field.

Examples:

Sum field1:

>>> Model.objects.values(Sum('field1'))
[ {'field1_sum': 1000} ]

Group by field1, sum field2:

>>> Model.objects.values('field1', Sum('field2'))
[ {'field1':10, 'field2_sum':1000 },
  {'field1':20, 'field2_sum':1234 } ]

Complex expression with a specified output alias:

>>> Model.objects.values('field1', E(Sum('field2') + Sum('field3'), alias='my_sum'))
[ {'field1':10, 'my_sum':1000 },
  {'field1':20, 'my_sum':1234 } ]

Use of F objects:

>>> Invoice.objects.values('id', E(F('subtotal') + F('tax'), alias='total'))
[ {'id':1, 'total':1000.00 },
  {'id':2, 'total':1234.00 } ]

Note that no grouping happens here since there are no aggregate functions.

Output aliases could possibly be specified via kwargs:

>>> Model.objects.values('field1', my_sum=Sum('field2') + Sum('field3'))
[ {'field1':10, 'my_sum':1000 },
  {'field1':20, 'my_sum':1234 } ]

It would be nice to support grouping by functions as well:

>>> Sale.objects.values(Month('date'), Sum('payment'))
[ {'date_month':1, 'payment_sum':1000.00 },
  {'date_month':2, 'payment_sum':1234.00 } ]

In this case Month is non-aggregating, so it's included in GROUP BY. Also, backends will have to be extended so that non-standard functions work.

User defined function using built-in functions:

>>> def my_func(value_col, quantity_col):
>>>    return Sum(value_col) / Sum(quantity_col)
>>> Product.objects.values('type', avg_price=my_func('price','quantity'))
[ {'type':'widget', 'avg_price':1.00 },
  {'type':'gadget', 'avg_price':2.00 } ]

Aggregation should be supported across relationships:

>>> Customer.objects.values('name', Sum('purchases__payment'))
[ {'name':'John', 'purchases__payment_sum':1000.00 },
  {'name':'Jane', 'purchases__payment_sum':1234.00 } ]

I think this proposal adds minimal new syntax and fits nicely with existing features like Q objects. It's possible that changing the behavior of value() to trigger grouping could be confusing.

This proposal so far only covers adding aggregation to QuerySet.values(). Another function (such as annotate() suggested by Russell) could be added to support aggregation with Models. It might be good to eventually support E objects in QuerySet.filter(). It might also be possible to use the E objects in Models via some sort of ExpressionField (but that's getting far ahead of myself).

Some of this already works in a haskish, fragile way (auto-grouping and simple aggregation functions, but not expressions), but I'm re-working my changes as per Malcolm's suggestions. I'll post a very preliminary patch for review when I get some of his suggestions finished. I'm not married to this proposal, and enjoy the work for itself, so if people think we should a different direction, that's fine.

So, thoughts, suggestions, test cases?

Most recent discussion:
http://groups.google.com/group/django-developers/browse_thread/thread/3d4039161f6d2f5e

Past discussions:
http://groups.google.com/group/django-developers/browse_thread/thread/f433edf7b0ebdbcb/9758e11744381e67[#5420]
http://groups.google.com/group/django-developers/browse_thread/thread/e53a62e1c73871b8/2f91fa217bc465bb?#2f91fa217bc465bb

Another proposal:
http://code.djangoproject.com/ticket/3566

Attachments (4)

qs_value_expressions.diff (4.3 KB ) - added by Justin 16 years ago.
Preliminary patch. Not well tested.
functions.py (6.8 KB ) - added by Justin 16 years ago.
Expression/Function classes
qs_value_expressions.2.diff (4.8 KB ) - added by Justin 16 years ago.
fixed a auto-grouping bug in first patch. added keyword alias support
functions.2.py (6.8 KB ) - added by Justin 16 years ago.
fixed auto-grouping bug

Download all attachments as: .zip

Change History (7)

comment:1 by Justin, 16 years ago

Owner: changed from nobody to Justin
Status: newassigned

by Justin, 16 years ago

Attachment: qs_value_expressions.diff added

Preliminary patch. Not well tested.

by Justin, 16 years ago

Attachment: functions.py added

Expression/Function classes

comment:2 by Justin, 16 years ago

Has patch: set
Needs tests: set

I added an initial patch and expression classes.

Table and column names are remapped properly, so you can do aggregation across relations as in the last example. Specifying output aliases as keyword arguments is not yet supported, and general expressions and operators don't have a default output alias, so expressions must be wrapped in an E object with an alias.

by Justin, 16 years ago

Attachment: qs_value_expressions.2.diff added

fixed a auto-grouping bug in first patch. added keyword alias support

by Justin, 16 years ago

Attachment: functions.2.py added

fixed auto-grouping bug

comment:3 by Malcolm Tredinnick, 16 years ago

Resolution: duplicate
Status: assignedclosed

As with #6821, this is a duplicate of #3566, which is the original ticket for aggregate work.

Patches should go there after discussion on the mailing list (it's not at all clear that consensus on the approach or API has been reached yet).

Note: See TracTickets for help on using tickets.
Back to Top