Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#25339 closed New feature (worksforme)

Aggregation and annotation by time period and intervals (by month, week, day, hour, etc)

Reported by: Austin Pua Owned by: nobody
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords: QuerySet.extra
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Austin Pua)

I often found that aggregation of data by dynamically generating time periods given a desired time interval is a very sought after feature. I personally created the snippet below to accommodate my needs, but as per the note here, the extra() method will be deprecated soon. I think it will be a waste to deprecate this API, since there is simply a lot of DBMS-specific SQL functions available, but if there is any way we can do this via Django ORM, then I am still up for it. The snippet below shows a custom QuerySet method I often use.

def get_metrics(self, frequency):
    select_fields = OrderedDict()
    if frequency == 'week':
        select_fields['time_period'] = "date_trunc(%s, initial_timestamp::TIMESTAMP WITH TIME ZONE AT TIME ZONE %s + '1 day'::interval) - '1 day'::interval"
    else:
        select_fields['time_period'] = "date_trunc(%s, initial_timestamp::TIMESTAMP WITH TIME ZONE AT TIME ZONE %s)"
    select_params = (frequency, settings.TIME_ZONE,)
    queryset = self.extra(
        select=select_fields,
        select_params=select_params
    ).values('time_period', ....)
    queryset = queryset.annotate(
        # Add annotations
    )
    return queryset

Change History (5)

comment:1 by Austin Pua, 9 years ago

Description: modified (diff)

comment:2 by Josh Smeaton, 9 years ago

Resolution: worksforme
Status: newclosed

We began talking about deprecating .extra() because now we have user defined expressions that can replace (the majority of) .extra. See Func Expressions specifically.

Django 1.9 will also include Day/Month/Year/Week transformations, so you'll be able to do something similar to:

from django.db.models.lookups import MonthTransform

Sale.objects.filter(
    sale_date__year=2015
).annotate(
    month=MonthTransform('sale_date')
).values('month').aggregate('id')

If you find there's something you cannot do with expressions that you can currently do with extra (with regards to SELECT), please come talk to us in #django or post to django-users. We are already aware that extra is still the only way to introduce custom joins.

in reply to:  2 comment:3 by Austin Pua, 9 years ago

I do not think that excerpt does the same functionality mine does. It seems that it only filters for the year 2015, extracts the month number, and then performs a 'GROUP BY' based on the month number. It does not generate an actual timeseries that can easily be parsed by javascript and charting libraries.

My excerpt outputs actual time periods based on a range of dates with specific intervals/frequency. If for example, the start_date is 2015-01-01, end_date is 2015-07-07, and frequency is month, it will aggregate the results that match the date range and group them by month dates like so: 2015-01-01, 2015-02-01, 2015-03-01, ... 2015-07-01.

If the frequency is hour, then it will output each and every hour like so: 2015-01-01 00:00:00, 2015-01-01 01:00:00, 2015-01-01 02:00:00, ... 2015-05-01 00:00:00, 2015-05-01 01:00:00 , ... , 2015-07-07 22:00:00, and 2015-07-07 23:00:00. You will have a column containing the actual timeseries.

One can only do that efficiently in PostgreSQL by using the DATE_TRUNC function, and I looked in the source code, but it does not seem to be used in anywhere that might resemble something like my excerpt. I think it is possible to write a custom Transform for that, but it would be nice to have it ready out of the box. At least in my case, projects always involve aggregating and presenting data with a timeseries.

comment:4 by Josh Smeaton, 9 years ago

Malefice, if you feel this would be something useful for Django, then please feel welcome to open up a PR with an implementation that uses Expressions, and we can consider the inclusion at that point. It will have to work with the other supported backends too though in some way.

comment:5 by Anssi Kääriäinen, 9 years ago

I think DateTrunk expression makes perfect sense for Django. It is often very useful for aggregation for example. Similarly date formatting and date parsing functions would be useful.

Oracle and PostgreSQL have date truncation methods out of the box, for sqlite and mysql it seems we have to use some tricks (format first the date as string '%y-%m-01', and then read that back in as date for month truncate for example).

Note: See TracTickets for help on using tickets.
Back to Top