Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#24692 closed Cleanup/optimization (wontfix)

QuerySet.extra(select=...) is silently dropped with aggregations

Reported by: jdelic Owned by: nobody
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords:
Cc: mbertheau@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by jdelic)

Here is a minimal example:

from django.db import models

class Service(models.Model):
    name = models.TextField()

class ServiceAction(models.Model):
    performed = models.DateTimeField(auto_now_add=True)
    service = models.ForeignKey(Service)

Then add some models:

>>> from dbtest.models import Service, ServiceAction
>>> from django.db.models import Min, Max
>>> s1 = Service.objects.create(name='test1')
>>> s2 = Service.objects.create(name='test2')
>>> ServiceAction.objects.create(service_id=s1.id)
>>> ServiceAction.objects.create(service_id=s1.id)
>>> ServiceAction.objects.create(service_id=s1.id)
>>> ServiceAction.objects.create(service_id=s2.id)
>>> ServiceAction.objects.create(service_id=s2.id)
>>> ServiceAction.objects.create(service_id=s2.id)

...and query them using aggregations:

>>> qs = ServiceAction.objects.extra(select={'period': '(MAX("dbtest_serviceaction"."performed")-MIN("dbtest_serviceaction"."performed"))'}).values('service_id').annotate(first_action=Min('performed'), last_action=Max('performed'))
>>> print(qs.query)
SELECT "dbtest_serviceaction"."service_id", MAX("dbtest_serviceaction"."performed") AS "last_action", MIN("dbtest_serviceaction"."performed") AS "first_action" FROM "dbtest_serviceaction" GROUP BY "dbtest_serviceaction"."service_id"

As you can see the .extra(select=...) has been dropped from the query. I assume this happens on Django 1.6.x, 1.7.x and 1.8.x, but I have only tested 1.6.11 and 1.8.

The expected outcome would have been:

SELECT "dbtest_serviceaction"."service_id", MAX("dbtest_serviceaction"."performed") AS "last_action", MIN("dbtest_serviceaction"."performed") AS "first_action",  MAX("dbtest_serviceaction"."performed")-MIN("dbtest_serviceaction"."performed") as "period" FROM "dbtest_serviceaction" GROUP BY "dbtest_serviceaction"."service_id"

Using Django 1.8 this exact query can be performed with the ORM itself through the use of the newly added operators for aggregations:

>>> qs = ServiceAction.objects.values('service_id').annotate(first_action=Min('performed'), last_action=Max('performed'), period=Max('performed')-Min('performed'))
>>> print(qs.query)
SELECT "dbtest_serviceaction"."service_id", (MAX("dbtest_serviceaction"."performed") - MIN("dbtest_serviceaction"."performed")) AS "period", MAX("dbtest_serviceaction"."performed") AS "last_action", MIN("dbtest_serviceaction"."performed") AS "first_action" FROM "dbtest_serviceaction" GROUP BY "dbtest_serviceaction"."service_id"

Change History (11)

comment:1 by jdelic, 9 years ago

Description: modified (diff)

comment:2 by Markus Bertheau, 9 years ago

Cc: mbertheau@… added

comment:3 by jdelic, 9 years ago

Description: modified (diff)

comment:4 by Tim Graham, 9 years ago

Not that this fixes the problem, but we are trying to obsolete .extra() by providing better APIs for all its use cases. Do you have a use case in mind where the new expressions API doesn't suffice? If so, I think there will be more interest in working on that instead of fixing bugs with .extra() which seems likely to be deprecated at some point.

in reply to:  4 comment:5 by jdelic, 9 years ago

Replying to timgraham:

Not that this fixes the problem, but we are trying to obsolete .extra() by providing better APIs for all its use cases. Do you have a use case in mind where the new expressions API doesn't suffice? If so, I think there will be more interest in working on that instead of fixing bugs with .extra() which seems likely to be deprecated at some point.

Well, there will always be edge cases where any ORM will not work, I guess. In my case the new expressions API *would suffice*, unfortunately we run a rather big project on Django 1.6.11, migrating to 1.7... so I won't be able to use it for a while.

In any case, I ran into this while trying to generate some much more complex reports from our database. As .extra() hasn't been deprecated yet and I was able to boil it down to such a simple failure case, I think fixing it would still be a very good idea. I feel that if .extra() fails silently on such a simple case, there are bound to be other "more legitimate" use-cases that will fail as well. That said, I haven't investigated Django's code itself, so this might also just be a very narrow failure mode.

comment:6 by Tim Graham, 9 years ago

Component: UncategorizedDatabase layer (models, ORM)

We can accept the ticket and see if someone cares to fix it, but even if fixed it won't be backported to 1.6 or 1.7.

Looking at it a bit more, though, I noticed that period doesn't seem to be used in the values or in the annotate. Do you have a typo there?

comment:7 by jdelic, 9 years ago

Looking at it a bit more, though, I noticed that period doesn't seem to be used in the values or in the annotate. Do you have a typo there?

No, not a typo per se. Maybe a mistake, though. I'm basing this off the is_recent example in https://docs.djangoproject.com/en/1.8/ref/models/querysets/#extra, which also doesn't mention the field anywhere else. It might be worth mentioning that if you add .values('service_id', 'period') then the QuerySet will generate invalid SQL:

>>> qs = ServiceAction.objects.extra(select={'period': '(MAX("dbtest_serviceaction"."performed")-MIN("dbtest_serviceaction"."performed"))'}).values('service_id', 'period').annotate(first_action=Min('performed'), last_action=Max('performed'))
>>> print(qs.query)
SELECT ((MAX("dbtest_serviceaction"."performed")-MIN("dbtest_serviceaction"."performed"))) AS "period", "dbtest_serviceaction"."service_id", MAX("dbtest_serviceaction"."performed") AS "last_action", MIN("dbtest_serviceaction"."performed") AS "first_action" FROM "dbtest_serviceaction" GROUP BY "dbtest_serviceaction"."service_id", ((MAX("dbtest_serviceaction"."performed")-MIN("dbtest_serviceaction"."performed")))
>>> qs
...
OperationalError: aggregate functions are not allowed in the GROUP BY clause

comment:8 by Tim Graham, 9 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

Right, but values() limits the select to only those fields so to me it's not surprising that this doesn't work. We can accept the ticket in case someone cares to write a patch (doc or code), but as I mentioned before, I don't think it's useful to spend more time on extra() at this point.

comment:9 by Shai Berger, 9 years ago

So, I see two separate issues here.

The first is that period was not selected in the original query. That, I think, is exactly expected behavior -- that's what values() does.

The second is that, as demonstrated by adding "period" to the values() call, extra() does not support aggregates well (the error is caused by the ORM assuming that period is a non-aggregate field, and so must be pulled into the group-by clause for the aggregate to work). This, AFAICT, cannot be solved without adding to extra() an argument (or a sub-argument) to specify which of the select keys are aggregates; and that, combined with Tim's comments above, makes me think this should be wontfix'd.

comment:10 by Tim Graham, 9 years ago

Resolution: wontfix
Status: newclosed

comment:11 by Josh Smeaton, 9 years ago

I agree with your analysis shaib. I've just tried a few things to see if I can work around the problem, but no I can't. Adding the extra clause at the end of the .values().annotate() also drops the extra, which is somewhat concerning, but it wouldn't work anyway.

Well, there will always be edge cases where any ORM will not work, I guess.

Yes, and that is why users can now build their own expressions with full support of the ORM.

At a maximum we could maybe document that extra does not play well with aggregation, but I can't see us implementing a fix here without turning "extra" into the new expressions api anyway.

Note: See TracTickets for help on using tickets.
Back to Top