Opened 5 years ago

Closed 5 years ago

#22278 closed Uncategorized (invalid)

Excessive GROUP BY when using the Count annotation

Reported by: Mark Jones Owned by: nobody
Component: Uncategorized Version: 1.6
Severity: Normal Keywords:
Cc: Shai Berger Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no


The following code illustrates a group by that seems computationally expensive since it is grouping by a large number of fields when it could be grouping by ID.

The generated SQL looks like this:

SELECT "foo_employee"."id", "foo_employee"."last_name", "foo_employee"."first_name", "foo_employee"."another_attribute", COUNT("foo_shift"."id") AS "overlapping_events" 
FROM "foo_employee" LEFT OUTER JOIN "foo_shift" ON ( "foo_employee"."id" = "foo_shift"."employee_id" ) 
WHERE ("foo_shift"."end_at" <= 2014-03-16 08:28:43.040428-07:00  AND "foo_shift"."start_at" >= 2014-03-16 08:28:43.040407-07:00 ) 
GROUP BY "foo_employee"."id", "foo_employee"."last_name", "foo_employee"."first_name", "foo_employee"."another_attribute" 
ORDER BY "overlapping_events" ASC, "foo_employee"."last_name" ASC, "foo_employee"."first_name" ASC

Everything after in the GROUP BY isn't needed. Now the sql statement optimizer might be smart enough to figure this out, but then again, it might not.... Seems like if the pk is in the GROUP BY list, the rest of it isn't needed.

import datetime
from django.db import models

class Employee(models.Model):
    last_name = models.CharField(max_length=64)
    first_name = models.CharField(max_length=64)
    another_attribute = models.CharField(max_length=64)

class Shift(models.Model):
    start_at = models.DateTimeField()
    end_at = models.DateTimeField()
    employee = models.ForeignKey(Employee, related_name="shift_set")

start_at =
end_at =
qs = Employee.objects.filter(shift_set__start_at__gte=start_at,
print qs.order_by('overlapping_events', 'last_name', 'first_name').query

Changing the Count('shift_set') to Count('id') has no effect on the group by.

Change History (1)

comment:1 Changed 5 years ago by Shai Berger

Cc: Shai Berger added
Resolution: invalid
Status: newclosed

Oh, it is needed on any backend that isn't MySql, and that is for two reasons:

1) The SQL standard is stupid; once you have aggregates, it doesn't allow any column to be selected unless it is in the group-by clause.

2) MySQL is stupid, and ignores the above restriction in the SQL standard.

The stupidity of MySQL is because it doesn't actually check that the group-by columns determine all the other selected columns, and so it allows queries with non-deterministic results.

Note: See TracTickets for help on using tickets.
Back to Top