Context Navigation

← Previous Ticket
Next Ticket →

#22278 closed Uncategorized (invalid)

Excessive GROUP BY when using the Count annotation

Reported by:	Mark Jones	Owned by:	nobody
Component:	Uncategorized	Version:	1.6
Severity:	Normal	Keywords:
Cc:	Shai Berger	Triage Stage:	Unreviewed
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

The following code illustrates a group by that seems computationally expensive since it is grouping by a large number of fields when it could be grouping by ID.

The generated SQL looks like this:

SELECT "foo_employee"."id", "foo_employee"."last_name", "foo_employee"."first_name", "foo_employee"."another_attribute", COUNT("foo_shift"."id") AS "overlapping_events" 
FROM "foo_employee" LEFT OUTER JOIN "foo_shift" ON ( "foo_employee"."id" = "foo_shift"."employee_id" ) 
WHERE ("foo_shift"."end_at" <= 2014-03-16 08:28:43.040428-07:00  AND "foo_shift"."start_at" >= 2014-03-16 08:28:43.040407-07:00 ) 
GROUP BY "foo_employee"."id", "foo_employee"."last_name", "foo_employee"."first_name", "foo_employee"."another_attribute" 
ORDER BY "overlapping_events" ASC, "foo_employee"."last_name" ASC, "foo_employee"."first_name" ASC

Everything after foo_employee.id in the GROUP BY isn't needed. Now the sql statement optimizer might be smart enough to figure this out, but then again, it might not.... Seems like if the pk is in the GROUP BY list, the rest of it isn't needed.

import datetime
from django.db import models

class Employee(models.Model):
    last_name = models.CharField(max_length=64)
    first_name = models.CharField(max_length=64)
    another_attribute = models.CharField(max_length=64)

class Shift(models.Model):
    start_at = models.DateTimeField()
    end_at = models.DateTimeField()
    employee = models.ForeignKey(Employee, related_name="shift_set")


start_at = datetime.datetime.now()
end_at = datetime.datetime.now()
qs = Employee.objects.filter(shift_set__start_at__gte=start_at,
                             shift_set__end_at__lte=end_at
                             ).annotate(overlapping_events=models.Count('shift_set'))
print qs.order_by('overlapping_events', 'last_name', 'first_name').query

Changing the Count('shift_set') to Count('id') has no effect on the group by.

Change History (1)

comment:1 by Shai Berger, 11 years ago

Cc:	Shai Berger added
Resolution:	→ invalid
Status:	new → closed

Oh, it is needed on any backend that isn't MySql, and that is for two reasons:

1) The SQL standard is stupid; once you have aggregates, it doesn't allow any column to be selected unless it is in the group-by clause.

2) MySQL is stupid, and ignores the above restriction in the SQL standard.

The stupidity of MySQL is because it doesn't actually check that the group-by columns determine all the other selected columns, and so it allows queries with non-deterministic results.

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#22278 closed Uncategorized (invalid)

Excessive GROUP BY when using the Count annotation

Description

Change History (1)

comment:1 by Shai Berger, 11 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us