#23383 closed Uncategorized (invalid)
Django ORM generates inefficient SQL (only selected columns should be grouped by)
Reported by: | Alexander | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.6 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have encountered a problem. When I run the following code in django shell:
from django.contrib.auth.models import Group from django.db.models import Count print Group.objects.annotate(cnt=Count('user')).values('id', 'cnt').query.sql_with_params()
Django ORM generates the following SQL query:
SELECT `auth_group`.`id`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id`, `auth_group`.`name` ORDER BY NULL
auth_group.name occurs in GROUP BY statement. But this column isn't represented in SELECT statement. Such query is inefficient (expecially for large tables with many columns and rows).
Debuging of Django SQLCompiler (https://github.com/django/django/blob/stable/1.6.x/django/db/models/sql/compiler.py#L568) gives me the following information:
- postgresql:
self.query.select == self.query.group_by == [(u'auth_group', u'id'), (u'auth_group', 'name')]
self.connection.features.allows_group_by_pk is False
(len(self.query.get_meta().concrete_fields) == len(self.query.select)) is False
auth_group.name appears in result because of the line:
cols = self.query.group_by + having_group_by + select_cols
- mysql:
self.query.select == self.query.group_by == [(u'auth_group', u'id'), (u'auth_group', 'name')]
self.connection.features.allows_group_by_pk is True
(len(self.query.get_meta().concrete_fields) == len(self.query.select)) is False
auth_group.name appears in result because of the line:
cols = self.query.group_by + having_group_by + select_cols
In the same time, the following code (without .values()):
from django.contrib.auth.models import Group from django.db.models import Count print Group.objects.annotate(cnt=Count('user')).query.sql_with_params()
gives the right SQL query for mysql (because (len(self.query.get_meta().concrete_fields) == len(self.query.select)) is True):
SELECT `auth_group`.`id`, `auth_group`.`name`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id` ORDER BY NULL
I guess, only selected columns (either through values() or only()) should be grouped by, regardless of the backend
Change History (2)
comment:1 by , 10 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 10 years ago
Ideas welcome how to make values + annotate interactions more intuitive.
Do you think it would be possible to make values()
also change the GROUP BY
clause?
The reason for this is that values + annotate has a bit complex definition. If I recall correctly it goes like this:
In addition .extra() has some special casing.
So, for your case, you'll need to first call .values(), then .annotate() and the result should be what you expect. Closing as invalid - the query behaves as defined. Ideas welcome how to make values + annotate interactions more intuitive.
We have two improvements we could do here:
The latter has a ticket, but MySQL GROUP BY for the case mentioned in this ticket doesn't have a separate ticket (at least I don't recall one). A separate ticket for the MySQL case is welcome.