﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
23383	Django ORM generates inefficient SQL (only selected columns should be grouped by)	Alexander	nobody	"I have encountered a problem. When I run the following code in django shell:

{{{
from django.contrib.auth.models import Group
from django.db.models import Count
print Group.objects.annotate(cnt=Count('user')).values('id', 'cnt').query.sql_with_params()
}}}

Django ORM generates the following SQL query:


{{{
SELECT `auth_group`.`id`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id`, `auth_group`.`name` ORDER BY NULL
}}}

'''auth_group.name''' occurs in '''GROUP BY''' statement. But this column isn't represented in '''SELECT''' statement. Such query is inefficient (expecially for large tables with many columns and rows).

Debuging of Django SQLCompiler (https://github.com/django/django/blob/stable/1.6.x/django/db/models/sql/compiler.py#L568) gives me the following information:
- postgresql:
    ''self.query.select'' == ''self.query.group_by'' == [(u'auth_group', u'id'), (u'auth_group', 'name')]
    ''self.connection.features.allows_group_by_pk'' is False
    ''(len(self.query.get_meta().concrete_fields) == len(self.query.select))'' is False
    '''auth_group.name''' appears in result because of the line:
{{{
cols = self.query.group_by + having_group_by + select_cols
}}}

- mysql:
    ''self.query.select == self.query.group_by'' == [(u'auth_group', u'id'), (u'auth_group', 'name')]
    ''self.connection.features.allows_group_by_pk'' is True
    ''(len(self.query.get_meta().concrete_fields) == len(self.query.select))'' is False
    '''auth_group.name''' appears in result because of the line:
{{{
cols = self.query.group_by + having_group_by + select_cols
}}}


In the same time, the following code (without ''.values()''):


{{{
from django.contrib.auth.models import Group
from django.db.models import Count
print Group.objects.annotate(cnt=Count('user')).query.sql_with_params()
}}}

gives the right SQL query for mysql (because ''(len(self.query.get_meta().concrete_fields) == len(self.query.select))'' is True):


{{{
SELECT `auth_group`.`id`, `auth_group`.`name`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id` ORDER BY NULL
}}}

I guess, only selected columns (either through ''values()'' or ''only()'') should be grouped by, regardless of the backend"	Uncategorized	closed	Database layer (models, ORM)	1.6	Normal	invalid			Unreviewed	0	0	0	0	0	0
