Opened 11 months ago

Closed 11 months ago

Last modified 11 months ago

#23383 closed Uncategorized (invalid)

Django ORM generates inefficient SQL (only selected columns should be grouped by)

Reported by: aalebedev Owned by: nobody
Component: Database layer (models, ORM) Version: 1.6
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I have encountered a problem. When I run the following code in django shell:

from django.contrib.auth.models import Group
from django.db.models import Count
print Group.objects.annotate(cnt=Count('user')).values('id', 'cnt').query.sql_with_params()

Django ORM generates the following SQL query:

SELECT `auth_group`.`id`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id`, `auth_group`.`name` ORDER BY NULL

auth_group.name occurs in GROUP BY statement. But this column isn't represented in SELECT statement. Such query is inefficient (expecially for large tables with many columns and rows).

Debuging of Django SQLCompiler (https://github.com/django/django/blob/stable/1.6.x/django/db/models/sql/compiler.py#L568) gives me the following information:

  • postgresql:

self.query.select == self.query.group_by == [(u'auth_group', u'id'), (u'auth_group', 'name')]
self.connection.features.allows_group_by_pk is False
(len(self.query.get_meta().concrete_fields) == len(self.query.select)) is False
auth_group.name appears in result because of the line:

cols = self.query.group_by + having_group_by + select_cols
  • mysql:

self.query.select == self.query.group_by == [(u'auth_group', u'id'), (u'auth_group', 'name')]
self.connection.features.allows_group_by_pk is True
(len(self.query.get_meta().concrete_fields) == len(self.query.select)) is False
auth_group.name appears in result because of the line:

cols = self.query.group_by + having_group_by + select_cols

In the same time, the following code (without .values()):

from django.contrib.auth.models import Group
from django.db.models import Count
print Group.objects.annotate(cnt=Count('user')).query.sql_with_params()

gives the right SQL query for mysql (because (len(self.query.get_meta().concrete_fields) == len(self.query.select)) is True):

SELECT `auth_group`.`id`, `auth_group`.`name`, COUNT(`auth_user_groups`.`user_id`) AS `cnt` FROM `auth_group` LEFT OUTER JOIN `auth_user_groups` ON ( `auth_group`.`id` = `auth_user_groups`.`group_id` ) GROUP BY `auth_group`.`id` ORDER BY NULL

I guess, only selected columns (either through values() or only()) should be grouped by, regardless of the backend

Change History (2)

comment:1 Changed 11 months ago by akaariai

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

The reason for this is that values + annotate has a bit complex definition. If I recall correctly it goes like this:

  • When you call .annotate() the GROUP BY is set to the current SELECT list
  • When you call .values().annotate(), then .values() first sets the SELECT list, after that the .annotate() sets the GROUP BY to the select list generated by .values()
  • When you call .values() after .annotate() it doesn't change the GROUP BY, but it changes the SELECT clause (this is what is happening to you)

In addition .extra() has some special casing.

So, for your case, you'll need to first call .values(), then .annotate() and the result should be what you expect. Closing as invalid - the query behaves as defined. Ideas welcome how to make values + annotate interactions more intuitive.

We have two improvements we could do here:

1) For MySQL, we should be able to do GROUP BY <primary key> even when not using full fields list from the concrete model
2) On PostgreSQL we should be able to switch to primary key grouping (it works a bit differently to MySQL, but the idea is similar to MySQL's)

The latter has a ticket, but MySQL GROUP BY for the case mentioned in this ticket doesn't have a separate ticket (at least I don't recall one). A separate ticket for the MySQL case is welcome.

comment:2 Changed 11 months ago by charettes

Ideas welcome how to make values + annotate interactions more intuitive.

Do you think it would be possible to make values() also change the GROUP BY clause?

Note: See TracTickets for help on using tickets.
Back to Top