﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
19259	Annotations generating inefficient SQL on PostgreSQL	Henrique C. Alves	Simon Charette	"Considering the following models:

{{{
    class Movie(EditableBase, TimestampBase):
        year = models.PositiveSmallIntegerField(u""ano"", null=True)
        country = models.CharField(u""país"", max_length=100, blank=True, null=True)
        release_date = models.DateField(u""data de estréia"", blank=True, null=True)
        length = models.PositiveSmallIntegerField(u""duração"", blank=True, null=True)
        rating = models.PositiveIntegerField(u""avaliação"", blank=True, null=True, choices=RATING_CHOICES)
        suitable_for = models.PositiveSmallIntegerField(u""censura"", blank=True, null=True, choices=AGE_CHOICES)
        website_original = models.URLField(u""website"", max_length=255, blank=True, null=True)
        website_national = models.URLField(u""website (nacional)"", max_length=255, blank=True, null=True)
        synopsis = models.TextField(u""sinopse"", blank=True)
        cover = models.ImageField(u""poster"", upload_to=u'uploads/fichatecnica/', blank=True, null=True)
        genres = models.ManyToManyField(MovieGenre, verbose_name=u""gêneros"")

    class MovieTheaterSession(models.Model):
        start_date = models.DateField(u""início"")
        end_date = models.DateField(u""fim"")
        movie = models.ForeignKey(Movie, verbose_name=u""filme"", related_name='sessions')
}}}

If we annotate the query to count the number of sessions:

{{{
    >>> Movie.objects.all().annotate(sessions_count=Count('sessions')).order_by('release_date')[:60]
}}}

Django generates pretty inneficient SQL by GROUPing BY on all fields from the parent:

{{{
    SELECT ""movies_movie"".""id"", ""movies_movie"".""creation_date"", ""movies_movie"".""modification_date"", ""movies_movie"".""year"", ""movies_movie"".""country"", ""movies_movie"".""release_date"", ""movies_movie"".""length"", ""movies_movie"".""rating"", ""movies_movie"".""suitable_for"", ""movies_movie"".""website_original"", ""movies_movie"".""website_national"", ""movies_movie"".""synopsis"", ""movies_movie"".""cover"", COUNT(""theaters_movietheatersession"".""id"") AS ""sessions_count"" FROM ""movies_movie"" LEFT OUTER JOIN ""theaters_movietheatersession"" ON (""movies_movie"".""id"" = ""theaters_movietheatersession"".""movie_id"") GROUP BY ""movies_movie"".""id"", ""movies_movie"".""creation_date"", ""movies_movie"".""modification_date"", ""movies_movie"".""year"", ""movies_movie"".""country"", ""movies_movie"".""release_date"", ""movies_movie"".""length"", ""movies_movie"".""rating"", ""movies_movie"".""suitable_for"", ""movies_movie"".""website_original"", ""movies_movie"".""website_national"", ""movies_movie"".""synopsis"", ""movies_movie"".""cover"" ORDER BY ""movies_movie"".""release_date"" ASC LIMIT 60
    Time: 892,122 ms
}}}

EXPLAIN shows the database is sorting all fields from the GROUP BY clause:

{{{
    Limit  (cost=2250.37..2250.52 rows=60 width=502)
      ->  Sort  (cost=2250.37..2256.15 rows=2311 width=502)
            Sort Key: (count(theaters_movietheatersession.id)), movies_movie.release_date
            ->  GroupAggregate  (cost=1975.47..2170.56 rows=2311 width=502)
                  ->  Sort  (cost=1975.47..1986.94 rows=4586 width=502)
                        Sort Key: movies_movie.id, movies_movie.creation_date, movies_movie.modification_date, movies_movie.year, movies_movie.country, movies_movie.release_date, movies_movie.length, movies_movie.rating, movies_movie.suitable_for, movies_movie.website_original, movies_movie.website_national, movies_movie.synopsis, movies_movie.cover
                        ->  Merge Left Join  (cost=0.00..660.58 rows=4586 width=502)
                              Merge Cond: (movies_movie.id = theaters_movietheatersession.movie_id)
                              ->  Index Scan using movies_movie_pkey on movies_movie  (cost=0.00..283.14 rows=2311 width=498)
                              ->  Index Scan using theaters_movietheatersession_movie_id on theaters_movietheatersession  (cost=0.00..314.34 rows=4586 width=8)
}}}

It suffices to GROUP BY by the PK field:

{{{
    SELECT ""movies_movie"".""id"", ""movies_movie"".""creation_date"", ""movies_movie"".""modification_date"", ""movies_movie"".""year"", ""movies_movie"".""country"", ""movies_movie"".""release_date"", ""movies_movie"".""length"", ""movies_movie"".""rating"", ""movies_movie"".""suitable_for"", ""movies_movie"".""website_original"", ""movies_movie"".""website_national"", ""movies_movie"".""synopsis"", ""movies_movie"".""cover"", COUNT(""theaters_movietheatersession"".""id"") AS ""sessions_count"" FROM ""movies_movie"" LEFT OUTER JOIN ""theaters_movietheatersession"" ON (""movies_movie"".""id"" = ""theaters_movietheatersession"".""movie_id"") GROUP BY ""movies_movie"".""id"" ORDER BY ""movies_movie"".""release_date"" ASC LIMIT 60
    Time: 16,285 ms
}}}

EXPLAIN shows the database doesn't need to sort anymore:

{{{
    Limit  (cost=786.42..786.57 rows=60 width=502)
       ->  Sort  (cost=786.42..792.20 rows=2311 width=502)
             Sort Key: movies_movie.release_date
             ->  GroupAggregate  (cost=0.00..706.62 rows=2311 width=502)
                   ->  Merge Left Join  (cost=0.00..660.58 rows=4586 width=502)
                         Merge Cond: (movies_movie.id = theaters_movietheatersession.movie_id)
                         ->  Index Scan using movies_movie_pkey on movies_movie  (cost=0.00..283.14 rows=2311 width=498)
                         ->  Index Scan using theaters_movietheatersession_movie_id on theaters_movietheatersession  (cost=0.00..314.34 rows=4586 width=8)
}}}

I've tried fixing it by setting the undocumented `group_by` attribute, but Django doesn't seem to pick it up:

{{{
class MovieManager(models.Manager):
        def with_sessions(self):
            qs = self.get_query_set().annotate(
                sessions_count=Count('sessions'))
            qs.group_by = ['id']
            return qs
}}}

I'm using Django 1.4 and PostgreSQL 9.1."	Cleanup/optimization	closed	Database layer (models, ORM)	dev	Normal	fixed		hcarvalhoalves@… dev@… Simon Charette	Ready for checkin	1	0	0	0	0	0
