Code

Opened 5 years ago

Closed 5 years ago

Last modified 14 months ago

#11088 closed (invalid)

Aggregation problem, table JOINed twice

Reported by: loic@… Owned by:
Component: Database layer (models, ORM) Version: 1.1-beta
Severity: Keywords: aggregation, join
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

I don't know if I'm doing something wrong but I've tried to compile all the information needed to track this one.

Models

video/models.py

class Video(models.Model):
    name = models.TextField()
    dir = models.TextField()
    videoclass = models.ForeignKey('Videoclass')
    status = models.CharField(max_length=24)
    date = models.DateTimeField()
    duration = models.FloatField()
    width = models.IntegerField()
    height = models.IntegerField()
    displayname = models.TextField()
    published = models.IntegerField()
    views = models.IntegerField()
    ratesum = models.IntegerField()
    ratenum = models.IntegerField() 

statistics/models.py

class VisitorAction(models.Model):
    name = models.CharField(max_length = 200)

    def __unicode__(self):
        return self.name


class VisitorLog(models.Model):
    visitor = models.ForeignKey("Visitor")
    video = models.ForeignKey(Video)
    action = models.ForeignKey(VisitorAction)
    seek_video = models.FloatField()
    event_time = models.DateTimeField(default = datetime.datetime.now)
    domain = models.CharField(max_length = 128)

Datas

statistics_visitorlog contains 100k+ entries

video_video contains 400 entries

statistics_visitoraction contains 6 entries ('play', 'seek' ... etc)

The QS

>>> from video.models import *
>>> from django.db.models import Count
>>> Video.objects.annotate(play_log = Count('visitorlog')).filter(visitorlog__action__name = 'play').order_by('-play_log')[0:10]

[Kill the mysqld because it takes forever and have a nice backtrace]

>>> from django.db import connection
>>> connection.queries[-1]
{'time': '7.180', 'sql': u'SELECT `video_video`.`id`, `video_video`.`name`, `video_video`.`dir`, `video_video`.`videoclass_id`, `video_video`.`status`, `video_video`.`date`, `video_video`.`duration`, `video_video`.`width`, `video_video`.`height`, `video_video`.`displayname`, `video_video`.`published`, `video_video`.`views`, `video_video`.`ratesum`, `video_video`.`ratenum`, COUNT(`statistics_visitorlog`.`id`) AS `play_log` FROM `video_video` LEFT OUTER JOIN `statistics_visitorlog` ON (`video_video`.`id` = `statistics_visitorlog`.`video_id`) INNER JOIN `statistics_visitorlog` T4 ON (`video_video`.`id` = T4.`video_id`) INNER JOIN `statistics_visitoraction` ON (T4.`action_id` = `statistics_visitoraction`.`id`) WHERE `statistics_visitoraction`.`name` = play  GROUP BY `video_video`.`id` ORDER BY play_log DESC LIMIT 10'}

The sql request

SELECT   `video_video`.`id`           ,
            `video_video`.`name`         ,
            `video_video`.`dir`          ,
            `video_video`.`videoclass_id`,
            `video_video`.`status`       ,
            `video_video`.`date`         ,
            `video_video`.`duration`     ,
            `video_video`.`width`        ,
            `video_video`.`height`       ,
            `video_video`.`displayname`  ,
            `video_video`.`published`    ,
            `video_video`.`views`        ,
            `video_video`.`ratesum`      ,
            `video_video`.`ratenum`      ,
            COUNT(`statistics_visitorlog`.`id`) AS `play_log`
   FROM     `video_video`
            LEFT OUTER JOIN `statistics_visitorlog`
            ON
                     (
                              `video_video`.`id` = `statistics_visitorlog`.`video_id`
                     )
            INNER JOIN `statistics_visitorlog` T4
            ON
                     (
                              `video_video`.`id` = T4.`video_id`
                     )
            INNER JOIN `statistics_visitoraction`
            ON
                     (
                              T4.`action_id` = `statistics_visitoraction`.`id`
                     )
   WHERE    `statistics_visitoraction`.`name` = 'play'
   GROUP BY `video_video`.`id`

It takes forever and burns my cpu. However, if I remove the LEFT OUTER JOIN on statistics_visitorlog (which seems to be joined twice), and fix the select to call the proper JOINed table:

SELECT   `video_video`.`id`           ,
            `video_video`.`name`         ,
            `video_video`.`dir`          ,
            `video_video`.`videoclass_id`,
            `video_video`.`status`       ,
            `video_video`.`date`         ,
            `video_video`.`duration`     ,
            `video_video`.`width`        ,
            `video_video`.`height`       ,
            `video_video`.`displayname`  ,
            `video_video`.`published`    ,
            `video_video`.`views`        ,
            `video_video`.`ratesum`      ,
            `video_video`.`ratenum`      ,
            COUNT(T4.`id`) AS `play_log`
   FROM     `video_video`
            INNER JOIN `statistics_visitorlog` T4
            ON
                     (
                              `video_video`.`id` = T4.`video_id`
                     )
            INNER JOIN `statistics_visitoraction`
            ON
                     (
                              T4.`action_id` = `statistics_visitoraction`.`id`
                     )
   WHERE    `statistics_visitoraction`.`name` = 'play'
   GROUP BY `video_video`.`id`

this returns in 0.6 sec with all the datas needed to fill the objects.

Attachments (0)

Change History (4)

comment:1 Changed 5 years ago by mir

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Your query means to first annotate, counting the visitorlog entries for all actions, and then afterwards apply the filter to the annotated query. I think you have to swap the annotate and filter clauses in your query to get what you intend:

 Video.objects.filter(visitorlog__action__name = 'play').annotate(play_log = Count('visitorlog')).order_by('-play_log')[0:10]

I'm not particularily experienced with annotations, so I don't want to close your ticket without your confirmation that you see it the same way.

comment:2 Changed 5 years ago by russellm

  • Resolution set to invalid
  • Status changed from new to closed

Mir's analysis appears to be correct; this issue is discussed in the documentation.

comment:3 Changed 3 years ago by jacob

  • milestone 1.1 deleted

Milestone 1.1 deleted

comment:4 Changed 14 months ago by akaariai

  • Component changed from ORM aggregation to Database layer (models, ORM)

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.