#11088 closed (invalid)
Aggregation problem, table JOINed twice
| Reported by: | Owned by: | ||
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.1-beta |
| Severity: | Keywords: | aggregation, join | |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
I don't know if I'm doing something wrong but I've tried to compile all the information needed to track this one.
Models
video/models.py
class Video(models.Model):
name = models.TextField()
dir = models.TextField()
videoclass = models.ForeignKey('Videoclass')
status = models.CharField(max_length=24)
date = models.DateTimeField()
duration = models.FloatField()
width = models.IntegerField()
height = models.IntegerField()
displayname = models.TextField()
published = models.IntegerField()
views = models.IntegerField()
ratesum = models.IntegerField()
ratenum = models.IntegerField()
statistics/models.py
class VisitorAction(models.Model):
name = models.CharField(max_length = 200)
def __unicode__(self):
return self.name
class VisitorLog(models.Model):
visitor = models.ForeignKey("Visitor")
video = models.ForeignKey(Video)
action = models.ForeignKey(VisitorAction)
seek_video = models.FloatField()
event_time = models.DateTimeField(default = datetime.datetime.now)
domain = models.CharField(max_length = 128)
Datas
statistics_visitorlog contains 100k+ entries
video_video contains 400 entries
statistics_visitoraction contains 6 entries ('play', 'seek' ... etc)
The QS
>>> from video.models import *
>>> from django.db.models import Count
>>> Video.objects.annotate(play_log = Count('visitorlog')).filter(visitorlog__action__name = 'play').order_by('-play_log')[0:10]
[Kill the mysqld because it takes forever and have a nice backtrace]
>>> from django.db import connection
>>> connection.queries[-1]
{'time': '7.180', 'sql': u'SELECT `video_video`.`id`, `video_video`.`name`, `video_video`.`dir`, `video_video`.`videoclass_id`, `video_video`.`status`, `video_video`.`date`, `video_video`.`duration`, `video_video`.`width`, `video_video`.`height`, `video_video`.`displayname`, `video_video`.`published`, `video_video`.`views`, `video_video`.`ratesum`, `video_video`.`ratenum`, COUNT(`statistics_visitorlog`.`id`) AS `play_log` FROM `video_video` LEFT OUTER JOIN `statistics_visitorlog` ON (`video_video`.`id` = `statistics_visitorlog`.`video_id`) INNER JOIN `statistics_visitorlog` T4 ON (`video_video`.`id` = T4.`video_id`) INNER JOIN `statistics_visitoraction` ON (T4.`action_id` = `statistics_visitoraction`.`id`) WHERE `statistics_visitoraction`.`name` = play GROUP BY `video_video`.`id` ORDER BY play_log DESC LIMIT 10'}
The sql request
SELECT `video_video`.`id` ,
`video_video`.`name` ,
`video_video`.`dir` ,
`video_video`.`videoclass_id`,
`video_video`.`status` ,
`video_video`.`date` ,
`video_video`.`duration` ,
`video_video`.`width` ,
`video_video`.`height` ,
`video_video`.`displayname` ,
`video_video`.`published` ,
`video_video`.`views` ,
`video_video`.`ratesum` ,
`video_video`.`ratenum` ,
COUNT(`statistics_visitorlog`.`id`) AS `play_log`
FROM `video_video`
LEFT OUTER JOIN `statistics_visitorlog`
ON
(
`video_video`.`id` = `statistics_visitorlog`.`video_id`
)
INNER JOIN `statistics_visitorlog` T4
ON
(
`video_video`.`id` = T4.`video_id`
)
INNER JOIN `statistics_visitoraction`
ON
(
T4.`action_id` = `statistics_visitoraction`.`id`
)
WHERE `statistics_visitoraction`.`name` = 'play'
GROUP BY `video_video`.`id`
It takes forever and burns my cpu. However, if I remove the LEFT OUTER JOIN on statistics_visitorlog (which seems to be joined twice), and fix the select to call the proper JOINed table:
SELECT `video_video`.`id` ,
`video_video`.`name` ,
`video_video`.`dir` ,
`video_video`.`videoclass_id`,
`video_video`.`status` ,
`video_video`.`date` ,
`video_video`.`duration` ,
`video_video`.`width` ,
`video_video`.`height` ,
`video_video`.`displayname` ,
`video_video`.`published` ,
`video_video`.`views` ,
`video_video`.`ratesum` ,
`video_video`.`ratenum` ,
COUNT(T4.`id`) AS `play_log`
FROM `video_video`
INNER JOIN `statistics_visitorlog` T4
ON
(
`video_video`.`id` = T4.`video_id`
)
INNER JOIN `statistics_visitoraction`
ON
(
T4.`action_id` = `statistics_visitoraction`.`id`
)
WHERE `statistics_visitoraction`.`name` = 'play'
GROUP BY `video_video`.`id`
this returns in 0.6 sec with all the datas needed to fill the objects.
Change History (4)
comment:1 by , 17 years ago
comment:2 by , 17 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
Mir's analysis appears to be correct; this issue is discussed in the documentation.
comment:4 by , 13 years ago
| Component: | ORM aggregation → Database layer (models, ORM) |
|---|
Your query means to first annotate, counting the visitorlog entries for all actions, and then afterwards apply the filter to the annotated query. I think you have to swap the annotate and filter clauses in your query to get what you intend:
Video.objects.filter(visitorlog__action__name = 'play').annotate(play_log = Count('visitorlog')).order_by('-play_log')[0:10]I'm not particularily experienced with annotations, so I don't want to close your ticket without your confirmation that you see it the same way.