#27577 closed Bug (invalid)
Django annotated query returns wrong results
Reported by: | Mahmood Khan | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.10 |
Severity: | Normal | Keywords: | annotated query |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I have three related models listed below and I would like to run an aggregate query to give me the total number of tasks completed by person but only those tasks that are in a specific folder.
class Person(models.Model): firstName = models.CharField(max_length=100, null=True, blank=True) lastName = models.CharField(max_length=100, null=True, blank=True) class Folder(models.Model): title = models.CharField(max_length=254, null=True, blank=True) assignees = models.ManyToManyField(Person, related_name="projects") completedDate = models.DateTimeField(blank=True, null=True) class Task(models.Model): title = models.CharField(max_length=254, null=True, blank=True) assignees = models.ManyToManyField(Person, related_name="tasks") folders = models.ManyToManyField(Folder, related_name="tasks")
And here is my attempted query:
tasks = Contact.objects.filter(tasks__folders__id='I58343DS89ASDF').distinct().filter(tasks__completedDate__gte='2016-10-01 00:00:00').annotate(total=Count('tasks'), name=F('firstName')).values('total', 'name')
This is giving me an incorrect count. So I inspected the SQL statement that is generated by this Django query. The SQL statement is:
SELECT DISTINCT COUNT(`wrike_task_assignees`.`task_id`) AS `total`, `wrike_contact`.`firstName` AS `name` FROM `wrike_contact` INNER JOIN `wrike_task_assignees` ON (`wrike_contact`.`id` = `wrike_task_assignees`.`contact_id`) INNER JOIN `wrike_task` ON (`wrike_task_assignees`.`task_id` = `wrike_task`.`id`) INNER JOIN `wrike_task_folders` ON (`wrike_task`.`id` = `wrike_task_folders`.`task_id`) INNER JOIN `wrike_task_assignees` T6 ON (`wrike_contact`.`id` = T6.`contact_id`) INNER JOIN `wrike_task` T7 ON (T6.`task_id` = T7.`id`) WHERE (`wrike_task_folders`.`folder_id` = I58343DS89ASDF AND T7.`completedDate` >= 2016-10-01 00:00:00) GROUP BY `wrike_contact`.`id` ORDER BY NULL
And a nicely formatted version is:
SELECT DISTINCT COUNT(ta.task_id) AS `total`, c.firstName AS `name` FROM wrike_contact c INNER JOIN wrike_task_assignees ta ON (c.id = ta.contact_id) INNER JOIN wrike_task t ON (ta.task_id = t.id) INNER JOIN wrike_task_folders tf ON (t.id = tf.task_id) INNER JOIN wrike_task_assignees T6 ON (c.id = T6.contact_id) INNER JOIN wrike_task T7 ON (ta.task_id = T7.id) WHERE (tf.folder_id = 'I58343DS89ASDF' AND t.completedDate >= '2016-10-01 00:00:00') GROUP BY c.id ORDER BY NULL
Notice that a couple INNER JOIN are repeated twice -- not sure why.
If I remove those duplicate INNER JOIN and then run the query, the results are correct. Here is the version of SQL statement in which I removed the duplicate INNER JOIN segments:
SELECT DISTINCT c.firstName AS name, COUNT(ta.task_id) AS `total` FROM wrike_contact c INNER JOIN wrike_task_assignees ta ON (c.id = ta.contact_id) INNER JOIN wrike_task t ON (ta.task_id = t.id) INNER JOIN wrike_task_folders tf ON (t.id = tf.task_id) WHERE (tf.folder_id = 'I58343DS89ASDF' AND t.completedDate >= '2016-10-01 00:00:00') GROUP BY c.id ORDER BY name;
I have searched the forum and couldn't find that this issue is already reported. So I reporting it here.
Hi Mahmood,
Thanks for filing this report with all the details. The behavior you're seeing may not be completely intuitive, but it is done this way on purpose and it is documented: https://docs.djangoproject.com/en/1.10/topics/db/queries/#spanning-multi-valued-relationships
Hope this helps,
Shai.