Opened 10 years ago

Closed 9 years ago

Last modified 9 years ago

#25136 closed Cleanup/optimization (fixed)

Add an example with Count('X', distinct=True) to the queries topic guide

Reported by: Michał Domański Owned by: Caio Ariede
Component: Documentation Version: 1.8
Severity: Normal Keywords:
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

The problem is the difference between the results of count() on entries relation and num_entries if I also want an annotation for alerts relation

reports = Report.objects.filter(
        user=request.user).annotate(num_entries=Count('entries'), num_alerts=Count('alerts'))
print report.num_entries, report.entries.count()

Yet, this problem did not happen if there was only 1 argument to annotate.

What's happening? Well, both entries and alerts are FK relations, thus they result in this two left outer joins. The SQL is something like this:

SELECT `coverage_reports`.*
COUNT(`alerts_alert`.`id`) AS `num_alerts`,
COUNT(`coverage_reports_entry`.`id`) AS `num_entries`
FROM `coverage_reports_coveragereport`
LEFT OUTER JOIN `alerts_alert` ON ( `coverage_reports_coveragereport`.`id` = `alerts_alert`.`coverage_report_id` )
LEFT OUTER JOIN `coverage_reports_entry` ON (`coverage_reports_coveragereport`.`id` = `coverage_reports_entry`.`coverage_report_id` )
WHERE `coverage_reports_coveragereport`.`user_id` = 1 GROUP BY `coverage_reports_coveragereport`.`id` ;

This is problematic as joins will duplicate parent records if more than one child record is associated to it. This is what can inflate values from aggregate functions.

The fix to this particular query was such:

SELECT reports.id, alerts.num_alerts, entries.num_entries
FROM coverage_reports_coveragereport AS reports
LEFT JOIN
  (SELECT coverage_report_id, COUNT(*) AS num_alerts FROM alerts_alert GROUP BY coverage_report_id) AS alerts
ON reports.id = alerts.coverage_report_id
LEFT JOIN
  (SELECT coverage_report_id, COUNT(*) AS num_entries FROM coverage_reports_entry GROUP BY coverage_report_id) AS entries
ON reports.id = entries.coverage_report_id
WHERE reports.user_id = 1 GROUP BY reports.id;

Change History (10)

comment:1 by Tim Graham, 10 years ago

Does adding distinct=True to the the Counts resolve the issue? If so, we could probably add an example to topics/db/aggregation about this.

comment:2 by Michał Domański, 10 years ago

Yes, it does though it's rather unobvious.

comment:3 by Tim Graham, 10 years ago

Component: Database layer (models, ORM)Documentation
Easy pickings: set
Summary: Multiple annotations result in wrong results with MySQL backendAdd an example with Count('X', distinct=True) to the queries topic guide
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization

comment:4 by Caio Ariede, 9 years ago

Owner: changed from nobody to Caio Ariede
Status: newassigned

comment:6 by Tim Graham <timograham@…>, 9 years ago

Resolution: fixed
Status: assignedclosed

In 3862c568:

Fixed #25136 -- Documented Count('X', distinct=True) in aggregate topic guide.

comment:7 by Tim Graham <timograham@…>, 9 years ago

In 9f10c5c:

[1.8.x] Fixed #25136 -- Documented Count('X', distinct=True) in aggregate topic guide.

Backport of 3862c568ac1b920188ecfbe5cd8073160206d6b9 from master

comment:8 by Tim Graham <timograham@…>, 9 years ago

In fe70f280:

Refs #25136 -- Fixed nonexistent field reference in aggregation topic guide.

Thanks Ankush Thakur for the report and Simon for the review.

comment:9 by Tim Graham <timograham@…>, 9 years ago

In 7934695a:

[1.8.x] Refs #25136 -- Fixed nonexistent field reference in aggregation topic guide.

Thanks Ankush Thakur for the report and Simon for the review.

Backport of fe70f280d7ea4402f676696c4013c4a23d4e4990 from master

comment:10 by Tim Graham <timograham@…>, 9 years ago

In cc5c4ae:

[1.9.x] Refs #25136 -- Fixed nonexistent field reference in aggregation topic guide.

Thanks Ankush Thakur for the report and Simon for the review.

Backport of fe70f280d7ea4402f676696c4013c4a23d4e4990 from master

Note: See TracTickets for help on using tickets.
Back to Top