#31005 closed Uncategorized (duplicate)
Incorrect GROUP BY aggregate values when filtering by annotation on reverse-related LEFT OUTER JOIN
Reported by: | Oroku Saki | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.11 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Given the following models:
from django.db import models class Account(models.Model): name = models.CharField(max_length=255) class Ticket(models.Model): account = models.ForeignKey(Account) class StatusChange(Model): ticket = models.ForeignKey(Ticket) created_on = models.DateField(auto_now_add=True)
Create some objects:
a = Account.objects.create(name='One') t = a.ticket_set.create() t.statuschange_set.create() t.statuschange_set.create() a = Account.objects.create(name='Two') t = a.ticket_set.create() t.statuschange_set.create() t.statuschange_set.create() t.statuschange_set.create()
To summarize, you've created 2 Accounts, each with 1 Ticket, and each Ticket with 2 and 3 StatusChange records per Ticket, respectively.
Now, query for some reporting info:
from datetime import timedelta from django.utils import timezone from django.db.models import Max, Count # Make a Ticket queryset annotated with the last update to a related StatusChange qs = Ticket.objects.annotate( updated_at=Max( 'statuschange__created_on', output_field=models.DateField() ) ) # Filter by updated_at (just filter by < now to make sure no records are filtered out) qs = qs.filter(updated_at__lte=timezone.now()) # Now, count the rows, grouped by Account name qs.order_by('account__name').values('account__name').annotate(Count('pk'))
The expected output is:
<QuerySet [{'account__name': 'One', 'pk__count': 1}, {'account__name': 'Two', 'pk__count': 1},]>
Since we know that there is just 1 Ticket per Account.
Instead, I'm seeing:
<QuerySet [{'account__name': 'One', 'pk__count': 2}, {'account__name': 'Two', 'pk__count': 3},]>
Because the LEFT OUTER JOIN
in the query is causing the second Count
annotation to count once per outer-related row.
Change History (2)
comment:1 by , 5 years ago
Resolution: | → duplicate |
---|---|
Status: | new → closed |
comment:2 by , 5 years ago
Replying to Simon Charette:
That's a documented behavior of multiple annotations on multi-valued relationships see #10060 and #28296 which attempts to address it by allowing aggregates to be wrapped in subqueries.
ZOMG, I had a huge brain fart today - Count(..., distinct=True)
is in prior commits of mine in my own codebase :/
Thank you very much for your reply; you saved me a ton of time. Also, thanks for linking me to those other 2 tickets.
That's a documented behavior of multiple annotations on multi-valued relationships see #10060 and #28296 which attempts to address it by allowing aggregates to be wrapped in subqueries.