Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#31005 closed Uncategorized (duplicate)

Incorrect GROUP BY aggregate values when filtering by annotation on reverse-related LEFT OUTER JOIN

Reported by: Oroku Saki Owned by: nobody
Component: Database layer (models, ORM) Version: 1.11
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Given the following models:

from django.db import models


class Account(models.Model):
    name = models.CharField(max_length=255)


class Ticket(models.Model):
    account = models.ForeignKey(Account)


class StatusChange(Model):
    ticket = models.ForeignKey(Ticket)
    created_on = models.DateField(auto_now_add=True)

Create some objects:

a = Account.objects.create(name='One')
t = a.ticket_set.create()
t.statuschange_set.create()
t.statuschange_set.create()
a = Account.objects.create(name='Two')
t = a.ticket_set.create()
t.statuschange_set.create()
t.statuschange_set.create()
t.statuschange_set.create()

To summarize, you've created 2 Accounts, each with 1 Ticket, and each Ticket with 2 and 3 StatusChange records per Ticket, respectively.

Now, query for some reporting info:

from datetime import timedelta

from django.utils import timezone
from django.db.models import Max, Count


# Make a Ticket queryset annotated with the last update to a related StatusChange
qs = Ticket.objects.annotate(
    updated_at=Max(
        'statuschange__created_on',
        output_field=models.DateField()
    )
)

# Filter by updated_at (just filter by < now to make sure no records are filtered out)
qs = qs.filter(updated_at__lte=timezone.now())

# Now, count the rows, grouped by Account name
qs.order_by('account__name').values('account__name').annotate(Count('pk'))

The expected output is:

<QuerySet [{'account__name': 'One', 'pk__count': 1}, {'account__name': 'Two', 'pk__count': 1},]>

Since we know that there is just 1 Ticket per Account.

Instead, I'm seeing:

<QuerySet [{'account__name': 'One', 'pk__count': 2}, {'account__name': 'Two', 'pk__count': 3},]>

Because the LEFT OUTER JOIN in the query is causing the second Count annotation to count once per outer-related row.

Change History (2)

comment:1 by Simon Charette, 5 years ago

Resolution: duplicate
Status: newclosed

That's a documented behavior of multiple annotations on multi-valued relationships see #10060 and #28296 which attempts to address it by allowing aggregates to be wrapped in subqueries.

comment:2 by Oroku Saki, 5 years ago

Replying to Simon Charette:

That's a documented behavior of multiple annotations on multi-valued relationships see #10060 and #28296 which attempts to address it by allowing aggregates to be wrapped in subqueries.

ZOMG, I had a huge brain fart today - Count(..., distinct=True) is in prior commits of mine in my own codebase :/

Thank you very much for your reply; you saved me a ton of time. Also, thanks for linking me to those other 2 tickets.

Last edited 5 years ago by Oroku Saki (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top