Opened 3 years ago

Closed 2 years ago

#33403 closed Cleanup/optimization (duplicate)

Annotate results change when filtering *after* the annotate

Reported by: karyon Owned by: Abhinav Yadav
Component: Database layer (models, ORM) Version: 4.0
Severity: Normal Keywords:
Cc: karyon Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by karyon)

Publisher A has two books with each 100 pages.

Publisher.objects.annotate(Sum("books__pages"))

will give you 200 as the sum. However, adding a filter after the annotate

Publisher.objects.annotate(Sum("books__pages")).filter(books__in=Books.objects.all())


will give you 400. (apparently that's the correct sum 200 multiplied by the number of books of that publisher)

I understand that joins in annotates can produce incorrect results, akin to the one documented here: #combining-multiple-aggregations. However, the docs there say only "Combining multiple aggregations with annotate() will yield the wrong results", and here I'm not combining multiple aggregations. Furthermore, #order-of-annotate-and-filter-clauses says "When an annotate() clause is applied to a query, the annotation is computed over the state of the query up to the point where the annotation is requested.", which further made me believe this should actually work.

Change History (12)

comment:1 by karyon, 3 years ago

Cc: karyon added
Description: modified (diff)

comment:2 by karyon, 3 years ago

Description: modified (diff)

comment:3 by Simon Charette, 3 years ago

Resolution: invalid
Status: newclosed

While the documentation doesn't explicit mention this particular combination you are effectively hitting #10060 where the book table is joined twice instead of a single time and used for aggregation and filtering.

In order to avoid this problem you can use a FilteredRelation alias or simply the condition argument of Sum

Publisher.objects.annotate(Sum("books__pages", condition=Q(books__in=Book.objects.all()))

Publisher.objects.alias(
    filtered_books=FilteredRelation("books", condition=Q(books__in=Book.objects.all()))
).annotate(
    Sum("filtered_books__pages"),
)

Please TicketClosingReasons/UseSupportChannels in the future for this kind of questions

---

I kind of wish we made FiltereRelation(relation: str, condition: Q) Relation(relation: str, condition: Optional[Q]) instead and documented its usage to circumvent this problem with the use of alias to tell the ORM a JOIN must be reused.

Last edited 3 years ago by Simon Charette (previous) (diff)

comment:4 by karyon, 3 years ago

Hi Simon, I'm sorry if I was unclear. I didn't intend this to be a question, I already have a workaround in place. This was intended to be a bug report: Two sections/sentences in the documentation are incorrect as to which cases should or should not work, see the last paragraph of my report.

To expand on that, I think the entire section "combining multiple aggregations" should be generalized, since it's not only a second annotation that triggers the issue, but (apparently?) any operation that produces another join, such as filtering by an attribute in a related model. In addition, the sentence "the annotation is computed over the state of the query up to the point where the annotation is requested" is misleading: A subsequent filter or additional aggregation can alter the results of a previous one. It could help users if we added something like "except for these cases <link to a generalized 'combining multiple aggregations' section>".

Your suggested solution is not necessary in this case, simply filtering before the annotation does work correctly as far as I could see:

# works as expected, sums the filtered books
Publisher.objects.filter(books__in=Books.objects.all()).annotate(Sum("books_pages")) 

What does not work correctly is filtering the publishers by some property of their books *after* the aggregation without affecting the aggregation:

# this alters the sum although the documentation suggests this should just work
Publisher.objects.annotate(Sum("books__pages")).filter(books__in=Books.objects.all()) 
Last edited 3 years ago by karyon (previous) (diff)

comment:5 by karyon, 3 years ago

Description: modified (diff)

comment:6 by Simon Charette, 3 years ago

Resolution: invalid
Status: closednew
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

While I'm not sure how to word this properly myself I also agree that the the annotation is computed over the state of the query up to the point where the annotation is requested statement can be misleading without context about how multi-valued filtering behaves.

Given we've recently revised the latter's documentation (6174814dbe04fb6668aa212a6cdbca765a8b0522) I'm re-opening this ticket to investigate doing the same here.

Maybe a single addition to docs about how multi-valued filtering against a previously annotated aggregation might still yield duplicate results would be enough?

comment:7 by Abhinav Yadav, 2 years ago

Has patch: set
Needs documentation: set
Owner: changed from nobody to Abhinav Yadav
Status: newassigned

comment:8 by Mariusz Felisiak, 2 years ago

Needs documentation: unset

comment:9 by Mariusz Felisiak, 2 years ago

Patch needs improvement: set

comment:10 by Abhinav Yadav, 2 years ago

Patch needs improvement: unset

comment:11 by Simon Charette, 2 years ago

FWIW this seems highly related if not a duplicate of #15049.

Last edited 2 years ago by Simon Charette (previous) (diff)

in reply to:  11 comment:12 by Mariusz Felisiak, 2 years ago

Resolution: duplicate
Status: assignedclosed

Replying to Simon Charette:

FWIW this seems highly related if not a duplicate of #15049.

Agreed, let's close it as a duplicate of #15049.

Note: See TracTickets for help on using tickets.
Back to Top