Opened 3 years ago

Closed 3 years ago

Last modified 2 years ago

#33403 closed Cleanup/optimization (duplicate)

Annotate results change when filtering *after* the annotate — at Version 5

Reported by: karyon Owned by: nobody
Component: Database layer (models, ORM) Version: 4.0
Severity: Normal Keywords:
Cc: karyon Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by karyon)

Publisher A has two books with each 100 pages.

Publisher.objects.annotate(Sum("books__pages"))

will give you 200 as the sum. However, adding a filter after the annotate

Publisher.objects.annotate(Sum("books__pages")).filter(books__in=Books.objects.all())


will give you 400. (apparently that's the correct sum 200 multiplied by the number of books of that publisher)

I understand that joins in annotates can produce incorrect results, akin to the one documented here: #combining-multiple-aggregations. However, the docs there say only "Combining multiple aggregations with annotate() will yield the wrong results", and here I'm not combining multiple aggregations. Furthermore, #order-of-annotate-and-filter-clauses says "When an annotate() clause is applied to a query, the annotation is computed over the state of the query up to the point where the annotation is requested.", which further made me believe this should actually work.

Change History (5)

comment:1 by karyon, 3 years ago

Cc: karyon added
Description: modified (diff)

comment:2 by karyon, 3 years ago

Description: modified (diff)

comment:3 by Simon Charette, 3 years ago

Resolution: invalid
Status: newclosed

While the documentation doesn't explicit mention this particular combination you are effectively hitting #10060 where the book table is joined twice instead of a single time and used for aggregation and filtering.

In order to avoid this problem you can use a FilteredRelation alias or simply the condition argument of Sum

Publisher.objects.annotate(Sum("books__pages", condition=Q(books__in=Book.objects.all()))

Publisher.objects.alias(
    filtered_books=FilteredRelation("books", condition=Q(books__in=Book.objects.all()))
).annotate(
    Sum("filtered_books__pages"),
)

Please TicketClosingReasons/UseSupportChannels in the future for this kind of questions

---

I kind of wish we made FiltereRelation(relation: str, condition: Q) Relation(relation: str, condition: Optional[Q]) instead and documented its usage to circumvent this problem with the use of alias to tell the ORM a JOIN must be reused.

Last edited 3 years ago by Simon Charette (previous) (diff)

comment:4 by karyon, 3 years ago

Hi Simon, I'm sorry if I was unclear. I didn't intend this to be a question, I already have a workaround in place. This was intended to be a bug report: Two sections/sentences in the documentation are incorrect as to which cases should or should not work, see the last paragraph of my report.

To expand on that, I think the entire section "combining multiple aggregations" should be generalized, since it's not only a second annotation that triggers the issue, but (apparently?) any operation that produces another join, such as filtering by an attribute in a related model. In addition, the sentence "the annotation is computed over the state of the query up to the point where the annotation is requested" is misleading: A subsequent filter or additional aggregation can alter the results of a previous one. It could help users if we added something like "except for these cases <link to a generalized 'combining multiple aggregations' section>".

Your suggested solution is not necessary in this case, simply filtering before the annotation does work correctly as far as I could see:

# works as expected, sums the filtered books
Publisher.objects.filter(books__in=Books.objects.all()).annotate(Sum("books_pages")) 

What does not work correctly is filtering the publishers by some property of their books *after* the aggregation without affecting the aggregation:

# this alters the sum although the documentation suggests this should just work
Publisher.objects.annotate(Sum("books__pages")).filter(books__in=Books.objects.all()) 
Last edited 3 years ago by karyon (previous) (diff)

comment:5 by karyon, 3 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top