Opened 3 years ago
Closed 2 years ago
#33403 closed Cleanup/optimization (duplicate)
Annotate results change when filtering *after* the annotate
Reported by: | karyon | Owned by: | Abhinav Yadav |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.0 |
Severity: | Normal | Keywords: | |
Cc: | karyon | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
Publisher A has two books with each 100 pages.
Publisher.objects.annotate(Sum("books__pages"))
will give you 200 as the sum. However, adding a filter after the annotate
Publisher.objects.annotate(Sum("books__pages")).filter(books__in=Books.objects.all())
will give you 400. (apparently that's the correct sum 200 multiplied by the number of books of that publisher)
I understand that joins in annotates can produce incorrect results, akin to the one documented here: #combining-multiple-aggregations. However, the docs there say only "Combining multiple aggregations with annotate() will yield the wrong results", and here I'm not combining multiple aggregations. Furthermore, #order-of-annotate-and-filter-clauses says "When an annotate() clause is applied to a query, the annotation is computed over the state of the query up to the point where the annotation is requested.", which further made me believe this should actually work.
Change History (12)
comment:1 by , 3 years ago
Cc: | added |
---|---|
Description: | modified (diff) |
comment:2 by , 3 years ago
Description: | modified (diff) |
---|
comment:3 by , 3 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:4 by , 3 years ago
Hi Simon, I'm sorry if I was unclear. I didn't intend this to be a question, I already have a workaround in place. This was intended to be a bug report: Two sections/sentences in the documentation are incorrect as to which cases should or should not work, see the last paragraph of my report.
To expand on that, I think the entire section "combining multiple aggregations" should be generalized, since it's not only a second annotation that triggers the issue, but (apparently?) any operation that produces another join, such as filtering by an attribute in a related model. In addition, the sentence "the annotation is computed over the state of the query up to the point where the annotation is requested" is misleading: A subsequent filter or additional aggregation can alter the results of a previous one. It could help users if we added something like "except for these cases <link to a generalized 'combining multiple aggregations' section>".
Your suggested solution is not necessary in this case, simply filtering before the annotation does work correctly as far as I could see:
# works as expected, sums the filtered books Publisher.objects.filter(books__in=Books.objects.all()).annotate(Sum("books_pages"))
What does not work correctly is filtering the publishers by some property of their books *after* the aggregation without affecting the aggregation:
# this alters the sum although the documentation suggests this should just work Publisher.objects.annotate(Sum("books__pages")).filter(books__in=Books.objects.all())
comment:5 by , 3 years ago
Description: | modified (diff) |
---|
comment:6 by , 3 years ago
Resolution: | invalid |
---|---|
Status: | closed → new |
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Cleanup/optimization |
While I'm not sure how to word this properly myself I also agree that the the annotation is computed over the state of the query up to the point where the annotation is requested statement can be misleading without context about how multi-valued filtering behaves.
Given we've recently revised the latter's documentation (6174814dbe04fb6668aa212a6cdbca765a8b0522) I'm re-opening this ticket to investigate doing the same here.
Maybe a single addition to docs about how multi-valued filtering against a previously annotated aggregation might still yield duplicate results would be enough?
comment:7 by , 2 years ago
Has patch: | set |
---|---|
Needs documentation: | set |
Owner: | changed from | to
Status: | new → assigned |
comment:8 by , 2 years ago
Needs documentation: | unset |
---|
comment:9 by , 2 years ago
Patch needs improvement: | set |
---|
comment:10 by , 2 years ago
Patch needs improvement: | unset |
---|
follow-up: 12 comment:11 by , 2 years ago
FWIW this seems highly related if not a duplicate of #15049.
comment:12 by , 2 years ago
Resolution: | → duplicate |
---|---|
Status: | assigned → closed |
Replying to Simon Charette:
FWIW this seems highly related if not a duplicate of #15049.
Agreed, let's close it as a duplicate of #15049.
While the documentation doesn't explicit mention this particular combination you are effectively hitting #10060 where the
book
table is joined twice instead of a single time and used for aggregation and filtering.In order to avoid this problem you can use a
FilteredRelation
alias or simply thecondition
argument ofSum
Please TicketClosingReasons/UseSupportChannels in the future for this kind of questions
---
I kind of wish we made
FiltereRelation(relation: str, condition: Q)
Relation(relation: str, condition: Optional[Q])
instead and documented its usage to circumvent this problem with the use ofalias
to tell the ORM a JOIN must be reused.