Opened 14 years ago
Last modified 11 months ago
#15049 assigned Bug
Using annotation before and after filter gives wrong results
Reported by: | Alex Gaynor | Owned by: | Abhinav Yadav |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | mbmohitbagga88@… | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Basically:
qs = Book.objects.values("name").annotate( n_authors=Count("authors") ).filter( authors__name__startswith="Adrian" ).annotate( n_authors2=Count("authors") )
Generates this SQL:
SELECT "aggregation_regress_book"."name", COUNT("aggregation_regress_book_authors"."author_id") AS "n_authors", COUNT("aggregation_regress_book_authors"."author_id") AS "n_authors2" FROM "aggregation_regress_book" LEFT OUTER JOIN "aggregation_regress_book_authors" ON ("aggregation_regress_book"."id" = "aggregation_regress_book_authors"."book_id") INNER JOIN "aggregation_regress_book_authors" T4 ON ("aggregation_regress_book"."id" = T4."book_id") INNER JOIN "aggregation_regress_author" T5 ON (T4."author_id" = T5."id") WHERE T5."name" LIKE Adrian% ESCAPE '\' GROUP BY "aggregation_regress_book"."name", "aggregation_regress_book"."name" ORDER BY "aggregation_regress_book"."name" ASC
Which uses the same alias for both COUNTs.
Attachments (2)
Change History (17)
by , 14 years ago
Attachment: | django-t15049.diff added |
---|
comment:1 by , 14 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:2 by , 14 years ago
milestone: | 1.3 |
---|---|
Severity: | → Normal |
Type: | → Bug |
comment:5 by , 12 years ago
Component: | ORM aggregation → Database layer (models, ORM) |
---|
comment:6 by , 11 years ago
Cc: | added |
---|---|
Owner: | set to |
Status: | new → assigned |
comment:7 by , 11 years ago
Just a warning - this isn't at all easy to tackle. Basically if the query has more than one "multijoin" then aggregates must be done either as subselects or as subqueries.
An alternate is to just throw an error in such cases. Even this would be better than producing wrong results silently.
comment:8 by , 11 years ago
Just ran into this. At the very least, the documentation should be updated to reflect this issue.
https://docs.djangoproject.com/en/dev/topics/db/aggregation/#order-of-annotate-and-filter-clauses
comment:9 by , 9 years ago
Confirmed it's still an issue as of dad8434d6ff5da10959672726dc9b397296d380b. Attaching a rebased test.
In the interim, documentation patches would of course be welcomed.
by , 9 years ago
Attachment: | 15049-test.diff added |
---|
comment:10 by , 9 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:11 by , 9 years ago
What is happening here is that:
- first annotate uses the first join it finds for authors, or generates a new one
- the second filter call introduces another join on authors
- the second annotate uses again the first join it finds for authors
It would likely be somewhat easy to make the second annotate use the last introduced join instead of the first join for authors. But, this has a couple of problems. It could break existing code, and it is likely that the results would be incorrect in any case (due to Django not handling multiple multivalued joins + annotate correctly).
I say we don't fix this issue. Instead we could add an explicit way to introduce joins. Something like:
qs = Book.objects.values("name").annotate( n_authors=Count("authors") ).add_relation( authors2='authors' ).filter( authors2__name__startswith="Adrian" ).annotate( n_authors2=Count("authors2") )
Now it is explicit what happens. It is notable that the results are still incorrect. To get correct results something like this might work:
qs = Book.objects.values("name").annotate( n_authors=Count("authors") ).annotate( n_authors2=Count("authors", subquery=True, filter=Q(authors__name__startswith="Adrian")) )
Of course, we have a bit more work to do before this is possible. The point is I don't think we can be smart enough to automatically generate the correct query for the report's case.
comment:12 by , 2 years ago
I say we don't fix this issue. Instead we could add an explicit way to introduce joins.
fwiw this add_relation
somewhat exists today though QuerySet.alias(alias=FilteredRelation("relation"))
comment:13 by , 2 years ago
comment:14 by , 11 months ago
Has patch: | set |
---|
comment:15 by , 11 months ago
Owner: | set to |
---|---|
Patch needs improvement: | set |
Status: | new → assigned |
Tests