Opened 13 years ago

Last modified 3 months ago

#15049 assigned Bug

Using annotation before and after filter gives wrong results

Reported by: Alex Gaynor Owned by: Abhinav Yadav
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: mbmohitbagga88@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Basically:

qs = Book.objects.values("name").annotate(
    n_authors=Count("authors")
).filter(
    authors__name__startswith="Adrian"
).annotate(
    n_authors2=Count("authors")
)

Generates this SQL:

SELECT
    "aggregation_regress_book"."name",
    COUNT("aggregation_regress_book_authors"."author_id") AS "n_authors",
    COUNT("aggregation_regress_book_authors"."author_id") AS "n_authors2"
FROM
   "aggregation_regress_book"
   LEFT OUTER JOIN "aggregation_regress_book_authors" ON ("aggregation_regress_book"."id" = "aggregation_regress_book_authors"."book_id")
   INNER JOIN "aggregation_regress_book_authors" T4 ON ("aggregation_regress_book"."id" = T4."book_id")
   INNER JOIN "aggregation_regress_author" T5 ON (T4."author_id" = T5."id")
WHERE
   T5."name" LIKE Adrian% ESCAPE '\'
GROUP BY
    "aggregation_regress_book"."name",
    "aggregation_regress_book"."name"
ORDER BY
    "aggregation_regress_book"."name" ASC

Which uses the same alias for both COUNTs.

Attachments (2)

django-t15049.diff (1.0 KB ) - added by Alex Gaynor 13 years ago.
Tests
15049-test.diff (930 bytes ) - added by Tim Graham 9 years ago.

Download all attachments as: .zip

Change History (17)

by Alex Gaynor, 13 years ago

Attachment: django-t15049.diff added

Tests

comment:1 by Alex Gaynor, 13 years ago

Triage Stage: UnreviewedAccepted

comment:2 by James Addison, 13 years ago

milestone: 1.3
Severity: Normal
Type: Bug

comment:3 by Aymeric Augustin, 12 years ago

UI/UX: unset

Change UI/UX from NULL to False.

comment:4 by Aymeric Augustin, 12 years ago

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:5 by Anssi Kääriäinen, 11 years ago

Component: ORM aggregationDatabase layer (models, ORM)

comment:6 by mbmohitbagga88@…, 11 years ago

Cc: mbmohitbagga88@… added
Owner: set to anonymous
Status: newassigned

comment:7 by Anssi Kääriäinen, 11 years ago

Just a warning - this isn't at all easy to tackle. Basically if the query has more than one "multijoin" then aggregates must be done either as subselects or as subqueries.

An alternate is to just throw an error in such cases. Even this would be better than producing wrong results silently.

comment:8 by terpsquared, 10 years ago

Just ran into this. At the very least, the documentation should be updated to reflect this issue.

https://docs.djangoproject.com/en/dev/topics/db/aggregation/#order-of-annotate-and-filter-clauses

comment:9 by Tim Graham, 9 years ago

Confirmed it's still an issue as of dad8434d6ff5da10959672726dc9b397296d380b. Attaching a rebased test.

In the interim, documentation patches would of course be welcomed.

by Tim Graham, 9 years ago

Attachment: 15049-test.diff added

comment:10 by Tim Graham, 8 years ago

Owner: anonymous removed
Status: assignednew

comment:11 by Anssi Kääriäinen, 8 years ago

What is happening here is that:

  • first annotate uses the first join it finds for authors, or generates a new one
  • the second filter call introduces another join on authors
  • the second annotate uses again the first join it finds for authors

It would likely be somewhat easy to make the second annotate use the last introduced join instead of the first join for authors. But, this has a couple of problems. It could break existing code, and it is likely that the results would be incorrect in any case (due to Django not handling multiple multivalued joins + annotate correctly).

I say we don't fix this issue. Instead we could add an explicit way to introduce joins. Something like:

qs = Book.objects.values("name").annotate(
    n_authors=Count("authors")
).add_relation(
   authors2='authors'
).filter(
    authors2__name__startswith="Adrian"
).annotate(
    n_authors2=Count("authors2")
)

Now it is explicit what happens. It is notable that the results are still incorrect. To get correct results something like this might work:

qs = Book.objects.values("name").annotate(
    n_authors=Count("authors")
).annotate(
    n_authors2=Count("authors", subquery=True, filter=Q(authors__name__startswith="Adrian"))
)

Of course, we have a bit more work to do before this is possible. The point is I don't think we can be smart enough to automatically generate the correct query for the report's case.

comment:12 by Simon Charette, 16 months ago

I say we don't fix this issue. Instead we could add an explicit way to introduce joins.

fwiw this add_relation somewhat exists today though QuerySet.alias(alias=FilteredRelation("relation"))

comment:13 by Abhinav Yadav, 16 months ago

PR

PR documenting this irregular behavior from the duplicate ticket #33403

comment:14 by Abhinav Yadav, 3 months ago

Has patch: set

comment:15 by Mariusz Felisiak, 3 months ago

Owner: set to Abhinav Yadav
Patch needs improvement: set
Status: newassigned
Note: See TracTickets for help on using tickets.
Back to Top