Opened 6 years ago

Last modified 15 months ago

#15049 new Bug

Using annotation before and after filter gives wrong results

Reported by: Alex Gaynor Owned by:
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: mbmohitbagga88@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Basically:

qs = Book.objects.values("name").annotate(
    n_authors=Count("authors")
).filter(
    authors__name__startswith="Adrian"
).annotate(
    n_authors2=Count("authors")
)

Generates this SQL:

SELECT
    "aggregation_regress_book"."name",
    COUNT("aggregation_regress_book_authors"."author_id") AS "n_authors",
    COUNT("aggregation_regress_book_authors"."author_id") AS "n_authors2"
FROM
   "aggregation_regress_book"
   LEFT OUTER JOIN "aggregation_regress_book_authors" ON ("aggregation_regress_book"."id" = "aggregation_regress_book_authors"."book_id")
   INNER JOIN "aggregation_regress_book_authors" T4 ON ("aggregation_regress_book"."id" = T4."book_id")
   INNER JOIN "aggregation_regress_author" T5 ON (T4."author_id" = T5."id")
WHERE
   T5."name" LIKE Adrian% ESCAPE '\'
GROUP BY
    "aggregation_regress_book"."name",
    "aggregation_regress_book"."name"
ORDER BY
    "aggregation_regress_book"."name" ASC

Which uses the same alias for both COUNTs.

Attachments (2)

django-t15049.diff (1.0 KB) - added by Alex Gaynor 6 years ago.
Tests
15049-test.diff (930 bytes) - added by Tim Graham 16 months ago.

Download all attachments as: .zip

Change History (13)

Changed 6 years ago by Alex Gaynor

Attachment: django-t15049.diff added

Tests

comment:1 Changed 6 years ago by Alex Gaynor

Triage Stage: UnreviewedAccepted

comment:2 Changed 6 years ago by James Addison

milestone: 1.3
Severity: Normal
Type: Bug

comment:3 Changed 5 years ago by Aymeric Augustin

UI/UX: unset

Change UI/UX from NULL to False.

comment:4 Changed 5 years ago by Aymeric Augustin

Easy pickings: unset

Change Easy pickings from NULL to False.

comment:5 Changed 4 years ago by Anssi Kääriäinen

Component: ORM aggregationDatabase layer (models, ORM)

comment:6 Changed 3 years ago by mbmohitbagga88@…

Cc: mbmohitbagga88@… added
Owner: set to anonymous
Status: newassigned

comment:7 Changed 3 years ago by Anssi Kääriäinen

Just a warning - this isn't at all easy to tackle. Basically if the query has more than one "multijoin" then aggregates must be done either as subselects or as subqueries.

An alternate is to just throw an error in such cases. Even this would be better than producing wrong results silently.

comment:8 Changed 3 years ago by terpsquared

Just ran into this. At the very least, the documentation should be updated to reflect this issue.

https://docs.djangoproject.com/en/dev/topics/db/aggregation/#order-of-annotate-and-filter-clauses

comment:9 Changed 16 months ago by Tim Graham

Confirmed it's still an issue as of dad8434d6ff5da10959672726dc9b397296d380b. Attaching a rebased test.

In the interim, documentation patches would of course be welcomed.

Changed 16 months ago by Tim Graham

Attachment: 15049-test.diff added

comment:10 Changed 15 months ago by Tim Graham

Owner: anonymous deleted
Status: assignednew

comment:11 Changed 15 months ago by Anssi Kääriäinen

What is happening here is that:

  • first annotate uses the first join it finds for authors, or generates a new one
  • the second filter call introduces another join on authors
  • the second annotate uses again the first join it finds for authors

It would likely be somewhat easy to make the second annotate use the last introduced join instead of the first join for authors. But, this has a couple of problems. It could break existing code, and it is likely that the results would be incorrect in any case (due to Django not handling multiple multivalued joins + annotate correctly).

I say we don't fix this issue. Instead we could add an explicit way to introduce joins. Something like:

qs = Book.objects.values("name").annotate(
    n_authors=Count("authors")
).add_relation(
   authors2='authors'
).filter(
    authors2__name__startswith="Adrian"
).annotate(
    n_authors2=Count("authors2")
)

Now it is explicit what happens. It is notable that the results are still incorrect. To get correct results something like this might work:

qs = Book.objects.values("name").annotate(
    n_authors=Count("authors")
).annotate(
    n_authors2=Count("authors", subquery=True, filter=Q(authors__name__startswith="Adrian"))
)

Of course, we have a bit more work to do before this is possible. The point is I don't think we can be smart enough to automatically generate the correct query for the report's case.

Note: See TracTickets for help on using tickets.
Back to Top