Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#31535 closed Bug (invalid)

Calling annotate with a case statement that references a recursive many to many relationship creates duplicates.

Reported by: David Cooke Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by David Cooke)

So I have a Challenge model that has an unlocks field which is a recursive, asymmetric many to many field with a reverse name of unlocked_by. I call annotate on it to add an "unlocked" field which checks if the challenges unlocked_by contains any ids in a list of solved challenges, doing this inserts all the solved challenges into the queryset that annotate is being called on. There's an example of this here: https://github.com/david-cooke/djangobug/blob/master/bug/tests.py

The sql being run is

SELECT "bug_challenge"."id", "bug_challenge"."name", CASE WHEN "bug_challenge_unlocks"."from_challenge_id" IN (5, 1) THEN True ELSE False END AS "unlocked" FROM "bug_challenge" LEFT OUTER JOIN "bug_challenge_unlocks" ON ("bug_challenge"."id" = "bug_challenge_unlocks"."to_challenge_id")

so it seems like the left outer join is adding the duplicates although thats needed for the case statement.

It is possible to work around this by calling value_list(flat=True).distinct('pk') however isn't compatible with more complicated examples such as https://github.com/ractf/core/blob/master/challenge/views.py#L50

Change History (3)

comment:1 by David Cooke, 5 years ago

Description: modified (diff)

comment:2 by Mariusz Felisiak, 5 years ago

Resolution: invalid
Status: newclosed
Summary: Calling annotate with a case statement that references a recursive many to many relationship on the same model as the queryset will create duplicatesCalling annotate with a case statement that references a recursive many to many relationship creates duplicates.

This is documented behavior (see the warning in the QuerySet.values() docs):

Because ManyToManyField attributes and reverse relations can have multiple related rows, including these can have a multiplier effect on the size of your result set. This will be especially pronounced if you include multiple such fields in your values() query, in which case all possible combinations will be returned.

You can use Subquery() or .distinct(...) to avoid duplicate.

comment:3 by Simon Charette, 5 years ago

The ORM is able to optimize .filter(unlocked_by__in=solved_challenges) into an INNER JOIN but it cannot do the same for annotations because they could be doing some special handling of NULL and it would require some pretty sophisticated introspection to determine it's safe to do. It would also be backward incompatible at this point. You should rely on Subquery or even Exists in your reported case instead.

Last edited 5 years ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top