#34450 closed Bug (fixed)

Lookup expressions across foreign keys introduce extra joins

Reported by: Roman Odaisky Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.1
Severity: Normal Keywords:
Cc: David Sanders Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Assuming Child has a ForeignKey to Parent,

Parent.objects.filter(child__x="x", child__y="y")

results in

SELECT ...
FROM "app_parent"
INNER JOIN "app_child" ON ("app_parent"."id" = "app_child"."parent_id")
WHERE ("app_child"."x" = ('x') AND "app_child"."y" = ('y'))

which makes perfect sense.

However,

Parent.objects.filter(Exact(F("child__x"), "x"), child__y="y")

unexpectedly produces

SELECT ...
FROM "app_parent"
LEFT OUTER JOIN "app_child" ON ("app_parent"."id" = "app_child"."parent_id")
INNER JOIN "app_child" T3 ON ("app_parent"."id" = T3."parent_id")
WHERE ("app_child"."x" = ('x') AND T3."y" = ('y'))

basically converting an AND lookup into an OR, which can’t be correct.

Same error here:

Parent.objects.filter(Exact(F("child__x"), "x") & Q(child__y="y"))

But these work correctly:

Parent.objects.filter(Q(child__y="y") & Exact(F("child__x"), "x"))
Parent.objects.filter(Exact(F("child__x"), "x") & Exact(F("child__y"), "y"))

It would seem that all of the above should produce the same SQL as the first invocation instead of introducing extra joins that quietly turn ANDs into ORs, and most definitely there should be no dependence on the order of the operands.

Change History (12)

comment:1 by David Sanders, 20 months ago

Triage Stage: Unreviewed β†’ Accepted

Hi, thanks for the report πŸ‘

Confirmed on latest main that a left outer join is produced.

comment:2 by David Sanders, 20 months ago

Some testing:

Correct joining:

query = Query(Parent)
query.add_q(Q(child__x__exact= 'x'))
query.add_q(Q(Exact(F("child__x"), 'x')))
print(query)

SELECT "ticket_34450_parent"."id" FROM "ticket_34450_parent" INNER JOIN "ticket_34450_child" ON ("ticket_34450_parent"."id" = "ticket_34450_child"."parent_id") WHERE ("ticket_34450_child"."x" = x AND "ticket_34450_child"."x" = (x))

On the other hand with the order reversed:

query = Query(Parent)
query.add_q(Q(Exact(F("child__x"), 'x')))
query.add_q(Q(child__x__exact= 'x'))
print(query)
SELECT "ticket_34450_parent"."id" FROM "ticket_34450_parent" LEFT OUTER JOIN "ticket_34450_child" ON ("ticket_34450_parent"."id" = "ticket_34450_child"."parent_id") INNER JOIN "ticket_34450_child" T3 ON ("ticket_34450_parent"."id" = T3."parent_id") WHERE ("ticket_34450_child"."x" = (x) AND T3."x" = x)

comment:3 by Simon Charette, 20 months ago

QuerySet.filter results ​in a single Query.add_q so I'm not sure that comparing the effect of two distinct add_q call should be the area of focus here as the latter is more alike to filter() chaining calls (e.g. QuerySet.filter().filter()) which has different semantic when dealing with multi-valued relationships.

It appears there is an issue in Query.build_filter when dealing with expressions, possibly the lack of can_reuse/used_aliases passing to Expression.resolve_expression.

Last edited 20 months ago by Simon Charette (previous) (diff)

comment:4 by Simon Charette, 20 months ago

Haven't run the full test suite yet but it seems that intuitively passing reuse so the JOINs are reused does the trick for this particular issue at least

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index 3cb8bdbc8a..1ed7d7cb6e 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -1373,7 +1373,7 @@ def build_filter(
             if not getattr(filter_expr, "conditional", False):
                 raise TypeError("Cannot filter against a non-conditional expression.")
             condition = filter_expr.resolve_expression(
-                self, allow_joins=allow_joins, summarize=summarize
+                self, allow_joins=allow_joins, reuse=can_reuse, summarize=summarize,
             )
             if not isinstance(condition, Lookup):
                 condition = self.build_lookup(["exact"], condition, True)

comment:5 by Simon Charette, 20 months ago

Owner: changed from nobody to Simon Charette
Status: new β†’ assigned

comment:6 by Simon Charette, 20 months ago

Roman, can you confirm ​this PR addresses your issue?

comment:7 by Simon Charette, 20 months ago

Has patch: set

comment:8 by David Sanders, 20 months ago

@Simon

so I'm not sure that comparing the effect of two distinct add_q call should be the area of focus

Sure, I'm not saying that was the area of focus, just that it was some interesting behaviour I saw that Query will try to demote if possible based on existing inner joins (which appears to be done in add_q()) (and that it was worth sharing to whoever might be interested)

Last edited 20 months ago by David Sanders (previous) (diff)

comment:9 by David Sanders, 20 months ago

Oh and it also raises the question of whether

Parent.objects.filter(Exact(F("child__x"), "x")

should be demoted (produce an inner join) because at the moment build_filter() doesn't attempt to find any "needed inners" thus _add_q() doesn't have anything to try to demote to.

πŸ™‚

Version 0, edited 20 months ago by David Sanders (next)

comment:10 by David Sanders, 20 months ago

Cc: David Sanders added

comment:11 by David Sanders, 20 months ago

Just FYI Simon's patch resolves the reported issue:

> print(Parent.objects.filter(Exact(F("child__x"), "x"), child__y="y").query)

SELECT "ticket_34450_parent"."id" FROM "ticket_34450_parent" INNER JOIN "ticket_34450_child" ON ("ticket_34450_parent"."id" = "ticket_34450_child"."parent_id") WHERE ("ticket_34450_child"."x" = (x) AND "ticket_34450_child"."y" = y)

comment:12 by Mariusz Felisiak <felisiak.mariusz@…>, 20 months ago

Resolution: β†’ fixed
Status: assigned β†’ closed

In 0e1aae7a:

Fixed #34450 -- Fixed multi-valued JOIN reuse when filtering by expressions.

Thanks Roman Odaisky for the report.

Note: See TracTickets for help on using tickets.
Back to Top