Opened 21 months ago

Closed 21 months ago

Last modified 21 months ago

#33929 closed Bug (duplicate)

Field Reference in FilteredRelation Does Not Recognize Previously Defined FilteredRelation

Reported by: Matt Owned by: nobody
Component: Database layer (models, ORM) Version: 4.1
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I suspect this may be the same root cause as https://code.djangoproject.com/ticket/33766, but the use-case here is different enough I thought I'd log a new ticket.

All this is using Django 4.0 or 4.1, on PostgreSQL. I confess that I have not checked if other DB layers might generate correct SQL.

It appears that I cannot reference one FilteredRelation from another's condition without jumping through some hoops. Starting with the following example models:

from django.db import models

class A(models.Model):
    ...

class B(models.Model):
    a = models.ForeignKey("A", on_delete=models.CASCADE)
    complete = models.BooleanField(default=False)

class C(models.Model):
    a = models.ForeignKey("A", on_delete=models.CASCADE)
    b = models.OneToOneField("B", blank=True, null=True, on_delete=models.CASCADE)
    complete = models.BooleanField(default=False)

Now suppose that I want a count of incomplete B, and also incomplete C, but only when related to an incomplete B.
If I were writing SQL myself, I’d write this as:

SELECT COUNT(b.id) as b_count, COUNT(c.id) as c_count
FROM a
LEFT JOIN b ON b.a_id = a.id AND NOT b.complete
LEFT JOIN c ON c.a_id = a.id AND c.b_id = b.id AND NOT c.complete

Now, the below queryset very nearly works:

A.objects.annotate(
    binc=FilteredRelation("b", condition=Q(b__complete=False)),
    cinc=FilteredRelation("c", condition=Q(c__b=F("binc__pk"), c__complete=False)),
    b_count=Count("binc"),
    c_count=Count("cinc"),
)

Unfortunately this uses an incorrect table alias into the cinc FilteredRelation, where I tried to reference F("binc__pk"). If I try to execute it, I get

django.db.utils.ProgrammingError: missing FROM-clause entry for table "t4"
LINE 1: ...("a"."id" = cinc."a_id" AND ((cinc."b_id" = (T4."id") A…

There is a workaround: I can force the correct identifier using RawSQL, and use this, which provides correct results:

A.objects.annotate(
    binc=FilteredRelation("b", condition=Q(b__complete=False)),
    cinc=FilteredRelation("c", condition=Q(c__b=RawSQL("binc.id", ()), c__complete=False)),
    b_count=Count("binc"),
    c_count=Count("cinc"),
)

Change History (3)

comment:1 by Nick Vellios, 21 months ago

I hope to add some more context after looking this over. It appears as though Matt is correct regarding being related to https://code.djangoproject.com/ticket/33766. The fields are not being properly aliased.

App name in my tests is interface which I left references of in the console output, but I removed the interface_ prefix from the formatted SQL queries for readability.

>>> qs = A.objects.annotate(
...     binc=FilteredRelation("b", condition=Q(b__complete=False)),
...     cinc=FilteredRelation("c", condition=Q(c__b=F("binc__pk"), c__complete=False)),
...     b_count=Count("binc"),
...     c_count=Count("cinc"),
... )
>>> qs.query.alias_map
{'interface_a': <django.db.models.sql.datastructures.BaseTable object at 0x11095fa60>, 'binc': <django.db.models.sql.datastructures.Join object at 0x11095c6a0>, 'cinc': <django.db.models.sql.datastructures.Join object at 0x11095de10>}

Invoking the query compiler adds the naive alias T4 to the query which also later shows up in the Query.alias_map.

>>> str(qs.query)

SELECT
  "a"."id",
  COUNT(binc."id") AS "b_count",
  COUNT(cinc."id") AS "c_count"
FROM "a"
LEFT OUTER JOIN "b" binc
  ON ("a"."id" = binc."a_id"
  AND (NOT binc."complete"))
LEFT OUTER JOIN "c" cinc
  ON ("a"."id" = cinc."a_id"
  AND ((cinc."b_id" = (T4."id")
  AND NOT cinc."complete")))
GROUP BY "a"."id"

Notice the incrementing Tn on the Query.alias_map but only the last one is referenced in the resulting query:

>>> qs.query.alias_map
{'interface_a': <django.db.models.sql.datastructures.BaseTable object at 0x11095fa60>, 'binc': <django.db.models.sql.datastructures.Join object at 0x11095c6a0>, 'cinc': <django.db.models.sql.datastructures.Join object at 0x11095de10>, 'T4': <django.db.models.sql.dat
astructures.Join object at 0x1077991b0>, 'T5': <django.db.models.sql.datastructures.Join object at 0x10778b400>}

>>> str(qs.query)
'SELECT "interface_a"."id", COUNT(binc."id") AS "b_count", COUNT(cinc."id") AS "c_count" FROM "interface_a" LEFT OUTER JOIN "interface_b" binc ON ("interface_a"."id" = binc."a_id" AND (NOT binc."complete")) LEFT OUTER JOIN "interface_c" cinc ON ("interface_a"."id" =
 cinc."a_id" AND ((cinc."b_id" = (T6."id") AND NOT cinc."complete"))) GROUP BY "interface_a"."id"'

>>> str(qs.query)
'SELECT "interface_a"."id", COUNT(binc."id") AS "b_count", COUNT(cinc."id") AS "c_count" FROM "interface_a" LEFT OUTER JOIN "interface_b" binc ON ("interface_a"."id" = binc."a_id" AND (NOT binc."complete")) LEFT OUTER JOIN "interface_c" cinc ON ("interface_a"."id" =
 cinc."a_id" AND ((cinc."b_id" = (T8."id") AND NOT cinc."complete"))) GROUP BY "interface_a"."id"'

>>> qs.query.alias_map
{'interface_a': <django.db.models.sql.datastructures.BaseTable object at 0x11095fa60>, 'binc': <django.db.models.sql.datastructures.Join object at 0x11095c6a0>, 'cinc': <django.db.models.sql.datastructures.Join object at 0x11095de10>, 'T4': <django.db.models.sql.dat
astructures.Join object at 0x1077991b0>, 'T5': <django.db.models.sql.datastructures.Join object at 0x10778b400>, 'T6': <django.db.models.sql.datastructures.Join object at 0x11099b640>, 'T7': <django.db.models.sql.datastructures.Join object at 0x110834580>, 'T8': <dj
ango.db.models.sql.datastructures.Join object at 0x110ad0610>, 'T9': <django.db.models.sql.datastructures.Join object at 0x110ad03a0>}


>>> str(qs.query)

SELECT
  "a"."id",
  COUNT(binc."id") AS "b_count",
  COUNT(cinc."id") AS "c_count"
FROM "a"
LEFT OUTER JOIN "b" binc
  ON ("a"."id" = binc."a_id"
  AND (NOT binc."complete"))
LEFT OUTER JOIN "c" cinc
  ON ("a"."id" = cinc."a_id"
  AND ((cinc."b_id" = (T9"id")
  AND NOT cinc."complete")))
GROUP BY "a"."id"

The following works but selects and groups by one additional field binc_pk. Postgres Query Planner output included. On a larger queryset this could get expensive.

A.objects.annotate(
    binc=FilteredRelation("b", condition=Q(b__complete=False)),
    binc_pk=F('binc__pk'),  # <-- Allows elimination of relying on raw SQL, however...
    cinc=FilteredRelation("c", condition=Q(c__b=F('binc_pk'), c__complete=False)),
    b_count=Count("binc"),
    c_count=Count("cinc"),
)
SELECT
  "a"."id",
  binc."id" AS "binc_pk",  /* <-- Ugly */
  COUNT(binc."id") AS "b_count",
  COUNT(cinc."id") AS "c_count"
FROM "a"
LEFT OUTER JOIN "b" binc
  ON ("a"."id" = binc."a_id"
  AND (NOT binc."complete"))
LEFT OUTER JOIN "c" cinc
  ON ("a"."id" = cinc."a_id"
  AND ((cinc."b_id" = (binc."id")
  AND NOT cinc."complete")))
GROUP BY "a"."id",
  binc."id"  /* <-- Ugly */

Not ideal.

➜  ~ psql -d dj_issue_33929
psql (14.4)
Type "help" for help.

dj_issue_33929=# EXPLAIN SELECT
dj_issue_33929-#   "interface_a"."id",
dj_issue_33929-#   binc."id" AS "binc_pk",
dj_issue_33929-#   COUNT(binc."id") AS "b_count",
dj_issue_33929-#   COUNT(cinc."id") AS "c_count"
dj_issue_33929-# FROM "interface_a"
dj_issue_33929-# LEFT OUTER JOIN "interface_b" binc
dj_issue_33929-#   ON ("interface_a"."id" = binc."a_id"
dj_issue_33929(#   AND (NOT binc."complete"))
dj_issue_33929-# LEFT OUTER JOIN "interface_c" cinc
dj_issue_33929-#   ON ("interface_a"."id" = cinc."a_id"
dj_issue_33929(#   AND ((cinc."b_id" = (binc."id")
dj_issue_33929(#   AND NOT cinc."complete")))
dj_issue_33929-# GROUP BY "interface_a"."id",
dj_issue_33929-#          binc."id";
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 HashAggregate  (cost=162.74..185.34 rows=2260 width=32)
   Group Key: interface_a.id, binc.id
   ->  Hash Left Join  (cost=97.80..140.14 rows=2260 width=24)
         Hash Cond: ((interface_a.id = cinc.a_id) AND (binc.id = cinc.b_id))
         ->  Hash Right Join  (cost=60.85..91.33 rows=2260 width=16)
               Hash Cond: (binc.a_id = interface_a.id)
               ->  Seq Scan on interface_b binc  (cost=0.00..28.10 rows=905 width=16)
                     Filter: (NOT complete)
               ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
                     ->  Seq Scan on interface_a  (cost=0.00..32.60 rows=2260 width=8)
         ->  Hash  (cost=25.40..25.40 rows=770 width=24)
               ->  Seq Scan on interface_c cinc  (cost=0.00..25.40 rows=770 width=24)
                     Filter: (NOT complete)
(13 rows)

dj_issue_33929=# EXPLAIN SELECT
dj_issue_33929-#   "interface_a"."id",
dj_issue_33929-#   COUNT(binc."id") AS "b_count",
dj_issue_33929-#   COUNT(cinc."id") AS "c_count"
dj_issue_33929-# FROM "interface_a"
dj_issue_33929-# LEFT OUTER JOIN "interface_b" binc
dj_issue_33929-#   ON ("interface_a"."id" = binc."a_id"
dj_issue_33929(#   AND (NOT binc."complete"))
dj_issue_33929-# LEFT OUTER JOIN "interface_c" cinc
dj_issue_33929-#   ON ("interface_a"."id" = cinc."a_id"
dj_issue_33929(#   AND ((cinc."b_id" = (binc.id)
dj_issue_33929(#   AND NOT cinc."complete")))
dj_issue_33929-# GROUP BY "interface_a"."id";
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 HashAggregate  (cost=157.09..179.69 rows=2260 width=24)
   Group Key: interface_a.id
   ->  Hash Left Join  (cost=97.80..140.14 rows=2260 width=24)
         Hash Cond: ((interface_a.id = cinc.a_id) AND (binc.id = cinc.b_id))
         ->  Hash Right Join  (cost=60.85..91.33 rows=2260 width=16)
               Hash Cond: (binc.a_id = interface_a.id)
               ->  Seq Scan on interface_b binc  (cost=0.00..28.10 rows=905 width=16)
                     Filter: (NOT complete)
               ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
                     ->  Seq Scan on interface_a  (cost=0.00..32.60 rows=2260 width=8)
         ->  Hash  (cost=25.40..25.40 rows=770 width=24)
               ->  Seq Scan on interface_c cinc  (cost=0.00..25.40 rows=770 width=24)
                     Filter: (NOT complete)
(13 rows)

Instinctively I looked into the implementation of OuterRef for ideas, but the implementation is quite contrasting to FilteredRelation.

comment:2 by Mariusz Felisiak, 21 months ago

Resolution: duplicate
Status: newclosed

Thanks for the report! Agree, this is a duplicate of #33929.

comment:3 by Simon Charette, 21 months ago

I agree with Mariusz and I think this has the exact same root cause as #33766 and detailed in this comment.

The gist of it is that filter relations conditions are resolved very late in the process of SQL compilation (basically when the FROM clause is generated) and thus the resolving of references to filtered relation induced joins alias can be off.

Note: See TracTickets for help on using tickets.
Back to Top