Opened 12 months ago

Closed 10 months ago

Last modified 2 days ago

#36025 closed Bug (fixed)

__range lookup in conditional aggregate with subquery annotation does not use annotated related fields

Reported by: Aashay Amballi Owned by: Simon Charette
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords: ORM
Cc: Aashay Amballi, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I'm encountering an issue with a Django ORM operation that uses the __range filter on related fields. Here is the relevant model setup and operation:

class Project(models.Model):
    name = models.CharField(max_length=100)
    description = models.TextField()
    start_date = models.DateField()
    end_date = models.DateField()

class LaborRecord(models.Model):
    actual_hours = models.DecimalField(max_digits=5, decimal_places=2)
    billable_hours = models.DecimalField(max_digits=5, decimal_places=2)
    object_id = models.PositiveIntegerField()
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    content_object = GenericForeignKey("content_type", "object_id")


class WorkOrder(models.Model):
    class Status(models.TextChoices):
        IN_PROGRESS = 'st_in_progress', _('In Progress')
        NEW = 'st_new', _('New')
        OPEN = 'st_open', _('Open')
        CANCELLED = 'st_cancelled', _('Cancelled')
        COMPLETED = 'st_completed', _('Completed')
        REJECTED = 'st_rejected', _('Rejected')

    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name="project_work_orders", null=True)
    name = models.CharField(max_length=100)
    description = models.TextField()
    due_date = models.DateTimeField()
    estimated_labor = models.DecimalField(max_digits=5, decimal_places=2, null=True)
    labor_records = GenericRelation('LaborRecord')
    status = models.CharField(max_length=20, choices=Status.choices, default=Status.NEW)

ORM Operation:

model_contentype_id = ContentType.objects.get_for_model(WorkOrder).id
labor_hour_sq = LaborRecord.objects.filter(object_id=OuterRef("pk"), content_type_id=model_contentype_id).values("object_id")
billable_hours_sq = labor_hour_sq.annotate(billable_labor_hours=Sum("billable_hours")).values("billable_labor_hours")
actual_hours_sq = labor_hour_sq.annotate(actual_labor_hours=Sum("actual_hours")).values("actual_labor_hours")

queryset = Project.objects.first().project_work_orders.all()

filter_condition = Q(
    ~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), 
    Q(due_date__isnull=True) | Q(due_date__date__range=(F('project__start_date'), F('project__end_date')))
)

query = queryset.annotate(
    billable_labor_hours=Subquery(billable_hours_sq),
    actual_labor_hours=Subquery(actual_hours_sq),
).aggregate(
    out_of_bound_count=Count("id", filter=filter_condition), 
    planned_hours=Sum("estimated_labor", filter=~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0), 
    completed_hours=Sum("actual_labor_hours", filter=Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0)
)

When running this operation, the following error occurs:

django.db.utils.ProgrammingError: missing FROM-clause entry for table "app_1_project"
LINE 1: ...LL OR ("__col3" AT TIME ZONE 'UTC')::date BETWEEN "app_1_pro...

The SQL generated by this operation is:

SELECT COUNT("__col1") FILTER (
    WHERE (NOT ("__col2" IN ('completed', 'closed', 'sch_closed', 'cancelled', 'rejected')) 
    AND ("__col3" IS NULL OR ("__col3" AT TIME ZONE 'UTC')::date BETWEEN "app_1_project"."start_date" AND "app_1_project"."end_date"))
), 
COALESCE(SUM("__col4") FILTER (WHERE NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected'))), 0),
COALESCE(SUM("actual_labor_hours") FILTER (WHERE "__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')), 0)
FROM (
    SELECT (
        SELECT SUM(U0."billable_hours") AS "billable_labor_hours"
        FROM "app_1_laborrecord" U0
        WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
        GROUP BY U0."object_id"
    ) AS "billable_labor_hours",
    (
        SELECT SUM(U0."actual_hours") AS "actual_labor_hours"
        FROM "app_1_laborrecord" U0
        WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
        GROUP BY U0."object_id"
    ) AS "actual_labor_hours",
    "app_1_workorder"."id" AS "__col1",
    "app_1_workorder"."status" AS "__col2",
    "app_1_workorder"."due_date" AS "__col3",
    "app_1_workorder"."estimated_labor" AS "__col4"
    FROM "app_1_workorder"
    INNER JOIN "app_1_project"
    ON ("app_1_workorder"."project_id" = "app_1_project"."id")
    WHERE "app_1_workorder"."project_id" = 1
) subquery

Initially, it seemed like a problem with alias field generation for annotated fields within the __range filter.

Explicitly using __gte and __lte instead of using __range resolved the issue. Below is the example for it and the SQL generated by the ORM operation

filter_condition = Q(~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']),Q(due_date__isnull=True) | Q(due_date__date__gte=F('project_start_date'), due_date__date__lte=F('project_end_date')))
SELECT COUNT("__col1") FILTER (WHERE (NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')) AND ("__col3" IS NULL OR (("__col3" AT TIME ZONE 'UTC')::date >= ("__col4") AND ("__col3" AT TIME ZONE 'UTC')::date <= ("__col5"))))),
       COALESCE(SUM("__col6") FILTER (WHERE NOT ("__col2" IN ('st_completed', 'st_cancelled', 'st_rejected'))), 0),
       COALESCE(SUM("actual_labor_hours") FILTER (WHERE "__col2" IN ('st_completed', 'st_cancelled', 'st_rejected')), 0)
  FROM (
        SELECT (
                SELECT SUM(U0."billable_hours") AS "billable_labor_hours"
                  FROM "app_1_laborrecord" U0
                 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
                 GROUP BY U0."object_id"
               ) AS "billable_labor_hours",
               (
                SELECT SUM(U0."actual_hours") AS "actual_labor_hours"
                  FROM "app_1_laborrecord" U0
                 WHERE (U0."content_type_id" = 8 AND U0."object_id" = ("app_1_workorder"."id"))
                 GROUP BY U0."object_id"
               ) AS "actual_labor_hours",
               "app_1_project"."start_date" AS "project_start_date",
               "app_1_project"."end_date" AS "project_end_date",
               "app_1_workorder"."id" AS "__col1",
               "app_1_workorder"."status" AS "__col2",
               "app_1_workorder"."due_date" AS "__col3",
               "app_1_project"."start_date" AS "__col4",
               "app_1_project"."end_date" AS "__col5",
               "app_1_workorder"."estimated_labor" AS "__col6"
          FROM "app_1_workorder"
          LEFT OUTER JOIN "app_1_project"
            ON ("app_1_workorder"."project_id" = "app_1_project"."id")
         WHERE "app_1_workorder"."project_id" = 1
       ) subquery

as you can see it created alias columns __col4 for project__start_date and __col5 for project__end_date. but for the __range it was directly trying to fetch from the table/model.

I tried annotating the project start and end dates as follows with __range filter and it didn't help either. Despite explicitly annotating the fields, the generated SQL remains unchanged from the original. Django does not recognize or utilize the aliased/annotated fields:

filter_condition = Q(~Q(status__in=['completed', 'closed', 'sch_closed', 'cancelled', 'rejected']), Q(due_date__isnull=True) | Q(due_date__date__range=(F('project_start_date'), F('project_end_date'))))
queryset.annotate(
    billable_labor_hours=Subquery(billable_hours_sq),
    actual_labor_hours=Subquery(actual_hours_sq),
    project_start_date=F("project__start_date"),
    project_end_date=F("project__end_date")
).aggregate(
    out_of_bound_count=Count("id", filter=filter_condition), 
    planned_hours=Sum("estimated_labor", filter=~Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0), 
    completed_hours=Sum("actual_labor_hours", filter=Q(status__in=['st_completed', 'st_cancelled', 'st_rejected']), default=0)
)

The issue seems related to ongoing discussions in Django's ticket #33929, but further investigation is needed to confirm a direct link.

Change History (13)

comment:1 by Aashay Amballi, 12 months ago

Summary: Django ORM `__range` Filter Fails to Use Annotated Date Fields in SQL GenerationDjango ORM `__range` Filter Fails to Use Annotated Related Fields in SQL Generation

comment:2 by Sarah Boyce, 12 months ago

Triage Stage: UnreviewedAccepted

I think I can replicate, on SQLite I get an error: django.db.utils.OperationalError: no such column: appname_project.start_date

I think I can simplify the example a bit

class Project(models.Model):
    start_date = models.DateField()
    end_date = models.DateField()


class WorkOrder(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name="project_work_orders", null=True)
    due_date = models.DateTimeField()

Then I get the same error with:

WorkOrder.objects.annotate(
    project_id_subquery=Subquery(Project.objects.filter(id=OuterRef("project_id")).values("id")),
).aggregate(
    out_of_bound_count=Count("id", filter=Q(due_date__date__range=(F('project__start_date'), F('project__end_date')))),
    project_id_subquery_sum=Sum("project_id_subquery"),
)

comment:3 by Sarah Boyce, 12 months ago

Summary: Django ORM `__range` Filter Fails to Use Annotated Related Fields in SQL Generation__range lookup in conditional aggregate with subquery annotation does not use annotated related fields

comment:4 by Simon Charette, 12 months ago

Cc: Simon Charette added

An important lead here is likely this statement

Explicitly using __gte and __lte instead of using __range resolved the issue.

If that's truly the case then it means that the likely culprit is how RangeLookup stores its resolved right-hand-side as a tuple(Col(...), Col(...)) which trips resolvable detection (hasattr(expr, "resolve_expression")) that happens when alias relabeling takes place (see Expression.relabeled_clone).

Expression.relabeled_clone, which is called when subqueries are involved and tables need to be re-aliases to avoid conflicts, relies on self.get_source_expression to walk through the expression tree and make sure all nested references to tables are re-aliases. If you look at Lookup.get_source_expressions though you'll notice that the right-hand-side is excluded if it's not a compilable (this should likely be a resolvable check instead). Since RangeLookup.rhs is a tuple and thus not considered a resolvable it falls on its face and its members are not considered for re-labeling.

We have a larger problem here where lookups that allow containers of potentially resolvable members as right-hand-side (e.g. __range, __in) that are not considered as resolvable themselves that we should address but the immediate solution for RangeLookup appears to be wrapping it's right-hand-side in an ExpressionList at resolving time and adjusting its as_sql method accordingly.

Last edited 12 months ago by Simon Charette (previous) (diff)

comment:5 by Simon Charette, 12 months ago

Some related tickets of interest here are #22288 and #16487 which added support for tuple[Resolvable, Resolvable] but missed the re-aliasing support.

comment:6 by Gregory Mariani, 12 months ago

Can't reproduce with the minimal example. I get:
{'out_of_bound_count': 0, 'project_id_subquery_sum': 3}
with that script:

project1 = Project.objects.create(
    start_date='2024-01-01',
    end_date='2024-06-30')

project2 = Project.objects.create(
    start_date='2024-07-01',
    end_date='2024-12-31')

due_date1 = timezone.make_aware(timezone.datetime(2024, 4, 15, 10, 0, 0))
due_date2 = timezone.make_aware(timezone.datetime(2024, 10, 1, 12, 0, 0))

work_order1 = WorkOrder.objects.create(
    project=project1,
    due_date=due_date1,
)

work_order2 = WorkOrder.objects.create(
    project=project2,
    due_date=due_date1,
)

result = WorkOrder.objects.annotate(
    project_id_subquery=Subquery(Project.objects.filter(
        id=OuterRef("project_id")).values("id")),
).aggregate(
    out_of_bound_count=Count("id", filter=Q(due_date__date__range=(
        F('project__start_date'), F('project__end_date')))),
    project_id_subquery_sum=Sum("project_id_subquery"),
)
print(result)

comment:7 by Simon Charette, 12 months ago

Has patch: set
Owner: set to Simon Charette
Status: newassigned

comment:8 by Sarah Boyce, 10 months ago

Triage Stage: AcceptedReady for checkin

comment:9 by Sarah Boyce <42296566+sarahboyce@…>, 10 months ago

Resolution: fixed
Status: assignedclosed

In 089deb82:

Fixed #36025 -- Fixed re-aliasing of iterable (in/range) lookups rhs.

In order for Expression.relabeled_clone to work appropriately its
get_source_expressions method must return all resolvable which wasn't the case
for Lookup when its right-hand-side is "direct" (not a compilable).

While refs #22288 added support for non-literals iterable right-hand-side
lookups it predated the subclassing of Lookup(Expression) refs #27021 which
could have been an opportunity to ensure right-hand-sides are always resolvable
(ValueList and ExpressionList).

Addressing all edge case with non-resolvable right-hand-sides would require
a significant refactor and deprecation of some parts of the Lookup interface so
this patch only focuses on FieldGetDbPrepValueIterableMixin (In and Range
lookups) by making sure that a right-hand-side containing resolvables are dealt
with appropriately during the resolving phase.

Thanks Aashay Amballi for the report.

comment:10 by Sarah Boyce <42296566+sarahboyce@…>, 10 months ago

In 0bac41f:

Refs #34975 -- Removed unnecessary lookups.In.get_refs().

Now that In.get_source_expression() includes its right-hand-side when it
contains expressions (refs #36025) it no longer requires a specialized
get_refs() method.

comment:11 by Sarah Boyce <42296566+sarahboyce@…>, 10 months ago

In d99985b:

[5.2.x] Fixed #36025 -- Fixed re-aliasing of iterable (in/range) lookups rhs.

In order for Expression.relabeled_clone to work appropriately its
get_source_expressions method must return all resolvable which wasn't the case
for Lookup when its right-hand-side is "direct" (not a compilable).

While refs #22288 added support for non-literals iterable right-hand-side
lookups it predated the subclassing of Lookup(Expression) refs #27021 which
could have been an opportunity to ensure right-hand-sides are always resolvable
(ValueList and ExpressionList).

Addressing all edge case with non-resolvable right-hand-sides would require
a significant refactor and deprecation of some parts of the Lookup interface so
this patch only focuses on FieldGetDbPrepValueIterableMixin (In and Range
lookups) by making sure that a right-hand-side containing resolvables are dealt
with appropriately during the resolving phase.

Thanks Aashay Amballi for the report.

Backport of 089deb82b9ac2d002af36fd36f288368cdac4b53 from main.

comment:12 by Sarah Boyce <42296566+sarahboyce@…>, 10 months ago

In 0690c060:

[5.2.x] Refs #34975 -- Removed unnecessary lookups.In.get_refs().

Now that In.get_source_expression() includes its right-hand-side when it
contains expressions (refs #36025) it no longer requires a specialized
get_refs() method.

Backport of 0bac41fc7e4a842e8d20319cba31cc645501c245 from main.

comment:13 by Jacob Walls <jacobtylerwalls@…>, 2 days ago

In 7b54ddd:

Refs #36025 -- Made get_prep_lookup() pass output_field when wrapping direct values in Value.

Previously, only strings were supplied with an output_field when wrapping
direct value iterable elements in Value expressions for ExpressionList. This
caused problems for in lookups on JSONField when using expressions
alongside direct values, as JSONField values can have different types which
need to be adapted by the field's get_db_prep_value().

Refs #36689.

Thanks Jacob Walls for the review.

Note: See TracTickets for help on using tickets.
Back to Top