Opened 3 years ago

Closed 3 years ago

#32801 closed New feature (duplicate)

Support for non-aggregation subqueries in FROM clause

Reported by: Jameel A. Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Jameel A.)

This relates to #24462 and is perhaps a duplicate, but the underlying issue is a bit different.

When using JSONB columns in Postgres, it is sometimes useful to query JSON array data as if they're rows. This can be accomplished with the jsob_array_elements function. The problem with this function is that the resulting row cannot be filtered directly and must first be wrapped in a subquery.

In particular, I would like to be able to generate a query such as:

select foo_w_elems.id
from (
	select id, jsonb_array_elements(json_data->'some_array') as elem
	from foo as foo1
	union
	select id, jsonb_array_elements(json_data->'other_array') as elem
	from foo as foo2
) as foo_w_elems
where (elem->>'bar_id')::int in (
	select id
	from bar
	where expires_at >= CURRENT_TIMESTAMP
)

Even without the union, I could not find a way to generate such a query with Django today.

It appears that Django doesn't support querying FROM a subquery. While AggregateQuery gets us part of the way there, it doesn't support filtering or any other operation applied after a FROM clause.

I have managed to hack together support with something similar to:

class SubqueryTable:
    join_type = None
    parent_alias = None

    def __init__(self, query, alias):
        self.subquery = query
        self.subquery_alias = alias

    def as_sql(self, compiler, connection):
        alias_str = f' {self.subquery_alias}'
        base_sql, params = self.subquery.as_sql(compiler, connection)
        return base_sql + alias_str, params

    def relabeled_clone(self, change_map):
        return self.__class__(
            self.subquery,
            change_map.get(self.subquery_alias, self.subquery_alias),
        )


class JsonbArrayElements(Func):
    function = 'jsonb_array_elements'


class FooQuerySet(models.QuerySet):
    def has_expired_bar(self):
        obj = self._chain()

        # Expand each leg into a separate row
        foo_with_some_array = self.annotate(
            elem=JsonbArrayElements(KeyTransform('some_array', 'json_data'))
        )
        foo_with_other_array = self.annotate(
            elem=JsonbArrayElements(KeyTransform('other_array', 'json_data'))
        )
        foo_with_array_elems = foo_with_some_array.union(foo_with_other_array)

        # Convert the query into a subquery
        subquery = foo_with_array_elems.query
        subquery.subquery = True

        # Exclude foo with active bars
        # The use of "Ref" is a bit of a hack and an abuse of the API.
        # Unfortunately, I couldn't find a better way to reference
        # fields defined in the subquery.
        obj = obj.annotate(
            elem_bar_id=Cast(
                KeyTextTransform('bar_id', Ref('elem', subquery)),
                models.IntegerField()
            ),
        )
        bars = Bar.objects.active()
        obj = obj.exclude(elem_bat_id__in=bars.values('pk'))

        # This is a hack to force Django to relabel a field's table name
        # to match the FROM clause table alias
        relabels = {t: 'subquery' for t in subquery.alias_map}
        relabels[None] = 'subquery'
        obj.query.change_aliases(relabels)

        # Override the FROM clause by aliasing to the subquery
        obj.query.alias_map['subquery'] = SubqueryTable(subquery, 'subquery')

        # Remove the annotation from the select clause to avoid
        # introducing a hidden extra column
        annotation_mask = obj.query.annotation_select_mask
        annotation_mask = annotation_mask.remove('elem_bar_id') if annotation_mask else []
        obj.query.set_annotation_mask(annotation_mask)

        return obj.distinct()

While implementing this workaround, I found a few issues:

  1. I was unable to find a good way to have the query layer automatically resolve a reference to an annotation defined in a subquery without redefining it. While aggregated queries technically seem to have support for this, it appears they "lift" the annotation into the outer query instead of referring to the annotation within the subquery. This distinction is important for jsonb_array_elements.
  2. Because of (1), I had to annotate the outer query and then find a way to trick the query later to use a field name that would resolve when executed against the DB.
  3. While the query layer seems to handle renaming aliases in subqueries for aggregate queries, I had to manually change the outer query table alias to properly refer to the subquery fields.

I realize that full support for subqueries is likely far more challenging than my workaround, but support would be appreciated as filtering subqueries in an outer query is sometimes necessary to make use of more advanced DB features.

Change History (3)

comment:1 by Jameel A., 3 years ago

Description: modified (diff)

comment:2 by Jameel A., 3 years ago

Description: modified (diff)

comment:3 by Mariusz Felisiak, 3 years ago

Component: UncategorizedDatabase layer (models, ORM)
Resolution: duplicate
Status: newclosed

Thanks for the detailed report. I agree that this is a duplicate of #24462, that could also be solved by #28919 (CTE).

Note: See TracTickets for help on using tickets.
Back to Top