Opened 5 years ago

Last modified 2 months ago

#28333 assigned New feature

Filter and subquery for window expressions

Reported by: Mads Jensen
Owned by: Simon Charette
Component: Database layer (models, ORM)
Version: dev
Severity: Normal
Keywords: window orm filter subquery GSoC
Cc: Alexandr Artemyev, Andy Terra, Étienne Beaulé, Michael Wheeler, şuayip üzülmez, John Speno, Alex Scott, Ad Timmering, Hannes Ljungberg, Dave Johansen, Simon Charette
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Mads Jensen)

#26608 will introduce window function expressions, but will disallow filtering on the result of them, e.g.:

Window.objects.annotate(row=Window(expression=RowNumber())).filter(row__gt=1)

is not allowed. Instead, the window function expression should be wrapped in an inner query, and the filtering should be done in an outer query.
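
To illustrate the inner/outer query shape described above at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module. The employee table, its data, and the inner_query alias are made up for the example, and it assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). It does not show what the ORM generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("ann", 300), ("bob", 200), ("cid", 100)],
)

# A window function used directly in WHERE is rejected by the database.
try:
    conn.execute(
        "SELECT name FROM employee "
        "WHERE ROW_NUMBER() OVER (ORDER BY salary DESC) > 1"
    )
    direct_filter_allowed = True
except sqlite3.Error:
    direct_filter_allowed = False

# Computing the window function in an inner query and filtering on its
# alias in the outer query is accepted.
rows = conn.execute(
    "SELECT name, rn FROM ("
    "  SELECT name, ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn"
    "  FROM employee"
    ") AS inner_query WHERE rn > 1 ORDER BY rn"
).fetchall()
print(direct_filter_allowed, rows)
```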

Change History (32)

comment:1 Changed 5 years ago by Mads Jensen

Description: modified (diff)
Triage Stage: Unreviewed → Accepted

comment:2 Changed 3 years ago by Bernd Wechner

This is 2 years old with no action and I am very keen to see it implemented (need it rather badly).

It strikes me as an aside that a more general approach may kill more birds with one stone. I noticed the rather excellent ExpressionWrapper(), and it struck me that a QueryWrapper() would be a more general solution that covers this particular need and will cover others as well, known and unknown at present.

In short QueryWrapper would simply make an inner query of the QuerySet to date so that subsequent operations act upon it as if it were a table.

Last edited 3 years ago by Bernd Wechner (previous) (diff)

comment:3 Changed 3 years ago by Alexandr Artemyev

Cc: Alexandr Artemyev added

comment:4 Changed 3 years ago by Andy Terra

Cc: Andy Terra added

comment:5 Changed 2 years ago by Étienne Beaulé

Cc: Étienne Beaulé added

comment:6 Changed 2 years ago by Mariusz Felisiak

Keywords: GSoC added

comment:7 Changed 2 years ago by Manav Agarwal

Owner: changed from nobody to Manav Agarwal
Status: new → assigned

comment:9 Changed 23 months ago by Manav Agarwal

I was doing some research on this issue and found a few solutions to the problem. (all these are vague ideas. Any suggestions/feedback would be appreciated to make the idea worth implementing)

  1. A separate QueryWrapper Class which will have syntax like this
    1. Window.objects.annotate(row=QueryWrapper(Window(expression=RowNumber()))).filter(row__gt=1) OR
    2. QueryWrapper class internally implemented for window function to automatically generate SQL subquery for all window expressions.
  2. Use the subquery class internally to make it execute all window expression related queries as subqueries.
  3. Passing an alias to the window expression and then, instead of generating a partial query with just the OVER and ORDER BY clauses, generating a separate SELECT statement which is then used as a derived table in the query.

I personally feel that implementing 1.a would be a good option but as I mentioned above this is just a vague idea and to implement it I need some guidance from someone who is more experienced.

Last edited 23 months ago by Manav Agarwal (previous) (diff)

comment:10 Changed 22 months ago by Michael Wheeler

Cc: Michael Wheeler added

comment:11 Changed 20 months ago by Manav Agarwal

Owner: Manav Agarwal deleted

comment:12 Changed 20 months ago by şuayip üzülmez

Cc: şuayip üzülmez added

comment:13 Changed 18 months ago by John Speno

Cc: John Speno added

comment:14 Changed 17 months ago by Alex Scott

Cc: Alex Scott added

Is there a recommended workaround for not being able to filter on a window function result? Can you wrap it somehow manually?

comment:15 Changed 17 months ago by Mariusz Felisiak

Status: assigned → new

comment:16 Changed 16 months ago by Ad Timmering

Cc: Ad Timmering added

comment:17 Changed 14 months ago by rossm6

I have this problem as well.

According to this article https://learnsql.com/blog/window-functions-not-allowed-in-where/ the solution is either to use a common table expression or a subquery in the FROM clause of the SQL query. Unfortunately, neither seems to be supported by Django. Although I did find this package for the first option - https://docs.djangoproject.com/en/3.2/ref/models/querysets/#extra.

Both of these should be options in the Django ORM right? Each would be a big win for the power of the ORM.

comment:18 Changed 13 months ago by Hannes Ljungberg

Cc: Hannes Ljungberg added

comment:19 Changed 12 months ago by Dave Johansen

Cc: Dave Johansen added

comment:20 Changed 11 months ago by Safa Alfulaij

What I'm doing currently is this hack:

Before:

queryset = MyModel.objects.annotate(row=Window(expression=RowNumber())).filter(row__gt=1)

After:

queryset = MyModel.objects.annotate(row=Window(expression=RowNumber()))
sql, params = queryset.query.sql_with_params()
queryset = queryset.raw(f"SELECT * FROM ({sql}) AS full WHERE row > 1", params)

I don't see that it's bad to have this currently, with whatever limitations of raw documented in the Window filtering section, then add in more features if real use cases are provided.

comment:21 Changed 4 months ago by Simon Charette

#26780 which is about adding support for slices prefetching (think top-n results per category) to core would benefit from this feature being implemented at least partially.

The most difficult part of this issue is not the subquery pushdown itself (see #24462) but making sure that union filters of the form filter(Q(window__lookup=foo) | Q(aggregate__lookup=bar) | Q(field__lookup=baz)) result in the proper usage of the inner query's WHERE and HAVING and of the outer query's WHERE (see the Where.split_having method for the current implementation).

If we were to start by focusing this ticket on the simple intersection use cases of the form filter(window__lookup=foo) (as reported here and required by #26780) I suspect we'd cover most of the use cases while deferring most of the complexity. If someone would like to give this a shot I'd start by doing the following:

  1. Make Window.filterable = True for now
  2. Adjust Where.split_having to properly deal with self.contains_over_clause by returning a triple of the form (where: Where, having: Where, window: Where) and error out when self.connector != AND and self.contains_over_clause. Possibly rename to split_having_window?
  3. Adjust SQLCompiler.pre_sql_setup to assign self.over_where and use it in SQLCompiler.as_sql to wrap the query in a subquery of the form SELECT * FROM ({subquery_sql}) subquery WHERE {over_where_sql}
  4. Add tests for new supported use cases and disallowed ones.
  5. Make Q.filterable return False when self.connector != AND and self.contains_over_clause. That will result in weird error messages of the form "Q is disallowed in the filter clause.", so maybe we'll want to deprecate Q.filterable in favour of a BaseExpression.check_filterable method that by default raises the current message and is overridden in Q to raise a proper message about complex filters involving window functions.
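
Steps 2 and 3 above can be sketched in plain Python, independent of Django. Everything here is hypothetical and only meant to show the idea: Predicate, split_having_window and wrap_for_window_filter are toy names, not ORM APIs, and the toy only models flat AND-ed groups.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """Toy stand-in for one filter() lookup (hypothetical, not a Django class)."""

    sql: str
    kind: str  # "column", "aggregate" or "window"


def split_having_window(predicates, connector="AND"):
    """Return a (where, having, window) triple of SQL fragment lists."""
    contains_over_clause = any(p.kind == "window" for p in predicates)
    if connector != "AND":
        if contains_over_clause:
            # The case step 2 proposes to error out on.
            raise NotImplementedError(
                "Non-AND filters involving window functions are not supported."
            )
        # Not part of the proposal: this toy simply does not model OR groups.
        raise NotImplementedError("This sketch only models AND-ed groups.")
    where = [p.sql for p in predicates if p.kind == "column"]
    having = [p.sql for p in predicates if p.kind == "aggregate"]
    window = [p.sql for p in predicates if p.kind == "window"]
    return where, having, window


def wrap_for_window_filter(inner_sql, window):
    """Wrap the inner query so window predicates run in an outer WHERE."""
    if not window:
        return inner_sql
    return f"SELECT * FROM ({inner_sql}) subquery WHERE {' AND '.join(window)}"


predicates = [
    Predicate("\"department\" = 'IT'", "column"),
    Predicate('COUNT("id") >= 1', "aggregate"),
    Predicate('"rank" <= 2', "window"),
]
where, having, window = split_having_window(predicates)
inner_sql = 'SELECT "name", RANK() OVER (ORDER BY "salary" DESC) AS "rank" FROM "employee"'
wrapped = wrap_for_window_filter(inner_sql, window)
print(wrapped)
```

The where and having lists would still be compiled into the inner query; only the window list moves to the outer WHERE.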

Happy to review a PR that attempts the above or provide feedback here if that means this ticket is partially fixed and allows for #26780 to benefit from this work.

comment:22 Changed 4 months ago by Simon Charette

Cc: Simon Charette added

comment:23 Changed 4 months ago by Simon Charette

Had a first stab at the above and it seems to be working relatively well, not too intrusive of a change. I'll give a shot at implementing #26780 on top of it now to confirm it could work.

As a side note it seems that the Snowflake database has an SQL extension to filter against window functions, the QUALIFY clause.

comment:24 Changed 4 months ago by Simon Charette

Submitted a PR that adds support for jointed predicates but still disallows disjointed ones.

For example, given the following model and queryset

from django.db import models
from django.db.models import Count, F, Max, Q, Window
from django.db.models.functions import Rank

class Employee(models.Model):
    name = models.CharField(max_length=50)
    department = models.CharField(max_length=50)
    salary = models.IntegerField()

class PastEmployeeDepartment(models.Model):
    # on_delete is a required argument of ForeignKey
    employee = models.ForeignKey(Employee, on_delete=models.CASCADE, related_name="past_departments")
    department = models.CharField(max_length=50)

queryset = Employee.objects.annotate(
    dept_max_salary=Window(Max("salary"), partition_by="department"),
    dept_salary_rank=Window(Rank(), partition_by="department", order_by="-salary"),
    past_depths_cnt=Count("past_departments"),
)

All of the following is supported

# window predicate will be pushed to outer query
queryset.filter(dept_max_salary__gte=F("salary"))
SELECT * FROM (...) "qualify" WHERE dept_max_salary >= "qualify"."salary"

# department predicate will be applied in inner query
queryset.filter(department="IT", dept_max_salary__gte=F("salary"))
SELECT * FROM (... WHERE "department" = 'IT') "qualify" WHERE dept_max_salary >= "qualify"."salary"

# aggregate predicate will be applied in the inner query
queryset.filter(past_depths_cnt__gte=1, dept_max_salary__gte=F("salary"))
SELECT * FROM (... HAVING COUNT("pastemployeedepartment"."id") >= 1) "qualify" WHERE dept_max_salary >= "qualify"."salary"

Some forms of disjointed predicates against window functions (using OR) are also supported as long as they are only against window functions

# Disjointed predicates that are only about window functions are supported
queryset.filter(Q(dept_max_salary__gte=F("salary")) | Q(dept_salary_rank__lte=2))
SELECT * FROM (...) "qualify" WHERE "dept_max_salary" >= "qualify"."salary" OR "dept_salary_rank" <= 2

And limits are only applied on the outer query, once all window function filters are applied.
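
A consequence of applying the aggregate predicate in the inner query is that the window function only sees rows that survive HAVING. The sketch below shows this with hand-written SQL on sqlite3 (SQLite 3.25+ assumed). The tables, data, and exact SQL are made up for illustration and are not the output of the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary INTEGER
    );
    CREATE TABLE past_department (
        id INTEGER PRIMARY KEY, employee_id INTEGER, department TEXT
    );
    INSERT INTO employee VALUES
        (1, 'ann', 'IT', 300), (2, 'bob', 'IT', 200),
        (3, 'cid', 'HR', 250), (4, 'dee', 'HR', 150);
    INSERT INTO past_department (employee_id, department) VALUES
        (2, 'HR'), (3, 'IT'), (4, 'IT'), (4, 'Ops');
    """
)

# Inner query: join, GROUP BY and HAVING (the aggregate predicate).
# Outer query: the window predicate, then ordering and the limit.
rows = conn.execute(
    """
    SELECT name, dept_salary_rank FROM (
        SELECT e.name AS name,
               COUNT(p.id) AS past_depts_cnt,
               RANK() OVER (
                   PARTITION BY e.department ORDER BY e.salary DESC
               ) AS dept_salary_rank
        FROM employee e
        LEFT JOIN past_department p ON p.employee_id = e.id
        GROUP BY e.id, e.name, e.department, e.salary
        HAVING COUNT(p.id) >= 1
    ) AS qualify
    WHERE dept_salary_rank <= 1
    ORDER BY name
    LIMIT 10
    """
).fetchall()
print(rows)
```

Here ann has no past departments and is removed by HAVING before ranking, so bob ranks first in IT even though ann earns more.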

The following is not supported

  1. Disjointed filters mixing predicates against window functions and aggregates and/or column references, as it's really hard to emulate without getting into multiple levels of subquery pushdown, particularly if aggregation is involved.
  2. Filtering against columns masked by the usage of values, values_list. This one could be solved by adding another layer of subquery pushdown that avoids applying the mask in the subquery but does so in an outermost query over the one used for window filtering.
  3. Passing window function instances directly to filter and exclude instead of referencing annotated window functions.

Feedback about the proposed supported feature set and implementation is very welcome.

Last edited 4 months ago by Simon Charette (previous) (diff)

comment:25 Changed 4 months ago by Simon Charette

Has patch: set

comment:26 Changed 4 months ago by Mariusz Felisiak

Needs tests: set
Owner: set to Simon Charette
Patch needs improvement: set
Status: new → assigned

comment:27 Changed 4 months ago by Simon Charette

The latest version of the patch now supports filtering against annotations masked by the usage of values and friends.

queryset.filter(dept_max_salary__gte=2000).values("id") now results in

SELECT "col1" FROM (
    SELECT * FROM (
        SELECT "id" AS "col1", MAX(...) OVER (...) AS "dept_max_salary" FROM ...
    ) "qualify" WHERE "dept_max_salary" >= 2000
) "qualify_mask"
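
The three-level shape can be tried out by hand on sqlite3 (SQLite 3.25+ assumed). This is only a sketch: the employee table and its data are invented, the elided parts of the SQL are filled with guesses, and an explicit ORDER BY is added on the outermost query so the result order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "IT", 3000), (2, "IT", 1000), (3, "HR", 1500)],
)

rows = conn.execute(
    """
    SELECT "col1" FROM (
        SELECT * FROM (
            SELECT "id" AS "col1",
                   MAX("salary") OVER (PARTITION BY "department") AS "dept_max_salary"
            FROM employee
        ) "qualify" WHERE "dept_max_salary" >= 2000
    ) "qualify_mask"
    ORDER BY "col1"
    """
).fetchall()
print(rows)
```

The innermost query keeps the window annotation so the middle query can filter on it, and only the outermost query applies the values("id") mask.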

comment:28 Changed 4 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In 35911078:

Replaced Expression.replace_references() with .replace_expressions().

The latter allows for more generic use cases beyond the currently
limited ones constraints validation has.

Refs #28333, #30581.

comment:29 Changed 4 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In 8c3046da:

Refs #28333 -- Moved SQLCompiler's forced column aliasing logic to get_select().

This extends query composability possibilities when dealing with
subqueries which is necessary to implement window function filtering.

comment:30 Changed 4 months ago by Mariusz Felisiak <felisiak.mariusz@…>

In f387d024:

Refs #28333 -- Added partial support for filtering against window functions.

Adds support for joint predicates against window annotations through
subquery wrapping while maintaining errors for disjointed filter
attempts.

The "qualify" wording was used to refer to predicates against window
annotations as it's the name of a specialized Snowflake extension to
SQL that is to window functions what HAVING is to aggregates.

While not complete the implementation should cover most of the common
use cases for filtering against window functions without requiring
the complex subquery pushdown and predicate re-aliasing machinery to
deal with disjointed predicates against columns, aggregates, and window
functions.

A complete disjointed filtering implementation should likely be
deferred until proper QUALIFY support lands or the ORM gains a proper
subquery pushdown interface.

comment:31 Changed 4 months ago by Mariusz Felisiak

Has patch: unset
Needs tests: unset
Patch needs improvement: unset

comment:32 Changed 3 months ago by GitHub <noreply@…>

In f210de76:

Refs #28333 -- Fixed NonQueryWindowTests.test_invalid_filter() on databases that don't support window expressions.

comment:33 Changed 2 months ago by GitHub <noreply@…>

In 3ba7f2e:

Refs #28333 -- Explicitly ordered outer qualify query on window filtering.

While most backends will propagate derived table ordering as long as
the outer query doesn't perform additional processing the SQL specs
doesn't explicitly state the ordering must be maintained.
