Opened 7 years ago
Last modified 17 months ago
#28333 assigned New feature
Filter and subquery for window expressions
Reported by: | Mads Jensen | Owned by: | Simon Charette |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | window orm filter subquery GSoC |
Cc: | Alexandr Artemyev, Andy Terra, Étienne Beaulé, Michael Wheeler, şuayip üzülmez, John Speno, Alex Scott, Ad Timmering, Hannes Ljungberg, Dave Johansen, Simon Charette | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
#26608 will introduce window function expressions, but will disallow filtering on the result of them, e.g.:
Window.objects.annotate(row=Window(expression=RowNumber())).filter(row__gt=1)
is not allowed. Instead, the window function expression should be wrapped in an inner query, and the filtering should be done in an outer query.
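To illustrate why the wrapping is needed, here is a minimal sketch that uses Python's built-in sqlite3 module rather than the Django ORM. The table name `item`, its data, and the alias `row_num` are made up for illustration, and SQLite 3.25 or newer is assumed for window function support. The database rejects a window function in WHERE, while the same predicate works once the windowed query becomes an inner query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Filtering directly on a window function is rejected by the database.
try:
    conn.execute("SELECT name FROM item WHERE ROW_NUMBER() OVER (ORDER BY id) > 1")
    direct_filter_allowed = True
except sqlite3.OperationalError:
    direct_filter_allowed = False

# Wrapping: the inner query computes the window expression,
# the outer query filters on its alias.
wrapped_rows = conn.execute(
    """
    SELECT name, row_num FROM (
        SELECT name, ROW_NUMBER() OVER (ORDER BY id) AS row_num FROM item
    ) AS inner_query
    WHERE row_num > 1
    ORDER BY row_num
    """
).fetchall()

print(direct_filter_allowed, wrapped_rows)
```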
Change History (34)
comment:1 by , 7 years ago
Description: | modified (diff) |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 5 years ago
Cc: | added |
---|
comment:4 by , 5 years ago
Cc: | added |
---|
comment:5 by , 4 years ago
Cc: | added |
---|
comment:6 by , 4 years ago
Keywords: | GSoC added |
---|
comment:7 by , 4 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:9 by , 4 years ago
I was doing some research on this issue and found a few solutions to the problem. (all these are vague ideas. Any suggestions/feedback would be appreciated to make the idea worth implementing)
- A separate QueryWrapper class which would have syntax like this:
Window.objects.annotate(row=QueryWrapper(Window(expression=RowNumber()))).filter(row__gt=1)
OR
- A QueryWrapper class implemented internally for window functions, to automatically generate an SQL subquery for all window expressions.
- Use the Subquery class internally to execute all window-expression-related queries as subqueries.
- Pass an alias to the window expression and then, instead of generating a partial query with just the OVER and ORDER BY clauses, generate a separate SELECT statement which will then be used as the SELECT statement for a separate table in the query.
I personally feel that implementing 1.a would be a good option but as I mentioned above this is just a vague idea and to implement it I need some guidance from someone who is more experienced.
comment:10 by , 4 years ago
Cc: | added |
---|
comment:11 by , 4 years ago
Owner: | removed |
---|
comment:12 by , 4 years ago
Cc: | added |
---|
comment:13 by , 3 years ago
Cc: | added |
---|
comment:14 by , 3 years ago
Cc: | added |
---|
Is there a recommended workaround for not being able to filter on a window function result? Can you wrap it manually somehow?
comment:15 by , 3 years ago
Status: | assigned → new |
---|
comment:16 by , 3 years ago
Cc: | added |
---|
comment:17 by , 3 years ago
I have this problem as well.
According to this article https://learnsql.com/blog/window-functions-not-allowed-in-where/ the solution is either to use a common table expression or a subquery in the FROM clause of the SQL query. Unfortunately, it seems neither is supported by Django. Although I did find this package for the first option - https://docs.djangoproject.com/en/3.2/ref/models/querysets/#extra.
Both of these should be options in the Django ORM right? Each would be a big win for the power of the ORM.
comment:18 by , 3 years ago
Cc: | added |
---|
comment:19 by , 3 years ago
Cc: | added |
---|
comment:20 by , 3 years ago
What I'm doing currently is this hack:
Before:
queryset = MyModel.objects.annotate(row=Window(expression=RowNumber())).filter(row__gt=1)
After:
queryset = MyModel.objects.annotate(row=Window(expression=RowNumber()))
sql, params = queryset.query.sql_with_params()
# Wrap the generated SQL in an outer query so the window alias can be filtered on.
queryset = queryset.raw(f"SELECT * FROM ({sql}) AS full WHERE row > 1", params)
I don't see that it's bad to have this currently, with whatever limitations of raw documented in the Window filtering section, then add in more features if real use cases are provided.
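The same wrapping can be sketched outside Django. The helper below, `wrap_in_subquery`, is a hypothetical name and not a Django API; it takes SQL and parameters (such as the pair returned by `queryset.query.sql_with_params()`) and builds the outer query. It is exercised here against SQLite through sqlite3 with a made-up table so that the snippet runs standalone. SQLite 3.25 or newer is assumed, and note that sqlite3 uses `?` placeholders where Django-generated SQL uses `%s`.

```python
import sqlite3


def wrap_in_subquery(inner_sql, inner_params, outer_where, outer_params=()):
    """Return (sql, params) that selects from inner_sql and filters on outer_where.

    outer_where is trusted SQL text; values must be passed through outer_params.
    Inner parameters come first because the inner SQL appears first in the text.
    """
    sql = f"SELECT * FROM ({inner_sql}) AS wrapped WHERE {outer_where}"
    return sql, tuple(inner_params) + tuple(outer_params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mymodel (id INTEGER PRIMARY KEY, category TEXT)")
conn.executemany(
    "INSERT INTO mymodel (category) VALUES (?)", [("x",), ("x",), ("y",), ("x",)]
)

inner_sql = (
    "SELECT id, ROW_NUMBER() OVER (ORDER BY id) AS row_num "
    "FROM mymodel WHERE category = ?"
)
sql, params = wrap_in_subquery(inner_sql, ("x",), "row_num > ?", (1,))
rows = sorted(conn.execute(sql, params).fetchall())
print(rows)
```

As with the raw() hack, the result is no longer a composable queryset, so any further filtering has to be expressed in the outer WHERE text.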
comment:21 by , 2 years ago
#26780 which is about adding support for slices prefetching (think top-n results per category) to core would benefit from this feature being implemented at least partially.
The most difficult part of this issue is not the subquery pushdown itself (see #24462) but making sure that union filters of the form filter(Q(window__lookup=foo) | Q(aggregate__lookup=bar) | Q(field__lookup=baz)) are resulting in the proper usage of inner query WHERE and HAVING and outer query usage of WHERE (see the Where.split_having method for the current implementation).
If we were to start by focusing this ticket on the simple intersection use cases of the form filter(window__lookup=foo) (as reported here and required by #26780) I suspect we'd cover most of the use cases while deferring most of the complexity. If someone would like to give this a shot I'd start by doing the following:
- Make Window.filterable = True for now
- Adjust Where.split_having to properly deal with self.contains_over_clause by returning a triple of the form (where: Where, having: Where, window: Where) and error out when self.connector != AND and self.contains_over_clause. Possibly rename to split_having_window?
- Adjust SQLCompiler.pre_sql_setup to assign self.over_where and use it in SQLCompiler.as_sql to wrap the query in a subquery that SELECT * FROM ({subquery_sql}) subquery WHERE {over_where_sql}
- Add tests for new supported use cases and disallowed ones.
- Make Q.filterable return False when self.connector != AND and self.contains_over_clause but that will result in weird error messages of the form Q is disallowed in the filter clause. so maybe we'll want to deprecate Q.filterable in favour of a BaseExpression.check_filterable method instead that defaults to raise the current message and is overridden in Q to raise a proper message with regards to complex filters window functions.
Happy to review a PR that attempts the above or provide feedback here if that means this ticket is partially fixed and allows for #26780 to benefit from this work.
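The three-way split described in the steps above can be sketched with toy classes. `Leaf` and `Node` below are hypothetical stand-ins, not Django's actual WHERE tree classes, and the tree is kept flat (one node holding leaf predicates) to keep the example short. It only illustrates the idea of returning (where, having, window) and erroring out when a non-AND connector involves a window predicate.

```python
from dataclasses import dataclass, field

AND, OR = "AND", "OR"


@dataclass
class Leaf:
    """A single predicate, tagged with what it references (toy stand-in)."""
    sql: str
    contains_aggregate: bool = False
    contains_over_clause: bool = False


@dataclass
class Node:
    """A flat toy WHERE node; hypothetical, not Django's implementation."""
    children: list = field(default_factory=list)
    connector: str = AND

    @property
    def contains_aggregate(self):
        return any(child.contains_aggregate for child in self.children)

    @property
    def contains_over_clause(self):
        return any(child.contains_over_clause for child in self.children)

    def split_having_window(self):
        """Return (where, having, window) lists of SQL fragments."""
        where, having, window = [], [], []
        if self.connector != AND:
            if self.contains_over_clause:
                raise NotImplementedError(
                    "Window predicates combined with OR are not supported."
                )
            # An OR group cannot be split: it goes to HAVING as a whole if it
            # touches an aggregate, otherwise to WHERE.
            joined = f" {self.connector} ".join(c.sql for c in self.children)
            (having if self.contains_aggregate else where).append(f"({joined})")
            return where, having, window
        for child in self.children:
            if child.contains_over_clause:
                window.append(child.sql)  # outer query WHERE
            elif child.contains_aggregate:
                having.append(child.sql)  # inner query HAVING
            else:
                where.append(child.sql)  # inner query WHERE
        return where, having, window


node = Node([
    Leaf('"department" = %s'),
    Leaf('COUNT("past"."id") >= %s', contains_aggregate=True),
    Leaf('"dept_max_salary" >= "salary"', contains_over_clause=True),
])
where, having, window = node.split_having_window()
print(where, having, window)
```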
comment:22 by , 2 years ago
Cc: | added |
---|
comment:23 by , 2 years ago
Had a first stab at the above and it seems to be working relatively well, not too intrusive of a change. I'll give a shot at implementing #26780 on top of it now to confirm it could work.
As a side note it seems that the Snowflake database has an SQL extension to filter against window functions, the QUALIFY clause.
comment:24 by , 2 years ago
Submitted a PR that adds support for jointed predicates but still disallows disjointed ones.
For example, given the following model and queryset
from django.db import models
from django.db.models import Count, Max, Window
from django.db.models.functions import Rank

class Employee(models.Model):
    name = models.CharField(max_length=50)
    department = models.CharField(max_length=50)
    salary = models.IntegerField()

class PastEmployeeDepartment(models.Model):
    # on_delete is required on ForeignKey
    employee = models.ForeignKey(Employee, related_name="past_departments", on_delete=models.CASCADE)
    department = models.CharField(max_length=50)

queryset = Employee.objects.annotate(
    dept_max_salary=Window(Max("salary"), partition_by="department"),
    dept_salary_rank=Window(Rank(), partition_by="department", order_by="-salary"),
    past_depths_cnt=Count("past_departments"),
)
All of the following is supported
# window predicate will be pushed to outer query
queryset.filter(dept_max_salary__gte=F("salary"))
SELECT * FROM (...) "quantify" WHERE dept_max_salary >= "quantify"."salary"

# department predicate will be applied in inner query
queryset.filter(department="IT", dept_max_salary__gte=F("salary"))
SELECT * FROM (... WHERE "department" = 'IT') "quantify" WHERE dept_max_salary >= "quantify"."salary"

# aggregate predicate will be applied in the inner query
queryset.filter(past_depths_cnt__gte=1, dept_max_salary__gte=F("salary"))
SELECT * FROM (... HAVING COUNT("pastemployeedepartment"."id") >= 1) "quantify" WHERE dept_max_salary >= "quantify"."salary"
Some form of disjointed predicates against window functions (using OR) are also supported as long as they are only against window functions

# Disjointed predicates only about window functions is supported
queryset.filter(Q(dept_max_salary__gte=F("salary")) | Q(dept_salary_rank__lte=2))
SELECT * FROM (...) "quantify" WHERE "dept_max_salary" >= "quantify"."salary" OR "dept_salary_rank" <= 2
And limits are only applied on the outer query, once all window function filters are applied.
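The placement of each predicate can be tried out directly in SQL. The following is a minimal sketch using sqlite3 instead of the ORM, with made-up rows and a hypothetical subquery alias `windowed`; SQLite 3.25 or newer is assumed. The column predicate runs in the inner query, the window predicate in the outer query, and the limit is applied last, on the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "IT", 90),
        ("Bob", "IT", 80),
        ("Cid", "IT", 70),
        ("Dee", "HR", 60),
        ("Eve", "HR", 50),
    ],
)

sql = """
SELECT name, department, dept_salary_rank FROM (
    SELECT
        name, department, salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_salary_rank
    FROM employee
    WHERE salary >= ?          -- column predicate: inner query
) AS windowed
WHERE dept_salary_rank <= ?    -- window predicate: outer query
ORDER BY department, dept_salary_rank
LIMIT ?                        -- limit: outer query, after window filtering
"""
rows = conn.execute(sql, (55, 2, 2)).fetchall()
print(rows)
```

If the limit were applied in the inner query instead, rows would be cut off before ranks are filtered, which is why it has to stay on the outer query.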
The following is not supported
- Disjointed filters mixing predicates against window functions and aggregates and/or column references, as it's really hard to emulate without getting into multiple levels of subquery pushdown, particularly if aggregation is involved.
- Filtering against columns masked by the usage of values, values_list. This one could be solved by adding another layer of subquery pushdown that avoids applying the mask in the subquery but does so in an outermost query over the one used for window filtering.
- Passing window function instances directly to filter and exclude instead of referencing annotated window functions.
Feedback about the proposed supported feature set and implementation is very welcome.
comment:25 by , 2 years ago
Has patch: | set |
---|
comment:26 by , 2 years ago
Needs tests: | set |
---|---|
Owner: | set to |
Patch needs improvement: | set |
Status: | new → assigned |
comment:27 by , 2 years ago
The latest version of the patch now supports filtering against annotations masked by the usage of values and friends.
queryset.filter(dept_max_salary__gte=2000).values("id")
now results in
SELECT "col1" FROM (
  SELECT * FROM (
    SELECT "id" AS "col1", MAX OVER (...) AS "dept_max_salary"
    FROM ...
  ) "qualify"
  WHERE "dept_max_salary" >= 2000
) "qualify_mask"
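The two-level wrapping can be reproduced by hand to see that only the masked column reaches the caller. This is a sketch against SQLite through sqlite3, not output from the patch: the table, its rows, and the concrete window definition (MAX of salary per department) are assumptions filled in for illustration, and SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, department TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (department, salary) VALUES (?, ?)",
    [("IT", 2500), ("IT", 1800), ("HR", 1500), ("HR", 1200)],
)

sql = """
SELECT "col1" FROM (
    SELECT * FROM (
        SELECT
            "id" AS "col1",
            MAX("salary") OVER (PARTITION BY "department") AS "dept_max_salary"
        FROM "employee"
    ) "qualify"
    WHERE "dept_max_salary" >= ?
) "qualify_mask"
ORDER BY "col1"
"""
cursor = conn.execute(sql, (2000,))
# Only the masked selection is exposed; the window alias stays internal.
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()
print(column_names, rows)
```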
comment:31 by , 2 years ago
Has patch: | unset |
---|---|
Needs tests: | unset |
Patch needs improvement: | unset |
comment:34 by , 21 months ago
Referencing outer window expressions in subqueries should also be supported, see #34368.
comment:35 by , 17 months ago
My code was working till 3.2, but with https://code.djangoproject.com/changeset/8c3046daade8d9b019928f96e53629b03060fe73, it doesn't anymore.
Here is a simplified (no filter(), less annotate(), fake Value(), ...) demonstrator:
>>> from postman.models import Message
>>> qs1=Message.objects.values_list('id').order_by()
>>> print(qs1.query)  # correct
SELECT "postman_message"."id" FROM "postman_message"
>>> qs2=Message.objects.values('thread').annotate(id=Value(2, IntegerField())).values_list('id').order_by()
>>> print(qs2.query)  # correct
SELECT 2 AS "id" FROM "postman_message"
>>> print(qs1.union(qs2).query)  # will cause my problem
SELECT "postman_message"."id" AS "col1" FROM "postman_message" UNION SELECT 2 AS "id" FROM "postman_message"
The noticeable point is the introduction of the alias AS "col1", not compatible with the id in the second part.
In the full code, the union is injected in another query, of the form SELECT ... FROM ... INNER JOIN (the union) PM ON (... = PM.id)
So it leads to the error: django.db.utils.OperationalError: no such column: PM.id
In db/models/sql/compiler.py, get_combinator_sql(), the call is imposed as: as_sql(with_col_aliases=True)
I don't know how to solve this problem.
Any advice?
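The failure can be reproduced without Django. The sketch below runs the generated union SQL from above against an in-memory SQLite database through sqlite3; the table layout and rows are assumptions made up for the demonstration. It shows that the combined query exposes the column name from its first SELECT, so a join on PM.id has nothing to resolve against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postman_message (id INTEGER PRIMARY KEY, thread INTEGER)")
conn.executemany("INSERT INTO postman_message (thread) VALUES (?)", [(1,), (1,)])

union_sql = (
    'SELECT "postman_message"."id" AS "col1" FROM "postman_message" '
    'UNION SELECT 2 AS "id" FROM "postman_message"'
)

# The column name of a compound SELECT comes from its first member.
cursor = conn.execute(union_sql)
union_columns = [description[0] for description in cursor.description]

# Joining on PM.id fails because the derived table only exposes "col1".
try:
    conn.execute(
        "SELECT m.id FROM postman_message m "
        f"INNER JOIN ({union_sql}) PM ON (m.id = PM.id)"
    )
    join_error = None
except sqlite3.OperationalError as exc:
    join_error = str(exc)

# Referencing the generated alias resolves, but relies on an internal name.
joined = conn.execute(
    "SELECT m.id FROM postman_message m "
    f"INNER JOIN ({union_sql}) PM ON (m.id = PM.col1) ORDER BY m.id"
).fetchall()
print(union_columns, join_error, joined)
```

Joining on col1 only demonstrates where the name went; depending on a generated alias is fragile and is not offered here as a fix.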
This is 2 years old with no action and I am very keen to see it implemented (need it rather badly).
It strikes me as an aside that a more general approach may kill more birds with one stone. I noticed the rather excellent ExpressionWrapper(), and it struck me that a QueryWrapper() would be a more general solution that covers this particular need and will cover others as well, known and unknown at present.
In short QueryWrapper would simply make an inner query of the QuerySet to date so that subsequent operations act upon it as if it were a table.