#35437 closed Bug (needsinfo)
Exists subquery ignores ordering when it's distinct
| Reported by: | Kevin Marsh | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | subquery, exists, ordering, distinct |
| Cc: | Kevin Marsh, Simon Charette | Triage Stage: | Accepted |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
I ran into a bug in the Exists expression, I'm using it to do some filtering where I just want the latest object grouped by a certain value (eg. latest financial statement object for each company). See the patch for a failing test, but using the Manager/Employee models from the test suite you can see in the slightly contrived example that for an Exists query like
# ... filtering by highest paid employee per manager
Exists(
Employee.objects.order_by("manager", "-salary").distinct("manager").filter(pk=OuterRef("pk"))
)
# ...
Gets transformed into SQL (Postgresql) like this where the ordering has been stripped out
-- ...
EXISTS(
SELECT DISTINCT ON (U0."manager_id")
1 AS "a"
FROM "expressions_employee" U0
WHERE U0."id" = ("expressions_employee"."id")
-- Missing ordering which is required for distinct
LIMIT 1
)
-- ...
Obviously we want to call clear_ordering most of the time on Exists subqueries for performance reasons, but in this case we either shouldn't clear them or loudly raise an exception saying that distinct Exists queries are not supported
Attachments (1)
Change History (3)
by , 18 months ago
| Attachment: | test_ordering_in_distinct_exists_subquery.2.diff added |
|---|
comment:1 by , 18 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Type: | Uncategorized → Bug |
The report makes sense. We could go two ways here
- Clear
distinct_fieldsas well as it's unnecessary (we only care about the existence of a single row after all) - Avoid forcing the removal of ordering since the
clear_orderingmethod is a noop when distinct fields are defined.
comment:2 by , 18 months ago
| Cc: | added |
|---|---|
| Resolution: | → needsinfo |
| Status: | new → closed |
Actually something is off in your test. Even if the ordering is not cleared all employees will match because you reduce the number of rows of the subquery by employee id
e.g.
SELECT "expressions_employee"."id", "expressions_employee"."firstname", "expressions_employee"."lastname", "expressions_employee"."salary", "expressions_employee"."manager_id", "expressions_employee"."based_in_eu" FROM "expressions_employee" WHERE (EXISTS (SELECT DISTINCT ON (U0."manager_id") 1 AS "a" FROM "expressions_employee" U0 WHERE (U0."manager_id" IS NOT NULL AND U0."id" = ("expressions_employee"."id")) ORDER BY U0."manager_id" ASC, U0."salary" DESC LIMIT 1) AND "expressions_employee"."manager_id" = 2) LIMIT 21;
In other words, you only filter by U0."id" = ("expressions_employee"."id") so the current employee will always exists in the subquery and DISTINCT and ORDER BY have nothing to do with it.
Try it out for yourself with the following patch that avoids eliding the ordering part
-
django/db/models/sql/query.py
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index b3f130c0b4..07da7ae1a1 100644
a b def exists(self, limit=True): 648 648 combined_query.exists(limit=False) 649 649 for combined_query in q.combined_queries 650 650 ) 651 q.clear_ordering( force=True)651 q.clear_ordering() 652 652 if limit: 653 653 q.set_limits(high=1) 654 654 q.add_annotation(Value(1), "a")
This appears to be a misuse of EXISTS. You want to a subquery I suppose to achieve your purpose.
Employee.objects.filter( id__in=Employee.objects.filter( manager=OuterRef("manager") ).order_by("-salary").values("id")[:1] )
Or a window function
Employee.objects.annotate( highest_paid_id=Window( FirstValue("id"), partition_by="manager", order_by="-salary", ) ).filter(id=F("highest_paid_id"))
Failing test