Opened 3 months ago

Closed 3 months ago

Last modified 3 months ago

#35437 closed Bug (needsinfo)

Exists subquery ignores ordering when it's distinct

Reported by: Kevin Marsh Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: subquery, exists, ordering, distinct
Cc: Kevin Marsh, Simon Charette Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I ran into a bug in the Exists expression, I'm using it to do some filtering where I just want the latest object grouped by a certain value (eg. latest financial statement object for each company). See the patch for a failing test, but using the Manager/Employee models from the test suite you can see in the slightly contrived example that for an Exists query like

# ... filtering by highest paid employee per manager
Exists(
  Employee.objects.order_by("manager", "-salary").distinct("manager").filter(pk=OuterRef("pk"))
)
# ...

Gets transformed into SQL (Postgresql) like this where the ordering has been stripped out

-- ...
EXISTS(
 SELECT DISTINCT ON (U0."manager_id")
        1 AS "a"
 FROM "expressions_employee" U0
 WHERE U0."id" = ("expressions_employee"."id")
 -- Missing ordering which is required for distinct
 LIMIT 1
)
-- ...

Obviously we want to call clear_ordering most of the time on Exists subqueries for performance reasons, but in this case we either shouldn't clear them or loudly raise an exception saying that distinct Exists queries are not supported

Attachments (1)

test_ordering_in_distinct_exists_subquery.2.diff (1.5 KB ) - added by Kevin Marsh 3 months ago.
Failing test

Download all attachments as: .zip

Change History (3)

by Kevin Marsh, 3 months ago

Failing test

comment:1 by Simon Charette, 3 months ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedBug

The report makes sense. We could go two ways here

  1. Clear distinct_fields as well as it's unnecessary (we only care about the existence of a single row after all)
  2. Avoid forcing the removal of ordering since the clear_ordering method is a noop when distinct fields are defined.

comment:2 by Simon Charette, 3 months ago

Cc: Simon Charette added
Resolution: needsinfo
Status: newclosed

Actually something is off in your test. Even if the ordering is not cleared all employees will match because you reduce the number of rows of the subquery by employee id

e.g.

SELECT "expressions_employee"."id",
       "expressions_employee"."firstname",
       "expressions_employee"."lastname",
       "expressions_employee"."salary",
       "expressions_employee"."manager_id",
       "expressions_employee"."based_in_eu"
FROM "expressions_employee"
WHERE (EXISTS
         (SELECT DISTINCT ON (U0."manager_id") 1 AS "a"
          FROM "expressions_employee" U0
          WHERE (U0."manager_id" IS NOT NULL
                 AND U0."id" = ("expressions_employee"."id"))
          ORDER BY U0."manager_id" ASC, U0."salary" DESC
          LIMIT 1)
       AND "expressions_employee"."manager_id" = 2)
LIMIT 21;

In other words, you only filter by U0."id" = ("expressions_employee"."id") so the current employee will always exists in the subquery and DISTINCT and ORDER BY have nothing to do with it.

Try it out for yourself with the following patch that avoids eliding the ordering part

  • django/db/models/sql/query.py

    diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
    index b3f130c0b4..07da7ae1a1 100644
    a b def exists(self, limit=True):  
    648648                combined_query.exists(limit=False)
    649649                for combined_query in q.combined_queries
    650650            )
    651         q.clear_ordering(force=True)
     651        q.clear_ordering()
    652652        if limit:
    653653            q.set_limits(high=1)
    654654        q.add_annotation(Value(1), "a")

This appears to be a misuse of EXISTS. You want to a subquery I suppose to achieve your purpose.

Employee.objects.filter(
    id__in=Employee.objects.filter(
        manager=OuterRef("manager")
    ).order_by("-salary").values("id")[:1]
)

Or a window function

Employee.objects.annotate(
    highest_paid_id=Window(
        FirstValue("id"),
        partition_by="manager",
        order_by="-salary",
    )
).filter(id=F("highest_paid_id"))
Last edited 3 months ago by Simon Charette (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top