Opened 9 years ago

Last modified 6 years ago

#24218 new New feature

Use sub-query in ORM when distinct and order_by columns do not match — at Version 1

Reported by: Miroslav Shubernetskiy Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: subquery distinct order_by
Cc: Harro Triage Stage: Accepted
Has patch: yes Needs documentation: yes
Needs tests: yes Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Miroslav Shubernetskiy)

This ticket is to propose a slight change in ORM - use subqueries when querying a model where .distinct() and .order_by() (or .extra(order_by=())) leftmost columns do not match. For example:

Model.objects.all().distinct('foo').order_by('bar')

The above generates the following SQL:

SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
FROM "app_model"
ORDER BY "app_model"."bar" ASC;

I am not sure about all backends however the above syntax is not allowed in PostgreSQL which produces the following error:

ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Here are PostgreSQL docs explaining why that is not allowed:

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. [...] Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. [...] The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s).

This ticket proposes to use subqueries in such situations which would use SQL:

SELECT *
FROM (
  SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
  FROM "app_model"
) result
ORDER BY "app_model"."bar" ASC;

The above is perfectly valid SQL and produces expected results (please note that ORDER_BY is in the outer query to guarantee that distinct results are correctly sorted).

I created a simple proof-of-concept patch by overwriting few things in SQLCompiler.as_sql() which seems to work pretty well. The patch only creates subquery when the above dilema is encountered which should not have any negative side-effects on existing queries (since such queries were not allowed by SQL). The patch also works on the .count() queries since Django then strips any ordering hence the subquery is never created.

Change History (1)

comment:1 by Miroslav Shubernetskiy, 9 years ago

Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top