Opened 9 years ago

Closed 7 years ago

Last modified 6 years ago

#24254 closed Bug (fixed)

Queries using distinct() and order_by() cannot be used with the __in lookup

Reported by: Torsten Bronger Owned by: Simon Charette
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: bronger@…, Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Torsten Bronger)

Consider the following model:

from django.db import models

class MyModel(models.Model):
    name = models.CharField(max_length=200)

    class Meta:
        ordering = ("name",)

Then, the following query:

MyModel.objects.filter(pk__in=MyModel.objects.all().distinct()[:10])

results in an invalid SQL query. SQLite complains with "only a single result allowed for a SELECT that is part of an expression".

Change History (13)

comment:1 by Peter Inglesby, 9 years ago

Have had a little dig into this.

Consider an __in lookup with the following on the RHS: M.objects.order_by(sort_field).distinct()[m:n].

This generates a query that returns both M's primary key and sort_field. Because it returns two fields, it cannot be used as a subquery (at least in SQLite) and so we hit the reported error.

I think this behaviour is related to a warning in the docs: "Any fields used in an order_by() call are included in the SQL SELECT columns" (although from reading the code for SQLCompiler.get_extra_select I think that the fields used in an order_by() call are only included in the SELECT columns when the query has distinct set).

It's not obvious to me why any fields used in an order_by() call must always be included in the SELECT columns, so I'd like to propose a fix whereby SQLCompiler.get_extra_select returns an empty list if it's dealing with a subquery. I've made that change locally, and nothing fails. However, I don't know enough about the ORM to know whether this is a good idea! I hope this is useful.

comment:2 by Torsten Bronger, 9 years ago

Description: modified (diff)

You're right, my example was not minimal. I changed it so that only one model is defined.

comment:3 by Peter Inglesby, 9 years ago

Has patch: set

I've added a patch with a fix and a test at https://github.com/django/django/pull/4078.

comment:4 by Peter Inglesby, 9 years ago

Triage Stage: UnreviewedAccepted

comment:5 by Tim Graham, 9 years ago

Patch needs improvement: set

comment:6 by Peter Inglesby, 9 years ago

The proposed patch passes on SQLite, but fails on Postgres, with:

for SELECT DISTINCT, ORDER BY expressions must appear in select list LINE 1: ...eries_person"."id" FROM "queries_person" ORDER BY "queries_p... 

and on MySQL, with:

This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'

I'm afraid I don't know enough about either database to know what the right fix is.

comment:7 by Raphael Gaschignard, 8 years ago

I've hit this issue as well, reaching an identical situation of __in + ordering + distinct + slicing.

Typing this out, I realised that if you have ordering and slicing, then you'll need the values in the SELECT (see the Postgres error mentioned before).

So I would propose a "tuple unwrapping" in that case. So the pk__in=qs property would expand to SQL like SELECT U1."id" FROM (SELECT DISTINCT ... ) U1

In [61]: print MyModel.objects.filter(pk__in=MyModel.objects.all().distinct()[:10]).query
# invalid syntax due to missing tuple unwrapping but includes proper ordering
SELECT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel"
 WHERE "test_mymodel"."id" IN 
  (SELECT DISTINCT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" ORDER BY "test_mymodel"."name" ASC LIMIT 10) 
ORDER BY "test_mymodel"."name" ASC

# proposed solution with tuple unwrapping
SELECT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" WHERE "test_mymodel"."id" IN 
(SELECT U0."id" FROM 
  (SELECT DISTINCT "test_mymodel"."id", "test_mymodel"."name" FROM "test_mymodel" ORDER BY "test_mymodel"."name" ASC LIMIT 10) U0
)
 ORDER BY "test_mymodel"."name" ASC

I think we could always do wrapping and unwrapping of __in subqueries. The cost is pretty negligible in Postgres (an extra select node), and avoids having to figure out the innards of the subquery select.

comment:8 by Simon Charette, 8 years ago

Cc: Simon Charette added
Version: 1.7master

We should be able to avoid unconditional tuple wrapping as it only seems to be required when DISTINCT is used with ORDER BY.

comment:9 by Simon Charette, 7 years ago

Owner: changed from nobody to Simon Charette
Patch needs improvement: unset
Status: newassigned
Summary: Passing a query to "pk__in" of another query may result in invalid SQLQueries using distinct() and order_by() cannot be used with the __in lookup

comment:10 by Tim Graham, 7 years ago

Triage Stage: AcceptedReady for checkin

(pending some cosmetic updates)

comment:11 by Simon Charette <charette.s@…>, 7 years ago

Resolution: fixed
Status: assignedclosed

In 4acae218:

Fixed #24254 -- Fixed queries using the in lookup with querysets using distinct() and order_by().

Thanks Tim for the review.

comment:12 by Tim Graham <timograham@…>, 6 years ago

In 11ade8e:

Refs #24254 -- Removed unnecessary SQL AS clause in SQLCompiler.as_sql().

Incorrect on Oracle.

comment:13 by Tim Graham <timograham@…>, 6 years ago

In 71da2b4f:

[2.0.x] Refs #24254 -- Removed unnecessary SQL AS clause in SQLCompiler.as_sql().

Incorrect on Oracle.

Backport of 11ade8eefd32f5bc7ee6379b77824f02ca61c20b from master

Note: See TracTickets for help on using tickets.
Back to Top