Opened 10 years ago

Closed 8 years ago

Last modified 7 years ago

#23518 closed Cleanup/optimization (fixed)

Subquery does not respect explicit __exact

Reported by: john-parton Owned by: nobody
Component: Database layer (models, ORM) Version: 1.7
Severity: Normal Keywords: subquery subselect exact
Cc: Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I think it should be possible to use a subquery as an expression with __exact.

It appears that when you perform a subquery, the ORM includes an implicit __in. This is fine, because I'm sure that covers most use cases, but if you try to provide an explicit __exact, the ORM disregards it.

# models.py 
from django.db import models
    
class Child(models.Model):
    parent = models.ForeignKey('Parent')
    
class Parent(models.Model):
    slug = models.SlugField(unique=True)
In [1]: from model_test.models import Child, Parent

In [2]: parent = Parent(slug='test')

In [3]: parent.save()

In [4]: child = Child(parent=parent)

In [5]: child.save()

In [6]: children = Child.objects.filter(parent_id=Parent.objects.filter(slug='test'))

In [7]: print(children.query)
SELECT "model_test_child"."id", "model_test_child"."parent_id" FROM "model_test_child" WHERE "model_test_child"."parent_id" IN (SELECT U0."id" FROM "model_test_parent" U0 WHERE U0."slug" = test)

In [8]: children = Child.objects.filter(parent_id__exact=Parent.objects.filter(slug='test'))

In [9]: print(children.query)
SELECT "model_test_child"."id", "model_test_child"."parent_id" FROM "model_test_child" WHERE "model_test_child"."parent_id" IN (SELECT U0."id" FROM "model_test_parent" U0 WHERE U0."slug" = test)

The expected output of the 9th command is:

SELECT "model_test_child"."id", "model_test_child"."parent_id" FROM "model_test_child" WHERE "model_test_child"."parent_id" = (SELECT U0."id" FROM "model_test_parent" U0 WHERE U0."slug" = 'test');

Obviously using __exact isn't always possible, but because Parent has a unique slug, we can guarantee that the subquery returns one or zero rows. Alternative, we could have added a LIMIT 1 (or a [:1] with the ORM) to ensure that it has exactly one or zero rows.

The most simple way to do error handling would to just wrap the ProgrammingError: more than one row returned by a subquery used as an expression which would get raised by the connection.

At the very least, shouldn't the __exact transform raise some error or issue some warning that it does nothing?

Thoughts?

Change History (7)

comment:1 by Tim Graham, 10 years ago

If you expect the query to return one result, why not use .get() instead of .filter(). Also couldn't you restructure your example more simply as Parent.objects.filter(parent__slug='test'). I guess some documentation could at least be updated.

comment:2 by Thomas C, 10 years ago

It looks like the ORM automatically considers that you are doing an __in lookup when you give a QuerySet as the right-hand side of a filter on a related field, whatever the lookup type is (see https://github.com/django/django/blob/bbc3505ef81768aa2afac8f73e6d45b5e8000c55/django/db/models/fields/related.py#L1607).

For instance:

Author.objects.filter(item__startswith=Item.objects.all())
[<Author: a1>, <Author: a2>, <Author: a2>, <Author: a4>]

Maybe it would be more consistent if the ORM raised some error instead ?

That being said, I agree that .get() is better when you expect a single result.

Last edited 10 years ago by Thomas C (previous) (diff)

comment:3 by Tim Graham, 10 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

comment:4 by Tim Graham, 8 years ago

Resolution: fixed
Status: newclosed

Implicit __in lookups are removed in Django 1.9 (#25284) so this look like it's fixed. The idea to raise an error as proposed in comment 2 is tracked in #25298.

comment:5 by john-parton, 8 years ago

Thanks for the feedback.

For anyone who might come back to this ticket at a later date:

The reason someone might not want to use .get() is because it executes another SQL query.

Parent.objects.filter(parent__slug='test') is not the same query. It uses a join instead of a subquery, and might have different performance characteristics. Often I refactor a SQL query looking at the output of EXPLAIN, and then try to recreate the query using the ORM.

in reply to:  5 comment:6 by Shai Berger, 8 years ago

Replying to john-parton:

Parent.objects.filter(parent__slug='test') is not the same query. It uses a join instead of a subquery, and might have different performance characteristics. Often I refactor a SQL query looking at the output of EXPLAIN, and then try to recreate the query using the ORM.

Whether a specific ORM expression uses a join or a subquery is an implementation detail which you should not count on; these things are allowed to change (and have changed in the past) without notice, so long as the semantics of the query is preserved. The performance characteristics of a given SQL query, as well, may change between different databases and even between versions of the same database product. If your requirements really dictate that you use one form rather than the other, you should be using raw SQL.

comment:7 by john-parton, 7 years ago

I think with the release of version 1.11, it is now possible to do this without using raw SQL.

https://docs.djangoproject.com/en/1.11/releases/1.11/#subquery-expressions

Last edited 7 years ago by john-parton (previous) (diff)
Note: See TracTickets for help on using tickets.
Back to Top