﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
25789	Inefficient Queries Generated due to not using WHERE EXISTS	Alex Rothberg	nobody	"[http://stackoverflow.com/questions/33764737/django-equivalent-of-sqlalchemy-any-to-filter-where-exists/33765323 Reposting question from SO] with some more details.

I believe that the Django ORM is generating seriously inefficient SQL due to it not using `WHERE EXISTS` but instead using a `DISTINCT` with a `LEFT JOIN`. by comparison, SQLAlchemy will use `WHERE EXISTS`.

I have two models, `Exam` and `Series`. `Series` objects have a foreign key to an `Exam` object. Both of the models contain a field `description_user`. I am trying to search for all `Exam`s that have a search term in `description_user` or have a child `Series` with that term in its `description_user`. I want to do this for a number of search terms (requiring all of them). I also want to de-duplicate the results (ie not get the same Exam multiple times).

This is roughly what the filter looks like:

{{{
a = (Q(**{'series__description_user__icontains': 'bar'}) | Q(**{'description_user__icontains': 'bar'})) 
b = (Q(**{'series__description_user__icontains': 'foo'}) | Q(**{'description_user__icontains': 'foo'}))
c = (Q(**{'series__description_user__icontains': 'baz'}) | Q(**{'description_user__icontains': 'baz'}))
Exam.objects.filter(a & b & c).distinct()
}}}

with corresponding SQL:

{{{
SELECT DISTINCT 
                ""exam_storage_exam"".""id"", 
                ""exam_storage_exam"".""description_user""
FROM            ""exam_storage_exam"" 
LEFT OUTER JOIN ""exam_storage_series"" 
ON              ( 
                                ""exam_storage_exam"".""id"" = ""exam_storage_series"".""exam_id"" 
                AND             ( 
                                                ""exam_storage_series"".""removed"" IS NULL) ) 
WHERE           ( 
                                ""exam_storage_exam"".""removed"" IS NULL 
                AND             ( 
                                                ""exam_storage_series"".""description_user"" LIKE %s ESCAPE \'\\\'
                                OR              ""exam_storage_exam"".""description_user"" LIKE %s ESCAPE \'\\\')
                AND             ( 
                                                ""exam_storage_series"".""description_user"" LIKE %s ESCAPE \'\\\'
                                OR              ""exam_storage_exam"".""description_user"" LIKE %s ESCAPE \'\\\')
                AND             ( 
                                                ""exam_storage_series"".""description_user"" LIKE %s ESCAPE \'\\\'
                                OR              ""exam_storage_exam"".""description_user"" LIKE %s ESCAPE \'\\\'))

}}}

The issue is that as the number of search terms grows, the size of the intermediate data set before the DISTINCT operation grows as well.

Ideally the SQL would look like:
{{{
SELECT *                                                
 FROM exam                                             
   WHERE (EXISTS (SELECT 1             
        FROM exam_storage_series              
          WHERE exam.id = series.exam_id AND (
            series.description_user LIKE '%foo%'
          )) or exam.description_user LIKE '%foo%') AND 
(EXISTS (SELECT 1             
        FROM exam_storage_series              
          WHERE exam.id = series.exam_id AND (
            series.description_user LIKE '%bar%'
          )) or exam.description_user LIKE '%bar%') AND 
(EXISTS (SELECT 1             
        FROM exam_storage_series              
          WHERE exam.id = series.exam_id AND (
            series.description_user LIKE '%baz%'
          )) or exam.description_user LIKE '%baz%')
}}}

Currently the performance of Django query is terrible. This style searching comes up for example in how [https://github.com/tomchristie/django-rest-framework/blob/43c45cc9391ec2bed9481a8b309990dec35b6ac8/rest_framework/filters.py#L132-L180 DRF generates search queries]."	Cleanup/optimization	new	Database layer (models, ORM)	1.8	Normal		QuerySet.extra	Anssi Kääriäinen Marc Tamlyn James Howe David Wobrock John Speno	Accepted	0	0	0	0	0	0
