﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
17276	Slow anti-join query against Postgres	dmitry@…	nobody	"For background on this ticket please see the following two discussions:

 * http://stackoverflow.com/questions/8190474/postgres-slow-outer-query-when-using-a-primary-key
 * https://groups.google.com/group/django-users/browse_thread/thread/7d13f2d8748b4f9f

Basically, in a many-to-many mappings between models Student and Course, if I want to find all instances of Students that aren't registered for classes, I would issue the following Django query:

{{{
Student.objects.filter(course__isnull=True)
}}}

Django translates this into the following query:
{{{
SELECT ""student"".""id"", ""student"".""name"" 
   FROM ""student"" 
      LEFT OUTER JOIN ""course_students"" 
         ON (""student"".""id"" = ""course_students"".""student_id"") 
      LEFT OUTER JOIN ""course"" 
         ON (""course_students"".""course_id"" = ""course"".""id"") 
   WHERE ""course"".""id"" IS NULL
}}}
The problem is that the way the WHERE clause is generated is very inefficient (at least when used with Postgres). Changing WHERE to ""course_students"".""student_id"" IS NOT NULL yields orders of magnitude improved query plan.  Here's the difference I'm seeing on real data:

* Django way: (cost=1072479.36..6256437.79 rows=1 width=145) 
* Hand-crafted: (cost=1518.71..1533.35 rows=1 width=145) 

I'm attaching a sample project with the model already set up.  To see the generated SQL query, simply run ""python manage.py anti-join."""	Cleanup/optimization	closed	Database layer (models, ORM)	1.3	Normal	fixed	orm, postgres, join, anti-join		Accepted	0	0	0	0	0	0
