Opened 13 years ago
Last modified 4 years ago
#17276 closed Cleanup/optimization
Slow anti-join query against Postgres — at Initial Version
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | orm, postgres, join, anti-join |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
For background on this ticket please see the following two discussions:
- http://stackoverflow.com/questions/8190474/postgres-slow-outer-query-when-using-a-primary-key
- https://groups.google.com/group/django-users/browse_thread/thread/7d13f2d8748b4f9f
Basically, in a many-to-many mappings between models Student and Course, if I want to find all instances of Students that aren't registered for classes, I would issue the following Django query:
Student.objects.filter(courseisnull=True)
Django translates this into the following query:
SELECT "student"."id", "student"."name"
FROM "student"
LEFT OUTER JOIN "course_students"
ON ("student"."id" = "course_students"."student_id")
LEFT OUTER JOIN "course"
ON ("course_students"."course_id" = "course"."id")
WHERE "course"."id" IS NULL
The problem is that the way the WHERE clause is generated is very inefficient (at least when used with Postgres). Changing WHERE to "course_students"."student_id" IS NOT NULL yields orders of magnitude improved query plan. Here's the difference I'm seeing on real data:
Django way: (cost=1072479.36..6256437.79 rows=1 width=145)
Hand-crafted: (cost=1518.71..1533.35 rows=1 width=145)
I'm attaching a sample project with the model already set up. To see the generated SQL query, simply run "python manage.py anti-join."
sample project