#16554 closed Bug (invalid)
Unnecessary join when using a reverse foreign-key relationship in separate filter or aggregate calls
Reported by: | Ben Davis | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Ben Davis | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Django allows you to perform queries across reverse foreign key relationships. If, however, you need to access that same relationship in more than one filter or aggregate call, the ORM creates unnecessary joins.
Example
app/models.py:
class Author(models.Model): first_name = models.CharField(max_length=30) last_name = models.CharField(max_length=30) email = models.EmailField() class Book(models.Model): author = models.ForeignKey(Author, related_name='books') title = models.CharField(max_length=100) genre = models.CharField(max_length=20, choices=( ('SCIFI', 'SciFi'), ('FANTASY', 'Fantasy'), ('NONFICTION', 'NonFiction') )) published = models.DateField() pages = models.IntegerField()
This particular query will only perform one join on app_books:
qs = Author.objects.filter(books__genre='SCIFI', books__pages__gt=500) print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" FROM "app_author" INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") WHERE ("app_book"."genre" = 'SCIFI' AND "app_book"."pages" > 500 )
However, if you separate the filter() call into two separate calls, you get this:
qs = Author.objects.filter(books__genre='SCIFI') qs = qs.filter(books__pages__gt=500) print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" FROM "app_author" INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") WHERE ("app_book"."genre" = 'SCIFI' AND T3."pages" > 500 )
As you can see, simply separating out the filters into separate calls creates a new, completely unecessary join.
This also occurs if you need to do an aggregate:
qs = Author.objects.annotate(pages_written=Sum('books__pages')) qs = qs.filter(books__genre='SCIFI') print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email", SUM("app_book"."pages") AS "pages_written" FROM "app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") WHERE T3."genre" = 'SCIFI' GROUP BY "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email"
In this particular case, I imagine the ORM is seeing the need for two joins because one is outer and one is inner, however, I would argue that if an outer join already occurs as a result of an aggregate, it should always use the existing join.
Change History (5)
comment:1 by , 13 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 13 years ago
I would argue that this is still a bug and should be left open. There is no conceivable situation in which you would need both joins. An extra join can severely impact the performance of a query, especially when dealing with large datasets. The ORM should recognize these situations and optimize as needed by removing the unnecessary joins.
Sure, you can avoid the extra join in the multiple .filter() example, but there are some situations in which the extra join cannot be avoided, as illustrated above with the aggregate query.
comment:3 by , 13 years ago
Cc: | added |
---|
comment:4 by , 13 years ago
If I am not mistaken, the aggregate example is valid - it produces wrong results. Author having two different SCIFI books will have pages calculated twice.
But the chained filters is not valid, as one query is asking for a model which has a book with genre of SCIFI and more than 500 pages, the other is asking for a model which has a book with genre of SCIFI and a book which has more than 500 pages. So, in the first example there must be one matching book, in the second example the books matching the condition can be different ones.
I am leaving this as closed, as I don't know what to do for a half-valid ticket and I haven't actually verified that the aggregate example produces wrong results.
comment:5 by , 13 years ago
akaari, thank you for clarifying that. I had to read your comment like five times, but that makes sense now. I opened #16603 for the aggregate bug.
This is working as designed.
A single filter() clause is used to determine the join-sharing relationships in a query; Model.objects.filter(Q1, Q2) is not *necessarily* equivalent to Model.objects.filter(Q1).filter(Q2).