Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#16554 closed Bug (invalid)

Unnecessary join when using a reverse foreign-key relationship in separate filter or aggregate calls

Reported by: bendavis78 Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: bendavis78 Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Django allows you to perform queries across reverse foreign key relationships. If, however, you need to access that same relationship in more than one filter or aggregate call, the ORM creates unnecessary joins.

Example

app/models.py:

class Author(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    email = models.EmailField()

class Book(models.Model):
    author = models.ForeignKey(Author, related_name='books')
    title = models.CharField(max_length=100)
    genre = models.CharField(max_length=20, choices=(
        ('SCIFI', 'SciFi'),
        ('FANTASY', 'Fantasy'),
        ('NONFICTION', 'NonFiction')
    ))
    published = models.DateField()
    pages = models.IntegerField()

This particular query will only perform one join on app_books:

qs = Author.objects.filter(books__genre='SCIFI', books__pages__gt=500)
print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" 
FROM "app_author" 
INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
WHERE ("app_book"."genre" = 'SCIFI'  AND "app_book"."pages" > 500 )

However, if you separate the filter() call into two separate calls, you get this:

qs = Author.objects.filter(books__genre='SCIFI')
qs = qs.filter(books__pages__gt=500)
print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" 
FROM "app_author" 
INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") 
WHERE ("app_book"."genre" = 'SCIFI'  AND T3."pages" > 500 )

As you can see, simply separating out the filters into separate calls creates a new, completely unecessary join.

This also occurs if you need to do an aggregate:

qs = Author.objects.annotate(pages_written=Sum('books__pages'))
qs = qs.filter(books__genre='SCIFI')
print qs.query
SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email", SUM("app_book"."pages") AS "pages_written" 
FROM "app_author" 
LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") 
WHERE T3."genre" = 'SCIFI'
GROUP BY "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email"

In this particular case, I imagine the ORM is seeing the need for two joins because one is outer and one is inner, however, I would argue that if an outer join already occurs as a result of an aggregate, it should always use the existing join.

Attachments (0)

Change History (5)

comment:1 Changed 3 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to invalid
  • Status changed from new to closed

This is working as designed.

A single filter() clause is used to determine the join-sharing relationships in a query; Model.objects.filter(Q1, Q2) is not *necessarily* equivalent to Model.objects.filter(Q1).filter(Q2).

comment:2 Changed 3 years ago by bendavis78

I would argue that this is still a bug and should be left open. There is no conceivable situation in which you would need both joins. An extra join can severely impact the performance of a query, especially when dealing with large datasets. The ORM should recognize these situations and optimize as needed by removing the unnecessary joins.

Sure, you can avoid the extra join in the multiple .filter() example, but there are some situations in which it cannot be avoided, as illustrated above with the aggregate query.

Last edited 3 years ago by bendavis78 (previous) (diff)

comment:3 Changed 3 years ago by bendavis78

  • Cc bendavis78 added

comment:4 Changed 3 years ago by akaariai

If I am not mistaken, the aggregate example is valid - it produces wrong results. Author having two different SCIFI books will have pages calculated twice.

But the chained filters is not valid, as one query is asking for a model which has a book with genre of SCIFI and more than 500 pages, the other is asking for a model which has a book with genre of SCIFI and a book which has more than 500 pages. So, in the first example there must be one matching book, in the second example the books matching the condition can be different ones.

I am leaving this as closed, as I don't know what to do for a half-valid ticket and I haven't actually verified that the aggregate example produces wrong results.

comment:5 Changed 3 years ago by bendavis78

akaari, thank you for clarifying that. I had to read your comment like five times, but that makes sense now. I'll open another ticket for the aggregate bug.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.