Context Navigation

← Previous Ticket
Next Ticket →

#16554 closed Bug (invalid)

Unnecessary join when using a reverse foreign-key relationship in separate filter or aggregate calls

Reported by:	Ben Davis	Owned by:	nobody
Component:	Database layer (models, ORM)	Version:	dev
Severity:	Normal	Keywords:
Cc:	Ben Davis	Triage Stage:	Unreviewed
Has patch:	no	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	no
Easy pickings:	no	UI/UX:	no

Description

Django allows you to perform queries across reverse foreign key relationships. If, however, you need to access that same relationship in more than one filter or aggregate call, the ORM creates unnecessary joins.

Example

app/models.py:

class Author(models.Model):
    first_name = models.CharField(max_length=30)
    last_name = models.CharField(max_length=30)
    email = models.EmailField()

class Book(models.Model):
    author = models.ForeignKey(Author, related_name='books')
    title = models.CharField(max_length=100)
    genre = models.CharField(max_length=20, choices=(
        ('SCIFI', 'SciFi'),
        ('FANTASY', 'Fantasy'),
        ('NONFICTION', 'NonFiction')
    ))
    published = models.DateField()
    pages = models.IntegerField()

This particular query will only perform one join on app_books:

qs = Author.objects.filter(books__genre='SCIFI', books__pages__gt=500)
print qs.query

SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" 
FROM "app_author" 
INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
WHERE ("app_book"."genre" = 'SCIFI'  AND "app_book"."pages" > 500 )

However, if you separate the filter() call into two separate calls, you get this:

qs = Author.objects.filter(books__genre='SCIFI')
qs = qs.filter(books__pages__gt=500)
print qs.query

SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email" 
FROM "app_author" 
INNER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") 
WHERE ("app_book"."genre" = 'SCIFI'  AND T3."pages" > 500 )

As you can see, simply separating out the filters into separate calls creates a new, completely unecessary join.

This also occurs if you need to do an aggregate:

qs = Author.objects.annotate(pages_written=Sum('books__pages'))
qs = qs.filter(books__genre='SCIFI')
print qs.query

SELECT "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email", SUM("app_book"."pages") AS "pages_written" 
FROM "app_author" 
LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") 
INNER JOIN "app_book" T3 ON ("app_author"."id" = T3."author_id") 
WHERE T3."genre" = 'SCIFI'
GROUP BY "app_author"."id", "app_author"."first_name", "app_author"."last_name", "app_author"."email"

In this particular case, I imagine the ORM is seeing the need for two joins because one is outer and one is inner, however, I would argue that if an outer join already occurs as a result of an aggregate, it should always use the existing join.

Change History (5)

comment:1 by Russell Keith-Magee, 15 years ago

Resolution:	→ invalid
Status:	new → closed

This is working as designed.

A single filter() clause is used to determine the join-sharing relationships in a query; Model.objects.filter(Q1, Q2) is not *necessarily* equivalent to Model.objects.filter(Q1).filter(Q2).

comment:2 by Ben Davis, 15 years ago

I would argue that this is still a bug and should be left open. There is no conceivable situation in which you would need both joins. An extra join can severely impact the performance of a query, especially when dealing with large datasets. The ORM should recognize these situations and optimize as needed by removing the unnecessary joins.

Sure, you can avoid the extra join in the multiple .filter() example, but there are some situations in which the extra join cannot be avoided, as illustrated above with the aggregate query.

Version 0, edited 15 years ago by Ben Davis (next)

comment:3 by Ben Davis, 15 years ago

Cc:	Ben Davis added

comment:4 by Anssi Kääriäinen, 15 years ago

If I am not mistaken, the aggregate example is valid - it produces wrong results. Author having two different SCIFI books will have pages calculated twice.

But the chained filters is not valid, as one query is asking for a model which has a book with genre of SCIFI and more than 500 pages, the other is asking for a model which has a book with genre of SCIFI and a book which has more than 500 pages. So, in the first example there must be one matching book, in the second example the books matching the condition can be different ones.

I am leaving this as closed, as I don't know what to do for a half-valid ticket and I haven't actually verified that the aggregate example produces wrong results.

comment:5 by Ben Davis, 15 years ago

akaari, thank you for clarifying that. I had to read your comment like five times, but that makes sense now. I opened #16603 for the aggregate bug.

Last edited 8 years ago by Tim Graham (previous) (diff)

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues

Context Navigation

#16554 closed Bug (invalid)

Unnecessary join when using a reverse foreign-key relationship in separate filter or aggregate calls

Description

Example

Change History (5)

comment:1 by Russell Keith-Magee, 15 years ago

comment:2 by Ben Davis, 15 years ago

comment:3 by Ben Davis, 15 years ago

comment:4 by Anssi Kääriäinen, 15 years ago

comment:5 by Ben Davis, 15 years ago

Download in other formats:

Django Links

Learn More

Get Involved

Get Help

Follow Us

Support Us