#27260 closed Cleanup/optimization (invalid)
Performance Issue because of LEFT OUTER JOIN instead the better INNER JOIN
Reported by: | Sven R. Kunze | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.8 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Original Query:
BigTable.objects.filter( Q(tablea__item_id__in=my_items) | Q(tableb__item_id__in=my_items) )
resulting in:
SELECT * FROM "big_table" LEFT OUTER JOIN "tablea" ON ("big_table"."id" = "tablea"."big_table_id") LEFT OUTER JOIN "tableb" ON ("big_table"."id" = "tableb"."big_table_id") WHERE ("tablea"."item_id" IN (<handful of items>) OR "tableb"."item_id" IN (<handful of items>))
The LEFT OUTER JOIN
combined with a growing number of rows in big_table results in very slow execution of the query.
We temporarily rewrote the query by using two separate queries and merging the sets via Python:
BigTable.objects.filter( Q(tablea__item_id__in=my_items) )
SELECT * FROM "big_table" INNER JOIN "tablea" ON ("big_table"."id" = "tablea"."big_table_id") WHERE "tablea"."item_id" IN (<handful of items>)
and
BigTable.objects.filter( Q(tableb__item_id__in=my_items) )
SELECT * FROM "big_table" INNER JOIN "tableb" ON ("big_table"."id" = "tableb"."big_table_id") WHERE "tableb"."item_id" IN (<handful of items>)
Ideally, we would like to use the original query because it is clearer and less code to maintain. If you want to see the models, I can post them as well. Ideally again, we don't actually expect models to influence the type of the joins.
We want BigTable to be filtered, so, if there's no appropriate match on the TableA or TableB, the BigTable item simply does not appear in the result set. However, LEFT OUTER JOIN
would result in such a row which then needs to be removed afterwards laboriously (at least PostgreSQL struggles to do so). If the original query had used INNER JOIN
, the query would have been fast.
Change History (10)
follow-up: 2 comment:1 by , 8 years ago
comment:2 by , 8 years ago
Replying to Tim Graham:
Please also provide your models.
class BigTable: # a lot of unrelated fields
class TableA: big_table = models.OneToOneField(BigTable, related_name='...', primary_key=True) item_id = models.CharField(max_length=1024)
class TableB: big_table = models.ForeignKey(BigTable, related_name='...') item_id = models.CharField(max_length=1024)
I'm not certain whether or not your proposed changes are correct. If you can provide a patch so we can check whether or not it breaks existing behavior, it'll be more easily to evaluate the idea.
Neither are we. Maybe, a UNION
could also work. The real issue we have is the slow execution of the original query.
comment:3 by , 8 years ago
FYI, if you're not sure it's a bug, you should use our support channels before opening a ticket.
comment:4 by , 8 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I think Ansii's write up on join promotion explains why LEFT OUTER JOIN
is required here.
comment:5 by , 8 years ago
Thanks, Tim. You are right, that OUTER LEFT JOIN
is required here.
However, I might state again that our real problem is the performance issue. As we don't know who's responsible in the end, I simultaneously posted a question on the PostgreSQL mailing list:
https://www.postgresql.org/message-id/flat/97dbd6f9-a289-94d1-325f-997df87c672a%40mail.de
So, it seems as if Django could also solve it by using UNION or generating subquery plans.
comment:6 by , 8 years ago
I haven't worked with the ORM enough to know if changing the query operator is feasible.
comment:7 by , 8 years ago
@Tim
On the one hand, I somehow doubt that it's Django responsibility to do it (cf. https://www.postgresql.org/message-id/8ccdd027-f4a0-2e0d-1e26-2a403fed8d17%40mail.de ).
On the other hand, there's already an asymmetry between a single "filter" and an "OR filter". The former uses "INNER JOIN", the latter uses "OUTER JOIN".
comment:8 by , 8 years ago
Just for the records. We have performance issues on postgres because some SQL statements have up to six OUTER joins.
Union and intersection would be easier to read for me (human being) and maybe better to optimize for the postgres query planer.
comment:9 by , 8 years ago
Support for QuerySet.union()
, intersection()
, difference()
was added in #27718.
comment:10 by , 8 years ago
Just for the records, we use this pattern until we update to Django 1.11:
Use case: Select users which have permission "some_app.permission_name" and which are in group "special_group"
user_ids_with_group = User.objects.filter(groups__in=[special_group]).distinct().values_list('pk', flat=True) user_ids_with_permission = User.objects.filter(authutils.users_with_perm_q('some_app.permission_name')).distinct().values_list('pk', flat=True) qs = User.objects.filter(pk__in=set(user_ids_with_group).intersection(user_ids_with_permission)).distinct().order_by('last_name', 'first_name') # authutils def users_with_perm_q(perm, is_active=True, is_superuser=True): try: app_label, codename = perm.split('.') except ValueError: raise ValueError('Permission name should be in the form "app_label.perm_name".') permission = Permission.objects.get(codename=codename, content_type__app_label=app_label) user_q = Q(user_permissions=permission) group_q = Q(groups__permissions=permission) return Q(is_active=is_active) & (Q(is_superuser=is_superuser) | user_q | group_q)
Please also provide your models.
I'm not certain whether or not your proposed changes are correct. If you can provide a patch so we can check whether or not it breaks existing behavior, it'll be more easily to evaluate the idea.
Looking for possibly related tickets, I found #8439.