#27388 closed Uncategorized (invalid)
Filter chaining results in unnecessary joins (and degenerate performance)
Reported by: | Connor Osborn | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.9 |
Severity: | Normal | Keywords: | queryset, join, performance |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I found that filter(testA).filter(testB)
produced very different sql than filter(testA & testB)
.
For my application I did a small test to reproduce this. In the example linked I print the sql formatted. The chained filters produce many unnecessary joins.
https://gist.github.com/cdosborn/cb4bdfd0467feaf987476f4aefdf7ee5
The joins are created in the first place because of the foreign key tags__name
requires a join with a Tags table.
The popular django rest framework library provides a search mechanism that uses chained filters in django. Our application used that library and experienced an incredible slowdown. The many unnecessary joins resulted in a massive table (more than 1 million rows) when the original Application table had only 1000 rows.
I think the correct way forward is to make those two queries that are logically equivalent produce the same sql. I have done some digging in the code base and found a patch that fixes the sql but is not a proper fix (printed below). I'm willing to take up the work to fix this properly, but I will probably need some assistance.
There is only one use case for filter_is_sticky
(located in model code) and it looks like a hack. If you look at the commit that introduced it, it was actually trying to solve my problem: unnecessary joins. So I'm guessing a proper fix would remove that.
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index f82eca3..b54a8e5 100644 --- a/django/db/models/sql/query.py +++ b/django/db/models/sql/query.py @@ -314,10 +314,7 @@ class Query(object): obj.extra_tables = self.extra_tables obj.extra_order_by = self.extra_order_by obj.deferred_loading = copy.copy(self.deferred_loading[0]), self.deferred_loading[1] - if self.filter_is_sticky and self.used_aliases: - obj.used_aliases = self.used_aliases.copy() - else: - obj.used_aliases = set() + obj.used_aliases = self.used_aliases.copy() obj.filter_is_sticky = False if 'alias_prefix' in self.__dict__: obj.alias_prefix = self.alias_prefix
I tested code against 1.9 (because our app uses 1.9). It's possible this is fixed in master, though a quick glance at master made me think otherwise.
Change History (3)
comment:1 by , 8 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
comment:2 by , 8 years ago
That's pretty surprising to me, thanks for linking me to the docs. I'm still interested in finding a solution for my application. It looks like if I set the filter_is_sticky
flag it gets reset to false in the next filter. So in order to have a chain each one must be add that flag, is that right?
comment:3 by , 8 years ago
It looks like if I set the filter_is_sticky flag it gets reset to false in the next filter. So in order to have a chain each one must be add that flag, is that right?
Exactly but relying on Django's internal APIs is risky as it might be replaced in future versions.
Unfortunately this would be backward incompatible as this is a documented feature.
I recently suggested adding an API to allow alias reusing in a specific context. I suspect your use case is similar to the one reported in #27343 but related to DRF instead of the admin.