Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#27388 closed Uncategorized (invalid)

Filter chaining results in unnecessary joins (and degenerate performance)

Reported by: Connor Osborn Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: queryset, join, performance
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I found that filter(testA).filter(testB) produced very different sql than filter(testA & testB).

For my application I did a small test to reproduce this. In the example linked I print the sql formatted. The chained filters produce many unnecessary joins.
https://gist.github.com/cdosborn/cb4bdfd0467feaf987476f4aefdf7ee5

The joins are created in the first place because of the foreign key tags__name requires a join with a Tags table.

The popular django rest framework library provides a search mechanism that uses chained filters in django. Our application used that library and experienced an incredible slowdown. The many unnecessary joins resulted in a massive table (more than 1 million rows) when the original Application table had only 1000 rows.

I think the correct way forward is to make those two queries that are logically equivalent produce the same sql. I have done some digging in the code base and found a patch that fixes the sql but is not a proper fix (printed below). I'm willing to take up the work to fix this properly, but I will probably need some assistance.

There is only one use case for filter_is_sticky (located in model code) and it looks like a hack. If you look at the commit that introduced it, it was actually trying to solve my problem: unnecessary joins. So I'm guessing a proper fix would remove that.

diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py
index f82eca3..b54a8e5 100644
--- a/django/db/models/sql/query.py
+++ b/django/db/models/sql/query.py
@@ -314,10 +314,7 @@ class Query(object):
         obj.extra_tables = self.extra_tables
         obj.extra_order_by = self.extra_order_by
         obj.deferred_loading = copy.copy(self.deferred_loading[0]), self.deferred_loading[1]
-        if self.filter_is_sticky and self.used_aliases:
-            obj.used_aliases = self.used_aliases.copy()
-        else:
-            obj.used_aliases = set()
+        obj.used_aliases = self.used_aliases.copy()
         obj.filter_is_sticky = False
         if 'alias_prefix' in self.__dict__:
             obj.alias_prefix = self.alias_prefix

I tested code against 1.9 (because our app uses 1.9). It's possible this is fixed in master, though a quick glance at master made me think otherwise.

Change History (3)

comment:1 by Simon Charette, 7 years ago

Resolution: invalid
Status: newclosed

I think the correct way forward is to make those two queries that are logically equivalent produce the same sql.

Unfortunately this would be backward incompatible as this is a documented feature.

I recently suggested adding an API to allow alias reusing in a specific context. I suspect your use case is similar to the one reported in #27343 but related to DRF instead of the admin.

comment:2 by Connor Osborn, 7 years ago

That's pretty surprising to me, thanks for linking me to the docs. I'm still interested in finding a solution for my application. It looks like if I set the filter_is_sticky flag it gets reset to false in the next filter. So in order to have a chain each one must be add that flag, is that right?

comment:3 by Simon Charette, 7 years ago

It looks like if I set the filter_is_sticky flag it gets reset to false in the next filter. So in order to have a chain each one must be add that flag, is that right?

Exactly but relying on Django's internal APIs is risky as it might be replaced in future versions.

Note: See TracTickets for help on using tickets.
Back to Top