#17502 closed Cleanup/optimization (fixed)
Filter on field from base class 2 levels up hierarchy causes extra join
Reported by: | Owned by: | Łukasz Rekucki | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.3 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Here's some simple code showing a 3 level concrete inheritance structure:
from django.db import models class A(models.Model): alice = models.IntegerField() class B(A): bob = models.IntegerField() class C(B): charlie = models.IntegerField()
Filtering C objects on an attribute in B or C works fine. Filtering C objects on an attribute in A seems to cause an extra join to A. Is there some reason this is needed?
In [3]: str(C.objects.all().query) Out[3]: 'SELECT "case1_a"."id", "case1_a"."alice", "case1_b"."a_ptr_id", "case1_b"."bob", "case1_c"."b_ptr_id", "case1_c"."charlie" FROM "case1_c" INNER JOIN "case1_a" ON ("case1_c"."b_ptr_id" = "case1_a"."id") INNER JOIN "case1_b" ON ("case1_c"."b_ptr_id" = "case1_b"."a_ptr_id")' In [4]: str(C.objects.filter(charlie=0).query) Out[4]: 'SELECT "case1_a"."id", "case1_a"."alice", "case1_b"."a_ptr_id", "case1_b"."bob", "case1_c"."b_ptr_id", "case1_c"."charlie" FROM "case1_c" INNER JOIN "case1_a" ON ("case1_c"."b_ptr_id" = "case1_a"."id") INNER JOIN "case1_b" ON ("case1_c"."b_ptr_id" = "case1_b"."a_ptr_id") WHERE "case1_c"."charlie" = 0 ' In [5]: str(C.objects.filter(bob=0).query) Out[5]: 'SELECT "case1_a"."id", "case1_a"."alice", "case1_b"."a_ptr_id", "case1_b"."bob", "case1_c"."b_ptr_id", "case1_c"."charlie" FROM "case1_c" INNER JOIN "case1_b" ON ("case1_c"."b_ptr_id" = "case1_b"."a_ptr_id") INNER JOIN "case1_a" ON ("case1_c"."b_ptr_id" = "case1_a"."id") WHERE "case1_b"."bob" = 0 ' In [6]: str(C.objects.filter(alice=0).query) Out[6]: 'SELECT T4."id", T4."alice", "case1_b"."a_ptr_id", "case1_b"."bob", "case1_c"."b_ptr_id", "case1_c"."charlie" FROM "case1_c" INNER JOIN "case1_b" ON ("case1_c"."b_ptr_id" = "case1_b"."a_ptr_id") INNER JOIN "case1_a" ON ("case1_b"."a_ptr_id" = "case1_a"."id") INNER JOIN "case1_a" T4 ON ("case1_c"."b_ptr_id" = T4."id") WHERE "case1_a"."alice" = 0 '
This is with 1.3.1.
Change History (5)
comment:1 by , 13 years ago
Owner: | changed from | to
---|
comment:2 by , 13 years ago
Status: | new → assigned |
---|---|
Triage Stage: | Unreviewed → Accepted |
comment:3 by , 12 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|
comment:4 by , 12 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:5 by , 12 years ago
The original case is already fixed in master. However, the .values(from_top_only).filter(against_bottom) case does generate one extra join (assuming Top->Middle->Bottom inheritance chain). This is now expectedFailure, and I don't think it is too important to fix this one. Multilevel inheritance + select only from topmost models + filtering against bottom just doesn't seem like something worth using much time.
If this were to be fixed, it seems the way would be to do this (this is just another way to have "fake joining" as suggested by lrekucki):
- Alter the join reference counting in Query to count only those tables that are really used (that is, they are referenced by SELECT, ORDER BY, ...).
- When generating the query in compiler, check if there are joins that can be skipped (the "from" and "to" joins use the same column), and that have reference count of 0. If so, skip the join.
Of course, it is also possible to fix this by being clever in join generation stage. But it is hard to make all three following cases work correctly using this approach:
- Top.objects.values_list(from_top).filter(against_bottom)
- Top.objects.filter(against_bottom).values_list(from_top)
- Top.objects.filter(against_bottom)
If you don't skip the middle in case 2 you will have 2 joins. If you do skip joins, then case 3 will not work.
All in all, solving this properly requires too much effort compared to the gain.
Going step by step:
Query.setup_inherited_model()
addsINNER JOIN
for all parent models. C is joined with B [{{{ ON (C.b_ptr_id = B.a_ptr_id) - B's parent pointer is it's own primary key). Then C is joined with A and because of B's PK is the same as B's parent pointer, Django optimizes? by doing (C.b_ptr_id [= B.a_ptr_id] = A.id).Query.add_filter()
lookups thealice
field and setups neccesary joins withQuery.setup_joins()
. As the field resides on the base model, the following code is executed:This takes a different approach by joining C with B (alias lookup succeeds) and then B with A (alias lookup fails, because we previously used a different connection). This is where the additional join comes from.
Possible solutions:
setup_joins to
try to reuse the optimized alias first.setup_inherited_models()
, if we did join to B (i.e. there are selected fields to B), add a fake entry toalias_map
for "B INNER JOIN A", thatsetup_joins()
could reuse later.