#36552 closed Cleanup/optimization (invalid)
Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs
Reported by: | Márton Salomváry | Owned by: | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 5.0 |
Severity: | Normal | Keywords: | chained-filters |
Cc: | Simon Charette | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
We stumbled upon this issue while upgrading a large code base from Django 4.x to 5.0.
A reasonably simplified (probably not the most trivial) reproduction case looks like this:
Alpha.objects.filter( bravos__charlie_bravos__charlie=9999, ) .filter( Exists( Delta.objects.filter( bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"), ) ) ) .values_list("pk")
In Django 4 the generated query is:
SELECT "example_alpha"."id" FROM "example_alpha" INNER JOIN "example_bravo_alphas" ON ("example_alpha"."id" = "example_bravo_alphas"."alpha_id") INNER JOIN "example_bravo" ON ("example_bravo_alphas"."bravo_id" = "example_bravo"."id") INNER JOIN "example_charliebravo" ON ("example_bravo"."id" = "example_charliebravo"."bravo_id") WHERE ("example_charliebravo"."charlie_id" = 9999 AND EXISTS (SELECT 1 AS "a" FROM "example_delta" U0 INNER JOIN "example_delta_bravos" U1 ON (U0."id" = U1."delta_id") WHERE U1."bravo_id" = ("example_charliebravo"."bravo_id") LIMIT 1))
In Django 5:
SELECT "example_alpha"."id" FROM "example_alpha" INNER JOIN "example_bravo_alphas" ON ("example_alpha"."id" = "example_bravo_alphas"."alpha_id") INNER JOIN "example_bravo" ON ("example_bravo_alphas"."bravo_id" = "example_bravo"."id") INNER JOIN "example_charliebravo" ON ("example_bravo"."id" = "example_charliebravo"."bravo_id") LEFT OUTER JOIN "example_bravo_alphas" T6 ON ("example_alpha"."id" = T6."alpha_id") LEFT OUTER JOIN "example_bravo" T7 ON (T6."bravo_id" = T7."id") LEFT OUTER JOIN "example_charliebravo" T8 ON (T7."id" = T8."bravo_id") WHERE ("example_charliebravo"."charlie_id" = 9999 AND EXISTS (SELECT 1 AS "a" FROM "example_delta" U0 INNER JOIN "example_delta_bravos" U1 ON (U0."id" = U1."delta_id") WHERE U1."bravo_id" = (T8."bravo_id") LIMIT 1))
Interestingly, this QuerySet generates the expected query both in 4.x and 5.0:
Alpha.objects .filter( Q(bravos__charlie_bravos__charlie=9999) & Exists( Delta.objects.filter( bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"), ) ) ) .values_list("pk")
Example project for reproduction: https://github.com/salomvary/django-5-filter-regression
The issue is also present in 5.1.x and 5.2.x.
Change History (7)
comment:1 by , 5 weeks ago
Type: | Uncategorized → Bug |
---|
comment:2 by , 5 weeks ago
comment:3 by , 5 weeks ago
Cc: | added |
---|---|
Keywords: | chained-filters added; queryset filter regression removed |
Resolution: | → invalid |
Status: | new → closed |
Type: | Bug → Cleanup/optimization |
Hello Márton Salomváry, thank you for your report, as well as for the test project and bisected revision number.
In general, Django does not make guarantees about the exact SQL that is generated. This means SQL can change from version to version, and we would not consider that a bug as long as the semantics of the query remain the same (which, based on my testing, is the case here). The resulting queryset still returns the correct values.
That said, we do care about potential performance impacts and/or regressions. When triaging, I focused on how this change affects query planning, and I can confirm that the query plan is different/worse (see EXPLAIN ANALYZE results for Postgres comparing 4.2.x
vs main
). I do wonder, however, whether the way the queries are currently written might be more complex than necessary, which could contribute to these differences in execution plans. For example, this alternative queryset is, IMHO, equivalent and much more straightfoward:
Alpha.objects.filter( Q(bravos__charlie_bravos__charlie=9999) & Exists( Delta.objects.filter( bravos__id=OuterRef("bravos__charlie_bravos__bravo_id") ) ) )
Because of the above, I'll close the ticket as invalid
. If you encounter a case where a query written in a way that aligns with Django's ORM patterns (as shown above) returns incorrect results or performs poorly, please feel free to reopen it with details. Thanks again!
Appendix
EXPLAIN ANALYZE
for stable/4.2.x
:
Nested Loop (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.013 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name Inner Unique: true -> Nested Loop (cost=7.64..41.29 rows=6 width=4) (actual time=0.009..0.011 rows=1 loops=1) Output: test_filter_regression_bravo_alphas.alpha_id -> Nested Loop (cost=7.49..40.26 rows=2 width=12) (actual time=0.008..0.010 rows=1 loops=1) Output: test_filter_regression_bravo.id, u1.bravo_id, test_filter_regression_charliebravo.bravo_id Inner Unique: true Join Filter: (test_filter_regression_bravo.id = test_filter_regression_charliebravo.bravo_id) -> Nested Loop Semi Join (cost=7.22..39.05 rows=2 width=8) (actual time=0.007..0.008 rows=1 loops=1) Output: test_filter_regression_charliebravo.bravo_id, u1.bravo_id -> Bitmap Heap Scan on public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3 width=4) (actual time=0.003..0.003 rows=2 loops=1) Output: test_filter_regression_charliebravo.id, test_filter_regression_charliebravo.name, test_filter_regression_charliebravo.charlie_id, test_filter_regression_charliebravo.bravo_id Recheck Cond: (test_filter_regression_charliebravo.charlie_id = 9999) Heap Blocks: exact=1 -> Bitmap Index Scan on test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1) Index Cond: (test_filter_regression_charliebravo.charlie_id = 9999) -> Nested Loop (cost=3.05..13.34 rows=540 width=4) (actual time=0.002..0.002 rows=0 loops=2) Output: u1.bravo_id Inner Unique: true -> Bitmap Heap Scan on public.test_filter_regression_delta_bravos u1 (cost=2.90..11.43 rows=10 width=8) (actual time=0.001..0.001 rows=0 loops=2) Output: u1.id, u1.delta_id, u1.bravo_id Recheck Cond: (test_filter_regression_charliebravo.bravo_id = u1.bravo_id) Heap Blocks: exact=1 -> Bitmap Index Scan on test_filter_regression_delta_bravos_bravo_id_5b518e55 (cost=0.00..2.89 rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2) Index Cond: (u1.bravo_id = test_filter_regression_charliebravo.bravo_id) -> Index Only Scan using test_filter_regression_delta_pkey on public.test_filter_regression_delta u0 (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1) Output: u0.id Index Cond: (u0.id = u1.delta_id) Heap Fetches: 1 -> Index Only Scan using test_filter_regression_bravo_pkey on public.test_filter_regression_bravo (cost=0.28..0.59 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1) Output: test_filter_regression_bravo.id Index Cond: (test_filter_regression_bravo.id = u1.bravo_id) Heap Fetches: 1 -> Index Scan using test_filter_regression_bravo_alphas_bravo_id_042061e7 on public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10 width=8) (actual time=0.001..0.001 rows=1 loops=1) Output: test_filter_regression_bravo_alphas.id, test_filter_regression_bravo_alphas.bravo_id, test_filter_regression_bravo_alphas.alpha_id Index Cond: (test_filter_regression_bravo_alphas.bravo_id = test_filter_regression_bravo.id) -> Index Scan using test_filter_regression_alpha_pkey on public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122) (actual time=0.001..0.001 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name Index Cond: (test_filter_regression_alpha.id = test_filter_regression_bravo_alphas.alpha_id) Planning Time: 0.382 ms Execution Time: 0.027 ms
For main
:
Hash Join (cost=79.87..131.14 rows=20 width=122) (actual time=0.074..0.171 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name Inner Unique: true Hash Cond: (t6.bravo_id = u1.bravo_id) -> Nested Loop (cost=12.32..63.27 rows=40 width=134) (actual time=0.046..0.143 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, t6.bravo_id, t7.id, t8.bravo_id -> Nested Loop (cost=12.17..52.62 rows=42 width=130) (actual time=0.041..0.137 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, t6.bravo_id, t7.id Inner Unique: true -> Nested Loop (cost=11.90..39.10 rows=42 width=126) (actual time=0.034..0.130 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, t6.bravo_id Join Filter: (test_filter_regression_alpha.id = t6.alpha_id) -> Nested Loop (cost=11.74..33.86 rows=11 width=126) (actual time=0.030..0.126 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, test_filter_regression_bravo_alphas.alpha_id Inner Unique: true -> Nested Loop (cost=11.47..30.32 rows=11 width=4) (actual time=0.026..0.121 rows=1 loops=1) Output: test_filter_regression_bravo_alphas.alpha_id -> Hash Join (cost=11.32..28.77 rows=3 width=8) (actual time=0.018..0.112 rows=2 loops=1) Output: test_filter_regression_bravo.id, test_filter_regression_charliebravo.bravo_id Hash Cond: (test_filter_regression_bravo.id = test_filter_regression_charliebravo.bravo_id) -> Seq Scan on public.test_filter_regression_bravo (cost=0.00..15.40 rows=540 width=4) (actual time=0.003..0.051 rows=900 loops=1) Output: test_filter_regression_bravo.id, test_filter_regression_bravo.name -> Hash (cost=11.28..11.28 rows=3 width=4) (actual time=0.006..0.006 rows=2 loops=1) Output: test_filter_regression_charliebravo.bravo_id Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Bitmap Heap Scan on public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3 width=4) (actual time=0.003..0.003 rows=2 loops=1) Output: test_filter_regression_charliebravo.bravo_id Recheck Cond: (test_filter_regression_charliebravo.charlie_id = 9999) Heap Blocks: exact=1 -> Bitmap Index Scan on test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1) Index Cond: (test_filter_regression_charliebravo.charlie_id = 9999) -> Index Scan using test_filter_regression_bravo_alphas_bravo_id_042061e7 on public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10 width=8) (actual time=0.004..0.004 rows=0 loops=2) Output: test_filter_regression_bravo_alphas.id, test_filter_regression_bravo_alphas.bravo_id, test_filter_regression_bravo_alphas.alpha_id Index Cond: (test_filter_regression_bravo_alphas.bravo_id = test_filter_regression_bravo.id) -> Index Scan using test_filter_regression_alpha_pkey on public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122) (actual time=0.004..0.004 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name Index Cond: (test_filter_regression_alpha.id = test_filter_regression_bravo_alphas.alpha_id) -> Index Scan using test_filter_regression_bravo_alphas_alpha_id_b2c31687 on public.test_filter_regression_bravo_alphas t6 (cost=0.15..0.35 rows=10 width=8) (actual time=0.004..0.004 rows=1 loops=1) Output: t6.id, t6.bravo_id, t6.alpha_id Index Cond: (t6.alpha_id = test_filter_regression_bravo_alphas.alpha_id) -> Index Only Scan using test_filter_regression_bravo_pkey on public.test_filter_regression_bravo t7 (cost=0.28..0.32 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1) Output: t7.id Index Cond: (t7.id = t6.bravo_id) Heap Fetches: 1 -> Index Only Scan using test_filter_regression_charliebravo_bravo_id_1b159238 on public.test_filter_regression_charliebravo t8 (cost=0.15..0.22 rows=3 width=4) (actual time=0.005..0.005 rows=1 loops=1) Output: t8.bravo_id Index Cond: (t8.bravo_id = t6.bravo_id) Heap Fetches: 1 -> Hash (cost=65.05..65.05 rows=200 width=4) (actual time=0.020..0.021 rows=2 loops=1) Output: u1.bravo_id Buckets: 1024 Batches: 1 Memory Usage: 9kB -> HashAggregate (cost=63.05..65.05 rows=200 width=4) (actual time=0.017..0.017 rows=2 loops=1) Output: u1.bravo_id Group Key: u1.bravo_id Batches: 1 Memory Usage: 40kB -> Hash Join (cost=22.15..57.95 rows=2040 width=4) (actual time=0.015..0.016 rows=2 loops=1) Output: u1.bravo_id Inner Unique: true Hash Cond: (u1.delta_id = u0.id) -> Seq Scan on public.test_filter_regression_delta_bravos u1 (cost=0.00..30.40 rows=2040 width=8) (actual time=0.001..0.001 rows=2 loops=1) Output: u1.id, u1.delta_id, u1.bravo_id -> Hash (cost=15.40..15.40 rows=540 width=4) (actual time=0.004..0.004 rows=2 loops=1) Output: u0.id Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on public.test_filter_regression_delta u0 (cost=0.00..15.40 rows=540 width=4) (actual time=0.001..0.001 rows=2 loops=1) Output: u0.id Planning Time: 2.462 ms Execution Time: 0.238 ms
For main
with optimized queryset:
Nested Loop (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.012 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name Inner Unique: true -> Nested Loop (cost=7.64..41.29 rows=6 width=4) (actual time=0.009..0.011 rows=1 loops=1) Output: test_filter_regression_bravo_alphas.alpha_id -> Nested Loop (cost=7.49..40.26 rows=2 width=12) (actual time=0.008..0.010 rows=1 loops=1) Output: test_filter_regression_bravo.id, u1.bravo_id, test_filter_regression_charliebravo.bravo_id Inner Unique: true Join Filter: (test_filter_regression_bravo.id = test_filter_regression_charliebravo.bravo_id) -> Nested Loop Semi Join (cost=7.22..39.05 rows=2 width=8) (actual time=0.006..0.008 rows=1 loops=1) Output: test_filter_regression_charliebravo.bravo_id, u1.bravo_id -> Bitmap Heap Scan on public.test_filter_regression_charliebravo (cost=4.17..11.28 rows=3 width=4) (actual time=0.003..0.003 rows=2 loops=1) Output: test_filter_regression_charliebravo.id, test_filter_regression_charliebravo.name, test_filter_regression_charliebravo.charlie_id, test_filter_regression_charliebravo.bravo_id Recheck Cond: (test_filter_regression_charliebravo.charlie_id = 9999) Heap Blocks: exact=1 -> Bitmap Index Scan on test_filter_regression_charliebravo_charlie_id_1f151049 (cost=0.00..4.17 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1) Index Cond: (test_filter_regression_charliebravo.charlie_id = 9999) -> Nested Loop (cost=3.05..13.34 rows=540 width=4) (actual time=0.002..0.002 rows=0 loops=2) Output: u1.bravo_id Inner Unique: true -> Bitmap Heap Scan on public.test_filter_regression_delta_bravos u1 (cost=2.90..11.43 rows=10 width=8) (actual time=0.001..0.001 rows=0 loops=2) Output: u1.id, u1.delta_id, u1.bravo_id Recheck Cond: (test_filter_regression_charliebravo.bravo_id = u1.bravo_id) Heap Blocks: exact=1 -> Bitmap Index Scan on test_filter_regression_delta_bravos_bravo_id_5b518e55 (cost=0.00..2.89 rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2) Index Cond: (u1.bravo_id = test_filter_regression_charliebravo.bravo_id) -> Index Only Scan using test_filter_regression_delta_pkey on public.test_filter_regression_delta u0 (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1) Output: u0.id Index Cond: (u0.id = u1.delta_id) Heap Fetches: 1 -> Index Only Scan using test_filter_regression_bravo_pkey on public.test_filter_regression_bravo (cost=0.28..0.59 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1) Output: test_filter_regression_bravo.id Index Cond: (test_filter_regression_bravo.id = u1.bravo_id) Heap Fetches: 1 -> Index Scan using test_filter_regression_bravo_alphas_bravo_id_042061e7 on public.test_filter_regression_bravo_alphas (cost=0.15..0.42 rows=10 width=8) (actual time=0.001..0.001 rows=1 loops=1) Output: test_filter_regression_bravo_alphas.id, test_filter_regression_bravo_alphas.bravo_id, test_filter_regression_bravo_alphas.alpha_id Index Cond: (test_filter_regression_bravo_alphas.bravo_id = test_filter_regression_bravo.id) -> Index Scan using test_filter_regression_alpha_pkey on public.test_filter_regression_alpha (cost=0.28..0.32 rows=1 width=122) (actual time=0.001..0.001 rows=1 loops=1) Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name Index Cond: (test_filter_regression_alpha.id = test_filter_regression_bravo_alphas.alpha_id) Planning Time: 0.381 ms Execution Time: 0.027 ms
follow-up: 5 comment:4 by , 5 weeks ago
Hi Natalia,
In general, Django does not make guarantees about the exact SQL that is generated. This means SQL can change from version to version, and we would not consider that a bug as long as the semantics of the query remain the same (which, based on my testing, is the case here).
I think the semantics did change, maybe the example I provided is not obvious enough to showcase that. Will provide some more cases where we had to change code.
That said, we do care about potential performance impacts and/or regressions.
In our case the performance impact was extreme. We got from queries that take a few dozens of milliseconds to ones that practically never complete and had to kill queries manually in order to recover the PostgreSQL database server.
I do wonder, however, whether the way the queries are currently written might be more complex than necessary, which could contribute to these differences in execution plans.
They might indeed be written more complex than necessary, but they used to generate SQL we had no performance or other issues with.
I understand you are suggesting rewriting the queries before upgrading Django from 4.x to 5.0 but given the size and complexity of our code base, identifying the queries in a 100% reliable way is nearly impossible, meaning that we have to keep fighting runaway queries and rewriting until we no longer have production failures.
For example, this alternative queryset is, IMHO, equivalent and much more straightfoward:
Alpha.objects.filter( Q(bravos__charlie_bravos__charlie=9999) & Exists( Delta.objects.filter( bravos__id=OuterRef("bravos__charlie_bravos__bravo_id") ) ) )
Sure it might be more straightforward, but repeating filter()
calls is often handy if some is only applied conditionally, like this:
qs = qs.filter(some_unconditional_query...) if some_condition: qs = qs.filter(some_conditional_query...)
I also wonder why repeating filter()
and combining expressions like Q(...) & Q()
do not result in equivalent queries. Ie. shouldn't these three always be the same? If not, what is the difference in the semantics?
qs1 = qs.filter(foo=bar).filter(baz=qux) qs2 = qs.filter(Q(foo=bar) & Q(baz=qux)) qs3 = qs.filter(foo=bar, baz=qux)
If you encounter a case where a query written in a way that aligns with Django's ORM patterns (as shown above) returns incorrect results or performs poorly, please feel free to reopen it with details.
I would appreciate feedback on how the original query does not align with Django's ORM patterns. The documentation does not discourage repeat-calls to filter()
in order to narrow down results.
In any case, I will try to provide more input on how this breaks the results of our queries.
follow-up: 6 comment:5 by , 5 weeks ago
Replying to Márton Salomváry:
They might indeed be written more complex than necessary, but they used to generate SQL we had no performance or other issues with.
I understand you are suggesting rewriting the queries before upgrading Django from 4.x to 5.0 but given the size and complexity of our code base, identifying the queries in a 100% reliable way is nearly impossible, meaning that we have to keep fighting runaway queries and rewriting until we no longer have production failures.
The thing is that the commit you reference fixes a bug. Using your examples:
Alpha <--M2M--> Bravo <--FK-- CharlieBravo --FK--> Charlie Delta <--M2M--> Bravo
- Django
4.2x
:Alpha --> Bravo --> CharlieBravo
(single INNER JOIN)
- Since
5.0.x
:Alpha --> Bravo --> CharlieBravo
(INNER JOIN for first filter)Alpha --> T6 Bravo --> T7 CharlieBravo
(LEFT OUTER JOIN path for second filter)
[...] repeating
filter()
calls is often handy if some is only applied conditionally, like this:
qs = qs.filter(some_unconditional_query...) if some_condition: qs = qs.filter(some_conditional_query...)
As a side note, this example of chaining multiple .filter()
calls is common and supported, but if you need to ensure joins are combined as efficiently as possible, the equivalent pattern is to build up a list of conditions and combine them with &
into a single filter
, for example:
filters = [Q(some_unconditional_query)] if some_condition: filters.append(Q(some_conditional_query)) result = qs.filter(reduce(operator.and_, filters))
This is functionally equivalent to chaining, but ensures the ORM generates queries consistent with the documented behavior for multi-valued relationships.
I also wonder why repeating
filter()
and combining expressions likeQ(...) & Q()
do not result in equivalent queries. Ie. shouldn't these three always be the same? If not, what is the difference in the semantics?
This is documented in this section.
In fact, prior to the change in Django 5.0, usage of expressions passed to .filter()
sometimes contradicted the documentation around multi-valued relationships. The docs have long stated that when spanning multi-valued relationships, repeated filters can lead to additional joins, and that conditions should be combined in a single .filter()
if you want to avoid that. The recent change aligns behavior with that documented expectation, even if it changes some queries that happened to work differently before.
I would appreciate feedback on how the original query does not align with Django's ORM patterns. The documentation does not discourage repeat-calls to
filter()
in order to narrow down results.
Chaining .filter()
is a normal pattern, but when multi-valued relationships are involved the docs note that it can change semantics and have query/performance implications. If JOIN
reuse is necessary due to performance requirements, combining conditions in a single call is the ORM pattern to follow.
In any case, I will try to provide more input on how this breaks the results of our queries.
That would certainly help, thank you!
comment:6 by , 5 weeks ago
Replying to Natalia Bidart:
The thing is that the commit you reference fixes a bug.
I understand that. What I can't decide is whether what I am seeing is another bug introduced by that bugfix, or I have been successfully relying on the existence of bug for quite long :)
This is documented in this section.
I would appreciate feedback on how the original query does not align with Django's ORM patterns. The documentation does not discourage repeat-calls to
filter()
in order to narrow down results.
Chaining
.filter()
is a normal pattern, but when multi-valued relationships are involved the docs note that it can change semantics and have query/performance implications. IfJOIN
reuse is necessary due to performance requirements, combining conditions in a single call is the ORM pattern to follow.
Oh thanks, now I remember having been bitten by this in the past, but not often and painfully enough to remember the peculiarities of joins with multi-valued relationships!
In any case, I will try to provide more input on how this breaks the results of our queries.
That would certainly help, thank you!
In this example, count is wrongly 2
in Django 5.0 and correctly 1
in Django 4.x:
alpha = Alpha.objects.create(id=7777) bravo = alpha.bravos.create() charlie1 = Charlie.objects.create(id=9999) charlie2 = Charlie.objects.create(id=5555) CharlieBravo.objects.create(bravo=bravo, charlie=charlie1) CharlieBravo.objects.create(bravo=bravo, charlie=charlie2) bravo.deltas.create() queryset = ( Alpha.objects.filter( bravos__charlie_bravos__charlie=9999, ) .filter( Exists( Delta.objects.filter( bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"), ) ) ) .annotate(count=Count( "bravos__charlie_bravos__bravo" )) .values_list("pk", "count") ) self.assertEqual( [(7777, 1)], list(queryset) )
Now I wonder how to spot these in an existing code base, and prevent shooting myself in the foot in the future. I guess I can monkey-patch QuerySets to log a warning if repeated filter()
is used with multi-valued relationships and force using a single filter()
and combining conditions with &
or |
.
comment:7 by , 4 weeks ago
What I can't decide is whether what I am seeing is another bug introduced by that bugfix, or I have been successfully relying on the existence of bug for quite long :)
I can confirm that when support for conditional expressions passing to filter
was added in #25367 the documented chaining property to avoid join reuse was missed (look at the tests added in 4137fc2efce2dde48340728b8006fc6d66b9e3a5). This makes me believe the bug has been present since 3.0 which is when the feature was introduced.
In this example, count is wrongly 2 in Django 5.0 and correctly 1 in Django 4.x:
This is expected as in Django 5.0 the multi-valued relationship is joined twice which triggers #10060.
I've git bisected using this test: https://github.com/salomvary/django/commit/cc2631043e8a2fbf1b6a1eed166bf1e9ef890bc4
The offending commit seems to be this: https://github.com/django/django/commit/0e1aae7a5f51408b73c5a29e18bd1803dd030930