Opened 5 weeks ago

Closed 5 weeks ago

Last modified 4 weeks ago

#36552 closed Cleanup/optimization (invalid)

Repeating QuerySet.filter() generates unwanted LEFT OUTER JOINs

Reported by: Márton Salomváry Owned by:
Component: Database layer (models, ORM) Version: 5.0
Severity: Normal Keywords: chained-filters
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

We stumbled upon this issue while upgrading a large code base from Django 4.x to 5.0.

A reasonably simplified (probably not the most trivial) reproduction case looks like this:

Alpha.objects.filter(
                bravos__charlie_bravos__charlie=9999,
            )
            .filter(
                Exists(
                    Delta.objects.filter(
                        bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
                    )
                )
            )
            .values_list("pk")

In Django 4 the generated query is:

            SELECT "example_alpha"."id"
            FROM "example_alpha"
            INNER JOIN "example_bravo_alphas" ON ("example_alpha"."id" = "example_bravo_alphas"."alpha_id")
            INNER JOIN "example_bravo" ON ("example_bravo_alphas"."bravo_id" = "example_bravo"."id")
            INNER JOIN "example_charliebravo" ON ("example_bravo"."id" = "example_charliebravo"."bravo_id")
            WHERE ("example_charliebravo"."charlie_id" = 9999
                   AND EXISTS
                     (SELECT 1 AS "a"
                      FROM "example_delta" U0
                      INNER JOIN "example_delta_bravos" U1 ON (U0."id" = U1."delta_id")
                      WHERE U1."bravo_id" = ("example_charliebravo"."bravo_id")
                      LIMIT 1))

In Django 5:

SELECT "example_alpha"."id"
FROM "example_alpha"
INNER JOIN "example_bravo_alphas" ON ("example_alpha"."id" = "example_bravo_alphas"."alpha_id")
INNER JOIN "example_bravo" ON ("example_bravo_alphas"."bravo_id" = "example_bravo"."id")
INNER JOIN "example_charliebravo" ON ("example_bravo"."id" = "example_charliebravo"."bravo_id")
LEFT OUTER JOIN "example_bravo_alphas" T6 ON ("example_alpha"."id" = T6."alpha_id")
LEFT OUTER JOIN "example_bravo" T7 ON (T6."bravo_id" = T7."id")
LEFT OUTER JOIN "example_charliebravo" T8 ON (T7."id" = T8."bravo_id")
WHERE ("example_charliebravo"."charlie_id" = 9999
       AND EXISTS
         (SELECT 1 AS "a"
          FROM "example_delta" U0
          INNER JOIN "example_delta_bravos" U1 ON (U0."id" = U1."delta_id")
          WHERE U1."bravo_id" = (T8."bravo_id")
          LIMIT 1))

Interestingly, this QuerySet generates the expected query both in 4.x and 5.0:

Alpha.objects
            .filter(
                Q(bravos__charlie_bravos__charlie=9999)
                &
                Exists(
                    Delta.objects.filter(
                        bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
                    )
                )
            )
            .values_list("pk")

Example project for reproduction: https://github.com/salomvary/django-5-filter-regression

The issue is also present in 5.1.x and 5.2.x.

Change History (7)

comment:1 by Márton Salomváry, 5 weeks ago

Type: UncategorizedBug

comment:3 by Natalia Bidart, 5 weeks ago

Cc: Simon Charette added
Keywords: chained-filters added; queryset filter regression removed
Resolution: invalid
Status: newclosed
Type: BugCleanup/optimization

Hello Márton Salomváry, thank you for your report, as well as for the test project and bisected revision number.

In general, Django does not make guarantees about the exact SQL that is generated. This means SQL can change from version to version, and we would not consider that a bug as long as the semantics of the query remain the same (which, based on my testing, is the case here). The resulting queryset still returns the correct values.

That said, we do care about potential performance impacts and/or regressions. When triaging, I focused on how this change affects query planning, and I can confirm that the query plan is different/worse (see EXPLAIN ANALYZE results for Postgres comparing 4.2.x vs main). I do wonder, however, whether the way the queries are currently written might be more complex than necessary, which could contribute to these differences in execution plans. For example, this alternative queryset is, IMHO, equivalent and much more straightfoward:

Alpha.objects.filter(
    Q(bravos__charlie_bravos__charlie=9999)
    & Exists(
        Delta.objects.filter(
            bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
        )
    )
)

Because of the above, I'll close the ticket as invalid. If you encounter a case where a query written in a way that aligns with Django's ORM patterns (as shown above) returns incorrect results or performs poorly, please feel free to reopen it with details. Thanks again!

Appendix

EXPLAIN ANALYZE for stable/4.2.x:

Nested Loop  (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.013 rows=1 loops=1)
  Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name
  Inner Unique: true
  ->  Nested Loop  (cost=7.64..41.29 rows=6 width=4) (actual time=0.009..0.011 rows=1 loops=1)
        Output: test_filter_regression_bravo_alphas.alpha_id
        ->  Nested Loop  (cost=7.49..40.26 rows=2 width=12) (actual time=0.008..0.010 rows=1 loops=1)
              Output: test_filter_regression_bravo.id, u1.bravo_id, test_filter_regression_charliebravo.bravo_id
              Inner Unique: true
              Join Filter: (test_filter_regression_bravo.id = test_filter_regression_charliebravo.bravo_id)
              ->  Nested Loop Semi Join  (cost=7.22..39.05 rows=2 width=8) (actual time=0.007..0.008 rows=1 loops=1)
                    Output: test_filter_regression_charliebravo.bravo_id, u1.bravo_id
                    ->  Bitmap Heap Scan on public.test_filter_regression_charliebravo  (cost=4.17..11.28 rows=3 width=4) (actual time=0.003..0.003 rows=2 loops=1)
                          Output: test_filter_regression_charliebravo.id, test_filter_regression_charliebravo.name, test_filter_regression_charliebravo.charlie_id, test_filter_regression_charliebravo.bravo_id
                          Recheck Cond: (test_filter_regression_charliebravo.charlie_id = 9999)
                          Heap Blocks: exact=1
                          ->  Bitmap Index Scan on test_filter_regression_charliebravo_charlie_id_1f151049  (cost=0.00..4.17 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
                                Index Cond: (test_filter_regression_charliebravo.charlie_id = 9999)
                    ->  Nested Loop  (cost=3.05..13.34 rows=540 width=4) (actual time=0.002..0.002 rows=0 loops=2)
                          Output: u1.bravo_id
                          Inner Unique: true
                          ->  Bitmap Heap Scan on public.test_filter_regression_delta_bravos u1  (cost=2.90..11.43 rows=10 width=8) (actual time=0.001..0.001 rows=0 loops=2)
                                Output: u1.id, u1.delta_id, u1.bravo_id
                                Recheck Cond: (test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on test_filter_regression_delta_bravos_bravo_id_5b518e55  (cost=0.00..2.89 rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
                                      Index Cond: (u1.bravo_id = test_filter_regression_charliebravo.bravo_id)
                          ->  Index Only Scan using test_filter_regression_delta_pkey on public.test_filter_regression_delta u0  (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                                Output: u0.id
                                Index Cond: (u0.id = u1.delta_id)
                                Heap Fetches: 1
              ->  Index Only Scan using test_filter_regression_bravo_pkey on public.test_filter_regression_bravo  (cost=0.28..0.59 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                    Output: test_filter_regression_bravo.id
                    Index Cond: (test_filter_regression_bravo.id = u1.bravo_id)
                    Heap Fetches: 1
        ->  Index Scan using test_filter_regression_bravo_alphas_bravo_id_042061e7 on public.test_filter_regression_bravo_alphas  (cost=0.15..0.42 rows=10 width=8) (actual time=0.001..0.001 rows=1 loops=1)
              Output: test_filter_regression_bravo_alphas.id, test_filter_regression_bravo_alphas.bravo_id, test_filter_regression_bravo_alphas.alpha_id
              Index Cond: (test_filter_regression_bravo_alphas.bravo_id = test_filter_regression_bravo.id)
  ->  Index Scan using test_filter_regression_alpha_pkey on public.test_filter_regression_alpha  (cost=0.28..0.32 rows=1 width=122) (actual time=0.001..0.001 rows=1 loops=1)
        Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name
        Index Cond: (test_filter_regression_alpha.id = test_filter_regression_bravo_alphas.alpha_id)
Planning Time: 0.382 ms
Execution Time: 0.027 ms

For main:

Hash Join  (cost=79.87..131.14 rows=20 width=122) (actual time=0.074..0.171 rows=1 loops=1)
  Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name
  Inner Unique: true
  Hash Cond: (t6.bravo_id = u1.bravo_id)
  ->  Nested Loop  (cost=12.32..63.27 rows=40 width=134) (actual time=0.046..0.143 rows=1 loops=1)
        Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, t6.bravo_id, t7.id, t8.bravo_id
        ->  Nested Loop  (cost=12.17..52.62 rows=42 width=130) (actual time=0.041..0.137 rows=1 loops=1)
              Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, t6.bravo_id, t7.id
              Inner Unique: true
              ->  Nested Loop  (cost=11.90..39.10 rows=42 width=126) (actual time=0.034..0.130 rows=1 loops=1)
                    Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, t6.bravo_id
                    Join Filter: (test_filter_regression_alpha.id = t6.alpha_id)
                    ->  Nested Loop  (cost=11.74..33.86 rows=11 width=126) (actual time=0.030..0.126 rows=1 loops=1)
                          Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name, test_filter_regression_bravo_alphas.alpha_id
                          Inner Unique: true
                          ->  Nested Loop  (cost=11.47..30.32 rows=11 width=4) (actual time=0.026..0.121 rows=1 loops=1)
                                Output: test_filter_regression_bravo_alphas.alpha_id
                                ->  Hash Join  (cost=11.32..28.77 rows=3 width=8) (actual time=0.018..0.112 rows=2 loops=1)
                                      Output: test_filter_regression_bravo.id, test_filter_regression_charliebravo.bravo_id
                                      Hash Cond: (test_filter_regression_bravo.id = test_filter_regression_charliebravo.bravo_id)
                                      ->  Seq Scan on public.test_filter_regression_bravo  (cost=0.00..15.40 rows=540 width=4) (actual time=0.003..0.051 rows=900 loops=1)
                                            Output: test_filter_regression_bravo.id, test_filter_regression_bravo.name
                                      ->  Hash  (cost=11.28..11.28 rows=3 width=4) (actual time=0.006..0.006 rows=2 loops=1)
                                            Output: test_filter_regression_charliebravo.bravo_id
                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                            ->  Bitmap Heap Scan on public.test_filter_regression_charliebravo  (cost=4.17..11.28 rows=3 width=4) (actual time=0.003..0.003 rows=2 loops=1)
                                                  Output: test_filter_regression_charliebravo.bravo_id
                                                  Recheck Cond: (test_filter_regression_charliebravo.charlie_id = 9999)
                                                  Heap Blocks: exact=1
                                                  ->  Bitmap Index Scan on test_filter_regression_charliebravo_charlie_id_1f151049  (cost=0.00..4.17 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
                                                        Index Cond: (test_filter_regression_charliebravo.charlie_id = 9999)
                                ->  Index Scan using test_filter_regression_bravo_alphas_bravo_id_042061e7 on public.test_filter_regression_bravo_alphas  (cost=0.15..0.42 rows=10 width=8) (actual time=0.004..0.004 rows=0 loops=2)
                                      Output: test_filter_regression_bravo_alphas.id, test_filter_regression_bravo_alphas.bravo_id, test_filter_regression_bravo_alphas.alpha_id
                                      Index Cond: (test_filter_regression_bravo_alphas.bravo_id = test_filter_regression_bravo.id)
                          ->  Index Scan using test_filter_regression_alpha_pkey on public.test_filter_regression_alpha  (cost=0.28..0.32 rows=1 width=122) (actual time=0.004..0.004 rows=1 loops=1)
                                Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name
                                Index Cond: (test_filter_regression_alpha.id = test_filter_regression_bravo_alphas.alpha_id)
                    ->  Index Scan using test_filter_regression_bravo_alphas_alpha_id_b2c31687 on public.test_filter_regression_bravo_alphas t6  (cost=0.15..0.35 rows=10 width=8) (actual time=0.004..0.004 rows=1 loops=1)
                          Output: t6.id, t6.bravo_id, t6.alpha_id
                          Index Cond: (t6.alpha_id = test_filter_regression_bravo_alphas.alpha_id)
              ->  Index Only Scan using test_filter_regression_bravo_pkey on public.test_filter_regression_bravo t7  (cost=0.28..0.32 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
                    Output: t7.id
                    Index Cond: (t7.id = t6.bravo_id)
                    Heap Fetches: 1
        ->  Index Only Scan using test_filter_regression_charliebravo_bravo_id_1b159238 on public.test_filter_regression_charliebravo t8  (cost=0.15..0.22 rows=3 width=4) (actual time=0.005..0.005 rows=1 loops=1)
              Output: t8.bravo_id
              Index Cond: (t8.bravo_id = t6.bravo_id)
              Heap Fetches: 1
  ->  Hash  (cost=65.05..65.05 rows=200 width=4) (actual time=0.020..0.021 rows=2 loops=1)
        Output: u1.bravo_id
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  HashAggregate  (cost=63.05..65.05 rows=200 width=4) (actual time=0.017..0.017 rows=2 loops=1)
              Output: u1.bravo_id
              Group Key: u1.bravo_id
              Batches: 1  Memory Usage: 40kB
              ->  Hash Join  (cost=22.15..57.95 rows=2040 width=4) (actual time=0.015..0.016 rows=2 loops=1)
                    Output: u1.bravo_id
                    Inner Unique: true
                    Hash Cond: (u1.delta_id = u0.id)
                    ->  Seq Scan on public.test_filter_regression_delta_bravos u1  (cost=0.00..30.40 rows=2040 width=8) (actual time=0.001..0.001 rows=2 loops=1)
                          Output: u1.id, u1.delta_id, u1.bravo_id
                    ->  Hash  (cost=15.40..15.40 rows=540 width=4) (actual time=0.004..0.004 rows=2 loops=1)
                          Output: u0.id
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on public.test_filter_regression_delta u0  (cost=0.00..15.40 rows=540 width=4) (actual time=0.001..0.001 rows=2 loops=1)
                                Output: u0.id
Planning Time: 2.462 ms
Execution Time: 0.238 ms

For main with optimized queryset:

Nested Loop  (cost=7.92..43.22 rows=6 width=122) (actual time=0.010..0.012 rows=1 loops=1)
  Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name
  Inner Unique: true
  ->  Nested Loop  (cost=7.64..41.29 rows=6 width=4) (actual time=0.009..0.011 rows=1 loops=1)
        Output: test_filter_regression_bravo_alphas.alpha_id
        ->  Nested Loop  (cost=7.49..40.26 rows=2 width=12) (actual time=0.008..0.010 rows=1 loops=1)
              Output: test_filter_regression_bravo.id, u1.bravo_id, test_filter_regression_charliebravo.bravo_id
              Inner Unique: true
              Join Filter: (test_filter_regression_bravo.id = test_filter_regression_charliebravo.bravo_id)
              ->  Nested Loop Semi Join  (cost=7.22..39.05 rows=2 width=8) (actual time=0.006..0.008 rows=1 loops=1)
                    Output: test_filter_regression_charliebravo.bravo_id, u1.bravo_id
                    ->  Bitmap Heap Scan on public.test_filter_regression_charliebravo  (cost=4.17..11.28 rows=3 width=4) (actual time=0.003..0.003 rows=2 loops=1)
                          Output: test_filter_regression_charliebravo.id, test_filter_regression_charliebravo.name, test_filter_regression_charliebravo.charlie_id, test_filter_regression_charliebravo.bravo_id
                          Recheck Cond: (test_filter_regression_charliebravo.charlie_id = 9999)
                          Heap Blocks: exact=1
                          ->  Bitmap Index Scan on test_filter_regression_charliebravo_charlie_id_1f151049  (cost=0.00..4.17 rows=3 width=0) (actual time=0.002..0.002 rows=2 loops=1)
                                Index Cond: (test_filter_regression_charliebravo.charlie_id = 9999)
                    ->  Nested Loop  (cost=3.05..13.34 rows=540 width=4) (actual time=0.002..0.002 rows=0 loops=2)
                          Output: u1.bravo_id
                          Inner Unique: true
                          ->  Bitmap Heap Scan on public.test_filter_regression_delta_bravos u1  (cost=2.90..11.43 rows=10 width=8) (actual time=0.001..0.001 rows=0 loops=2)
                                Output: u1.id, u1.delta_id, u1.bravo_id
                                Recheck Cond: (test_filter_regression_charliebravo.bravo_id = u1.bravo_id)
                                Heap Blocks: exact=1
                                ->  Bitmap Index Scan on test_filter_regression_delta_bravos_bravo_id_5b518e55  (cost=0.00..2.89 rows=10 width=0) (actual time=0.000..0.000 rows=0 loops=2)
                                      Index Cond: (u1.bravo_id = test_filter_regression_charliebravo.bravo_id)
                          ->  Index Only Scan using test_filter_regression_delta_pkey on public.test_filter_regression_delta u0  (cost=0.15..0.19 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                                Output: u0.id
                                Index Cond: (u0.id = u1.delta_id)
                                Heap Fetches: 1
              ->  Index Only Scan using test_filter_regression_bravo_pkey on public.test_filter_regression_bravo  (cost=0.28..0.59 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
                    Output: test_filter_regression_bravo.id
                    Index Cond: (test_filter_regression_bravo.id = u1.bravo_id)
                    Heap Fetches: 1
        ->  Index Scan using test_filter_regression_bravo_alphas_bravo_id_042061e7 on public.test_filter_regression_bravo_alphas  (cost=0.15..0.42 rows=10 width=8) (actual time=0.001..0.001 rows=1 loops=1)
              Output: test_filter_regression_bravo_alphas.id, test_filter_regression_bravo_alphas.bravo_id, test_filter_regression_bravo_alphas.alpha_id
              Index Cond: (test_filter_regression_bravo_alphas.bravo_id = test_filter_regression_bravo.id)
  ->  Index Scan using test_filter_regression_alpha_pkey on public.test_filter_regression_alpha  (cost=0.28..0.32 rows=1 width=122) (actual time=0.001..0.001 rows=1 loops=1)
        Output: test_filter_regression_alpha.id, test_filter_regression_alpha.name
        Index Cond: (test_filter_regression_alpha.id = test_filter_regression_bravo_alphas.alpha_id)
Planning Time: 0.381 ms
Execution Time: 0.027 ms

comment:4 by Márton Salomváry, 5 weeks ago

Hi Natalia,

In general, Django does not make guarantees about the exact SQL that is generated. This means SQL can change from version to version, and we would not consider that a bug as long as the semantics of the query remain the same (which, based on my testing, is the case here).

I think the semantics did change, maybe the example I provided is not obvious enough to showcase that. Will provide some more cases where we had to change code.

That said, we do care about potential performance impacts and/or regressions.

In our case the performance impact was extreme. We got from queries that take a few dozens of milliseconds to ones that practically never complete and had to kill queries manually in order to recover the PostgreSQL database server.

I do wonder, however, whether the way the queries are currently written might be more complex than necessary, which could contribute to these differences in execution plans.

They might indeed be written more complex than necessary, but they used to generate SQL we had no performance or other issues with.

I understand you are suggesting rewriting the queries before upgrading Django from 4.x to 5.0 but given the size and complexity of our code base, identifying the queries in a 100% reliable way is nearly impossible, meaning that we have to keep fighting runaway queries and rewriting until we no longer have production failures.

For example, this alternative queryset is, IMHO, equivalent and much more straightfoward:


 Alpha.objects.filter(
    Q(bravos__charlie_bravos__charlie=9999)
    & Exists(
        Delta.objects.filter(
            bravos__id=OuterRef("bravos__charlie_bravos__bravo_id")
        )
    )
)

Sure it might be more straightforward, but repeating filter() calls is often handy if some is only applied conditionally, like this:

qs = qs.filter(some_unconditional_query...)

if some_condition:
  qs = qs.filter(some_conditional_query...)

I also wonder why repeating filter() and combining expressions like Q(...) & Q() do not result in equivalent queries. Ie. shouldn't these three always be the same? If not, what is the difference in the semantics?

qs1 = qs.filter(foo=bar).filter(baz=qux)

qs2 = qs.filter(Q(foo=bar) & Q(baz=qux))

qs3 = qs.filter(foo=bar, baz=qux)

If you encounter a case where a query written in a way that aligns with Django's ORM patterns (as shown above) returns incorrect results or performs poorly, please feel free to reopen it with details.

I would appreciate feedback on how the original query does not align with Django's ORM patterns. The documentation does not discourage repeat-calls to filter() in order to narrow down results.

In any case, I will try to provide more input on how this breaks the results of our queries.

in reply to:  4 ; comment:5 by Natalia Bidart, 5 weeks ago

Replying to Márton Salomváry:

They might indeed be written more complex than necessary, but they used to generate SQL we had no performance or other issues with.

I understand you are suggesting rewriting the queries before upgrading Django from 4.x to 5.0 but given the size and complexity of our code base, identifying the queries in a 100% reliable way is nearly impossible, meaning that we have to keep fighting runaway queries and rewriting until we no longer have production failures.

The thing is that the commit you reference fixes a bug. Using your examples:

Alpha <--M2M--> Bravo <--FK-- CharlieBravo --FK--> Charlie
Delta <--M2M--> Bravo
  • Django 4.2x: Alpha --> Bravo --> CharlieBravo (single INNER JOIN)
  • Since 5.0.x:
    • Alpha --> Bravo --> CharlieBravo (INNER JOIN for first filter)
    • Alpha --> T6 Bravo --> T7 CharlieBravo (LEFT OUTER JOIN path for second filter)

[...] repeating filter() calls is often handy if some is only applied conditionally, like this:

qs = qs.filter(some_unconditional_query...)

if some_condition:
  qs = qs.filter(some_conditional_query...)

As a side note, this example of chaining multiple .filter() calls is common and supported, but if you need to ensure joins are combined as efficiently as possible, the equivalent pattern is to build up a list of conditions and combine them with & into a single filter, for example:

filters = [Q(some_unconditional_query)]
if some_condition:
    filters.append(Q(some_conditional_query))
result = qs.filter(reduce(operator.and_, filters))

This is functionally equivalent to chaining, but ensures the ORM generates queries consistent with the documented behavior for multi-valued relationships.

I also wonder why repeating filter() and combining expressions like Q(...) & Q() do not result in equivalent queries. Ie. shouldn't these three always be the same? If not, what is the difference in the semantics?

This is documented in this section.

In fact, prior to the change in Django 5.0, usage of expressions passed to .filter() sometimes contradicted the documentation around multi-valued relationships. The docs have long stated that when spanning multi-valued relationships, repeated filters can lead to additional joins, and that conditions should be combined in a single .filter() if you want to avoid that. The recent change aligns behavior with that documented expectation, even if it changes some queries that happened to work differently before.

I would appreciate feedback on how the original query does not align with Django's ORM patterns. The documentation does not discourage repeat-calls to filter() in order to narrow down results.

Chaining .filter() is a normal pattern, but when multi-valued relationships are involved the docs note that it can change semantics and have query/performance implications. If JOIN reuse is necessary due to performance requirements, combining conditions in a single call is the ORM pattern to follow.

In any case, I will try to provide more input on how this breaks the results of our queries.

That would certainly help, thank you!

in reply to:  5 comment:6 by Márton Salomváry, 5 weeks ago

Replying to Natalia Bidart:

The thing is that the commit you reference fixes a bug.

I understand that. What I can't decide is whether what I am seeing is another bug introduced by that bugfix, or I have been successfully relying on the existence of bug for quite long :)

This is documented in this section.

I would appreciate feedback on how the original query does not align with Django's ORM patterns. The documentation does not discourage repeat-calls to filter() in order to narrow down results.

Chaining .filter() is a normal pattern, but when multi-valued relationships are involved the docs note that it can change semantics and have query/performance implications. If JOIN reuse is necessary due to performance requirements, combining conditions in a single call is the ORM pattern to follow.

Oh thanks, now I remember having been bitten by this in the past, but not often and painfully enough to remember the peculiarities of joins with multi-valued relationships!

In any case, I will try to provide more input on how this breaks the results of our queries.

That would certainly help, thank you!

In this example, count is wrongly 2 in Django 5.0 and correctly 1 in Django 4.x:

https://github.com/salomvary/django-5-filter-regression/commit/f3cabd4e8987210e9aa17c99bfa622cfc7c580d8

alpha = Alpha.objects.create(id=7777)
bravo = alpha.bravos.create()
charlie1 = Charlie.objects.create(id=9999)
charlie2 = Charlie.objects.create(id=5555)
CharlieBravo.objects.create(bravo=bravo, charlie=charlie1)
CharlieBravo.objects.create(bravo=bravo, charlie=charlie2)
bravo.deltas.create()

queryset = (
    Alpha.objects.filter(
        bravos__charlie_bravos__charlie=9999,
    )
    .filter(
        Exists(
            Delta.objects.filter(
                bravos__id=OuterRef("bravos__charlie_bravos__bravo_id"),
            )
        )
    )
    .annotate(count=Count(
        "bravos__charlie_bravos__bravo"
    ))
    .values_list("pk", "count")
)

self.assertEqual(
    [(7777, 1)],
    list(queryset)
)

Now I wonder how to spot these in an existing code base, and prevent shooting myself in the foot in the future. I guess I can monkey-patch QuerySets to log a warning if repeated filter() is used with multi-valued relationships and force using a single filter() and combining conditions with & or |.

comment:7 by Simon Charette, 4 weeks ago

What I can't decide is whether what I am seeing is another bug introduced by that bugfix, or I have been successfully relying on the existence of bug for quite long :)

I can confirm that when support for conditional expressions passing to filter was added in #25367 the documented chaining property to avoid join reuse was missed (look at the tests added in 4137fc2efce2dde48340728b8006fc6d66b9e3a5). This makes me believe the bug has been present since 3.0 which is when the feature was introduced.

In this example, count is wrongly 2 in Django 5.0 and correctly 1 in Django 4.x:

This is expected as in Django 5.0 the multi-valued relationship is joined twice which triggers #10060.

Note: See TracTickets for help on using tickets.
Back to Top