Opened 19 months ago
Last modified 10 months ago
#34597 new Cleanup/optimization
Queryset (split) exclude's usage of Exists is significantly slower than subquery
Reported by: | Lorand Varga | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.2 |
Severity: | Normal | Keywords: | |
Cc: | Simon Charette, David Sanders | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
My app has 2 models (making things simple so debugging is easier):
class Blog(models.Model): title = models.CharField(max_length=60, blank=True) is_published = models.BooleanField(default=generate_random_bool, db_index=True) class Translation(models.Model): blog = models.ForeignKey('Blog', on_delete=models.CASCADE, related_name='translation')
There is a reverse foreign key relation between translation and Blog. Also a blog can have multiple translations (or none).
I have this code block:
from django.db.models import Q qs = Blog.objects.filter(Q(is_published=True) & ~Q(translation=None)) print(qs.query)
django 2.2 ORM generates this output:
SELECT "blog"."id", "blog"."title", "blog"."is_published" FROM "blog" WHERE ( "blog"."is_published" = True AND NOT ( "blog"."id" IN ( SELECT U0."id" FROM "blog" U0 LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = ("blog"."id") ) ) ) )
For the same code block the django 3.2+ ORM (tried also the latest version of django 4.2.1) it generates:
SELECT "blog"."id", "blog"."title", "blog"."is_published" FROM "blog" WHERE ( "blog"."is_published" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "blog" U0 LEFT OUTER JOIN "translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "blog"."id" AND "blog"."id" = "blog"."id" ) LIMIT 1 ) ) )
which is a whole lot slower (at least on the same postgresql 13.7 instance).
The django 2.2 version executes in a couple of seconds while the newer version executes in half an hour.
I think (but am not completely sure, might be terribly wrong) that the issue was introduced here:
https://github.com/django/django/pull/13300/files
Environment:
asgiref==3.7.1
backports.zoneinfo==0.2.1
Django==3.2.19
model-bakery==1.11.0
psycopg2-binary==2.8.6
pytz==2023.3
sqlparse==0.4.4
typing_extensions==4.6.1
Db environment:
PostgreSQL 13.7 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit
Bug does not manifest in django 2.2.12.
Bug manifests in django 3.2 and django 4.2.
Change History (14)
comment:1 by , 19 months ago
Cc: | added |
---|---|
Resolution: | → needsinfo |
Severity: | Release blocker → Normal |
Status: | new → closed |
Type: | Bug → Cleanup/optimization |
Version: | 4.2 → 3.2 |
comment:2 by , 19 months ago
Resolution: | needsinfo |
---|---|
Status: | closed → new |
Adding more information:
Django 2.2 behavior:
EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" = True AND "test_app_blog"."api_has_translation" = False AND NOT ( "test_app_blog"."id" IN ( SELECT U0."id" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = ("test_app_blog"."id") ) ) ) ) ORDER BY "test_app_blog"."updated_at" DESC Sort (cost=1047131.99..1047357.72 rows=90293 width=1985) (actual time=888.530..888.533 rows=4 loops=1) Sort Key: test_app_blog.updated_at DESC Sort Method: quicksort Memory: 33kB -> Index Scan using test_app_blog_is_published_4b47c652_uniq on test_app_blog (cost=0.42..995257.82 rows=90293 width=1985) (actual time=72.414..888.496 rows=4 loops=1) Index Cond: (is_published = true) Filter: ((NOT api_has_translation) AND (NOT (SubPlan 1))) Rows Removed by Filter: 191941 SubPlan 1 -> Nested Loop Left Join (cost=0.71..8.57 rows=1 width=4) (actual time=0.003..0.003 rows=1 loops=179902) Join Filter: (u0.id = u1.blog_id) Filter: (u1.id IS NULL) Rows Removed by Filter: 0 -> Index Only Scan using test_app_blog_pkey on test_app_blog u0 (cost=0.42..2.44 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=179902) Index Cond: (id = test_app_blog.id) Heap Fetches: 27765 -> Index Scan using test_app_translation_51c6d5db on test_app_translation u1 (cost=0.29..6.07 rows=5 width=8) (actual time=0.001..0.001 rows=0 loops=179902) Index Cond: (blog_id = test_app_blog.id) Planning Time: 0.500 ms Execution Time: 888.606 ms
Django 3.2 (and 4.2) behavior:
EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "test_app_blog"."id" AND "test_app_blog"."id" = "test_app_blog"."id" ) LIMIT 1 ) ) ) ORDER BY "test_app_blog"."updated_at" DESC Nested Loop Anti Join (cost=1001.15..131927.02 rows=180586 width=1985) (actual time=208264.409..3034435.256 rows=4 loops=1) Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id = test_app_blog.id)) Rows Removed by Join Filter: 16853800739 -> Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on test_app_blog (cost=0.42..107115.43 rows=180586 width=1985) (actual time=1.080..17454.750 rows=179902 loops=1) Filter: (is_published AND (NOT api_has_translation)) Rows Removed by Filter: 27456 -> Materialize (cost=1000.73..21651.35 rows=1 width=4) (actual time=0.002..9.009 rows=93684 loops=179902) -> Gather Merge (cost=1000.73..21651.34 rows=1 width=4) (actual time=6.024..192.937 rows=194102 loops=1) Workers Planned: 2 Workers Launched: 2 -> Merge Left Join (cost=0.71..20651.20 rows=1 width=4) (actual time=0.046..87.573 rows=64701 loops=3) Merge Cond: (u0.id = u1.blog_id) Filter: (u1.id IS NULL) Rows Removed by Filter: 19931 -> Parallel Index Only Scan using test_app_blog_pkey on test_app_blog u0 (cost=0.42..13961.37 rows=86361 width=4) (actual time=0.018..32.796 rows=69119 loops=3) Heap Fetches: 33004 -> Index Scan using test_app_translation_51c6d5db on test_app_translation u1 (cost=0.29..6082.91 rows=60175 width=8) (actual time=0.025..32.500 rows=58545 loops=3) Planning Time: 0.882 ms Execution Time: 3034436.784 ms
As can be seen, there is a huge increase in execution time.
I can't vouch there isn't something else going on here also but currently the only thing changed is the django version.
Anything else in the environment is left untouched. (And django2.2 generated query is performant, while django 3.2 query is awfull).
I've managed to use your advice and modify the query to force the subquery to use IN but that leads to synthax that is more complex, while the django 2.2 synthax is a lot simpler: ~Q(translation=None)
The only synthax that manages (in django 3.2) to replicate the query correctly (and performance wise as in django2.2) seems to be
qs.exclude(id__in=Blog.objects.filter(translation=None, id=OuterRef("id")))
While not terrible I do think something has been broken in django since the ~Q(translation=None) seems a lot cleaner and more ORMish.
Thanks,
Lorand.
comment:3 by , 19 months ago
Summary: | Django ORM query SQL generation changed (and is apparently a lot slower) → Queryset (split) exclude's usage of Exists is significantly slower than subquery |
---|
Thank you for providing query plans, this is helpful.
While it's undeniable that in this particular case the query is slower, Postgres wiki itself advise against using NOT IN so there is certainly something else to blame here.
The first thing that is suspicious to me is that the new query is using parallel workers which are know to be causing some slowdown when not appropriately configured. For example, if running the first query gets you close to your work_mem
then the second one will definitely will cause spills to disk and could explain the significant difference in execution duration. What I see from the plan is that Postgres believes that it will be faster to parallelize the query but when it tries to materialize the results returned by workers it takes ages.
Tuning this parameter is definitely out of scope for this ticket as it dips into user / DBA support territory but it'd be great to confirm that it's the reason you are encountering this issue.
I've managed to use your advice and modify the query to force the subquery to use IN but that leads to synthax that is more complex, while the django 2.2 synthax is a lot simpler: ~Q(translation=None)
While not terrible I do think something has been broken in django since the ~Q(translation=None) seems a lot cleaner and more ORMish.
I've never argue otherwise, it was solely meant as a way to unblock you from upgrading from a Django version that has been unsupported for months and that I assumed you were eager to move away from.
As for what ORMish means that is open for interpretation. I would argue that ~Q(translation=None)
is extremely ambiguous given how None
doesn't map 1:1 to SQL NULL semantics that the objective of an ORM is to abstract SQL operations to do the right thing most of the time while providing escape hatches for the remaining cases. In this particular case I've provided you a reasoning about why these changes were made so hopefully it's clear to you why we need substantial evidence before reverting these changes given they have already been part of two LTS without prior reports of performance degradation.
In order to determine what might the origin your issue I'd ask you to run the following queries while preceding them with a SET max_parallel_workers_per_gather = 0
EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "test_app_blog"."id" ) LIMIT 1 ) ) ) ORDER BY "test_app_blog"."updated_at" DESC
EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND EXISTS( SELECT (1) AS "a" FROM "test_app_translation" U1 WHERE U1."blog_id" = "test_app_blog"."id" LIMIT 1 ) ) ) ORDER BY "test_app_blog"."updated_at" DESC
comment:4 by , 19 months ago
EDIT: Actually please disregard this, after some investigation I found the materialise is being forced by the conditions in the query, not the lack of memory.
The MATERIALIZE
in the expensive plan indicates that Postgres doesn't think it has enough memory to do something more efficient. (Sometimes Postgres will choose something less optimal in order to materialise.)
Try turning off materialisation to see what plan it uses:
SET enable_material = 'off';
If it's able to use a more efficient plan then you could try adjusting your memory settings/adding more RAM.
comment:5 by , 19 months ago
Cc: | added |
---|
comment:6 by , 19 months ago
Posting some investigation here for charettes & others:
Looks like the additional redundant condition in the inner query that references the outer query is causing unwanted materialisation:
AND "test_app_blog"."id" = "test_app_blog"."id"
When I removed this – no more materialisation.
Additionally: Not sure why this is being added to the query but if you remove Q(is_published=True)
so that you simply have Blog.objects.filter(~Q(translation=None))
then this redundant condition is no longer added.
I tested with the reporter's models on latest main & I'd wager this is something we can/may want to address.
comment:7 by , 19 months ago
Thanks for the investigation David!
The AND "blog"."id" = "blog"."id"
is definitely something we want to elide and was mentioned in comment:1 as a bug but the reporter said that they were also getting poor results with the provided ORM examples that explicitly don't have this criteria (e.g. ~Exists(Blog.objects.filter(translation=None, id=OuterRef("id"))
) so I suggest we wait to see how well the SQL examples of comment:3 (that also don't have this criteria) perform for the reporter.
In other words, the AND "blog"."id" = "blog"."id"
part is definitely a bug we want to solve but if we are going to address this issue it'd be great to confirm it's the only factor to blame.
comment:8 by , 19 months ago
but the reporter said that they were also getting poor results with the provided ORM examples that explicitly don't have this criteria (e.g.
~Exists(Blog.objects.filter(translation=None, id=OuterRef("id"))
)
Interesting I tried that and the plan was better, though I don't have any data to work with.
Am eager to hear what the results are.
comment:9 by , 19 months ago
Simon, I want to say that I really appreciate your support and patience on this. If I've been too annoying in my comments, it was not on purpose - I'm actually really curios if there is an underlying issue or not with django (since there are a lot of interactions going on). I'm not insisting for any revert unless needed and your arguments were solid and I'm very grateful for your support.
Getting back to the needy greedy details, turning off the workers does not improve anything.
The NOT(Exists):
SET max_parallel_workers_per_gather = 0 EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "test_app_blog"."id" ) LIMIT 1 ) ) ) ORDER BY "test_app_blog"."updated_at" DESC Nested Loop Anti Join (cost=1.13..132780.57 rows=180586 width=1985) (actual time=211403.273..3034890.176 rows=4 loops=1) Join Filter: ((test_app_blog.id = test_app_blog.id) AND (u0.id = test_app_blog.id)) Rows Removed by Join Filter: 16853800739 -> Index Scan Backward using test_app_blog_updated_at_34e74e5b_uniq on test_app_blog (cost=0.42..107115.43 rows=180586 width=1985) (actual time=1.080..1356.688 rows=179902 loops=1) Filter: (is_published AND (NOT api_has_translation)) Rows Removed by Filter: 27456 -> Materialize (cost=0.71..22504.89 rows=1 width=4) (actual time=0.002..9.083 rows=93684 loops=179902) -> Merge Left Join (cost=0.71..22504.88 rows=1 width=4) (actual time=0.016..140.731 rows=194102 loops=1) Merge Cond: (u0.id = u1.blog_id) Filter: (u1.id IS NULL) Rows Removed by Filter: 59794 -> Index Only Scan using test_app_blog_pkey on test_app_blog u0 (cost=0.42..15170.42 rows=207267 width=4) (actual time=0.007..67.559 rows=207358 loops=1) Heap Fetches: 33518 -> Index Scan using test_app_translation_51c6d5db on test_app_translation u1 (cost=0.29..6077.43 rows=59809 width=8) (actual time=0.005..30.515 rows=59794 loops=1) Planning Time: 0.936 ms Execution Time: 3034891.393 ms (16 rows)
The Exists does work:
SET max_parallel_workers_per_gather = 0 EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND EXISTS( SELECT (1) AS "a" FROM "test_app_translation" U1 WHERE U1."blog_id" = "test_app_blog"."id" LIMIT 1 ) ) ORDER BY "test_app_blog"."updated_at" DESC Sort (cost=21037.38..21066.08 rows=11479 width=22) (actual time=63.818..63.820 rows=4 loops=1) Sort Key: test_app_blog.updated_at DESC Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=1501.37..20263.31 rows=11479 width=22) (actual time=17.974..63.805 rows=4 loops=1) -> HashAggregate (cost=1500.95..1632.70 rows=13175 width=4) (actual time=17.018..20.483 rows=13256 loops=1) Group Key: u1.blog_id Batches: 1 Memory Usage: 1425kB -> Index Only Scan using test_app_translation_51c6d5db on test_app_translation u1 (cost=0.29..1351.42 rows=59809 width=4) (actual time=0.007..6.721 rows=59794 loops=1) Heap Fetches: 735 -> Index Scan using test_app_blog_pkey on test_app_blog (cost=0.42..1.47 rows=1 width=22) (actual time=0.003..0.003 rows=0 loops=13256) Index Cond: (id = u1.blog_id) Filter: (is_published AND (NOT api_has_translation)) Rows Removed by Filter: 1 Planning Time: 0.703 ms Execution Time: 63.872 ms
Also played with various values for work_mem and haven't seen any change in the postgres planning.
I want to also mention that David is on to something.
SET enable_material='off'; EXPLAIN ANALYZE SELECT "test_app_blog"."id", "test_app_blog"."created_at", "test_app_blog"."updated_at", "test_app_blog"."is_published", "test_app_blog"."api_has_translation" FROM "test_app_blog" WHERE ( "test_app_blog"."is_published" AND NOT "test_app_blog"."api_has_translation" AND NOT ( EXISTS( SELECT (1) AS "a" FROM "test_app_blog" U0 LEFT OUTER JOIN "test_app_translation" U1 ON (U0."id" = U1."blog_id") WHERE ( U1."id" IS NULL AND U0."id" = "test_app_blog"."id" ) LIMIT 1 ) ) ) ORDER BY "test_app_blog"."updated_at" DESC; Sort (cost=143578.39..144029.85 rows=180585 width=22) (actual time=691.697..691.802 rows=4 loops=1) Sort Key: test_app_blog.updated_at DESC Sort Method: quicksort Memory: 25kB -> Hash Anti Join (cost=21645.24..125693.23 rows=180585 width=22) (actual time=306.930..691.769 rows=4 loops=1) Hash Cond: (test_app_blog.id = u0.id) -> Index Scan using test_app_blog_is_published_4b47c652_uniq on test_app_blog (cost=0.42..101768.51 rows=180586 width=22) (actual time=0.684..304.445 rows=179902 loops=1) Index Cond: (is_published = true) Filter: (NOT api_has_translation) Rows Removed by Filter: 12043 -> Hash (cost=21644.81..21644.81 rows=1 width=4) (actual time=283.265..283.369 rows=194102 loops=1) Buckets: 262144 (originally 1024) Batches: 2 (originally 1) Memory Usage: 6145kB -> Gather Merge (cost=1000.73..21644.81 rows=1 width=4) (actual time=6.321..245.263 rows=194102 loops=1) Workers Planned: 2 Workers Launched: 2 -> Merge Left Join (cost=0.71..20644.67 rows=1 width=4) (actual time=0.067..105.796 rows=64701 loops=3) Merge Cond: (u0.id = u1.blog_id) Filter: (u1.id IS NULL) Rows Removed by Filter: 19931 -> Parallel Index Only Scan using test_app_blog_pkey on test_app_blog u0 (cost=0.42..13961.37 rows=86361 width=4) (actual time=0.030..40.520 rows=69119 loops=3) Heap Fetches: 33632 -> Index Scan using test_app_translation_51c6d5db on test_app_translation u1 (cost=0.29..6077.43 rows=59809 width=8) (actual time=0.032..42.470 rows=59627 loops=3) Planning Time: 2.751 ms Execution Time: 692.507 ms
Not sure yet why my postgres seems restricted (I remember it has "enough" RAM) but will get back with a comment once I find out more.
comment:10 by , 19 months ago
Thank you for your report Lorand and for providing all these details. I'm also curious about what might be causing these slowdowns.
Until we figure out what might be to blame for the materialization issues I started looking into the unnecessary AND "blog"."id" = "blog"."id"
generation by generating the exact change in terms of SQL generation that 8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1 resulted in against the test suite.
The full result can be found in this gist and the issue manifests itself in two tests
I should have some time tomorrow to dig into the issue further.
comment:11 by , 19 months ago
I drafted a PR for the harmful constant expression comparison in the mean time and I'll see if there is a way to do saner NULL
handling so no LEFT JOIN
for the sole purpose of doing an IS NULL
is necessary. I think it's optimizable when detecting a filter_rhs is None
.
comment:14 by , 10 months ago
Had a discussion with Postgres major contributor about this very issue and a query planer optimization in Postgres 16+ that might make the EXISTS
solution worth investigating.
More likely to be related to #32143 (8593e162c9cb63a6c0b06daf045bc1c21eb4d7c1) which switched to using
EXISTS
overNOT IN
and was already part of Django 3.2.AFAIK this is the only report in three years about this change making things slower and the fact it jumped from a couple of seconds to hours makes me think there might be something else at play here (lack of analyze?). FWIW this change was advised by this article and examples in the wild of
NOT IN
performing poorly.Without more details such as the Postgres execution plans (use
EXPLAIN
on the query) it's very hard to provide any guidance here so I'll close as needsinfo for now as I cannot reproduce any form of slowdown with the provided model.The
AND "blog"."id" = "blog"."id"
part seems definitely fishy though so it'd be great to see if you get fast results without it as that appear to be a bug. You can try it out by doingI'd also be curious to hear if the following is faster as it should be equivalent
In the mean time you can use
~Q(id__in=Blog.objects.filter(translation=None, id=OuterRef("id")))
instead of~Q(translation=None)
to restore the previous behaviour if you're blocked.