﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
33973	Performance regression when moving from 3.1 to 3.2	Marc Parizeau	nobody	"I am seeing a sharp increase in execution time for some of my queries when moving from Django 3.1 to 3.2. The performance hit appears to be the same on Django 4.

My backend is Postgres 14.2. 

My Django project has forums for different types of content. The forum app consists essentially of 5 tables:
1. a `Post` table that contains forum posts (essentially a text field);
2. a `Thread` table where rows point to a specific content and post;
3. a `FollowUp` table where rows point to a specific thread and post;
4. a `ThreadEntry` table where rows point to a thread, a user, and the last seen thread post for this user;
5. a `FollowUpEntry` table where rows point to a followup, a user, and the last seen followup post for this user.

Here is an example query that takes about twice as long to execute on 3.2 as on 3.1:

{{{
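from django.db.models import Exists, F, OuterRef
from django.db.models.functions import Greatest

# Content, Thread, ThreadEntry, FollowUp and FollowUpEntry are the project's own models
Content.objects.all().annotate(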
  has_unread_posts=Greatest(
    # a content is unread if at least one condition is true
    Exists(
      # a thread has never been read (has no user entry)
      Thread.objects.filter(
        content=OuterRef('pk'),
      ).exclude(threadentry__user=user)
    ),
    Exists(
      # a thread entry is not up-to-date
      ThreadEntry.objects.filter(
        thread__content=OuterRef('pk'),
        user=user,
      ).exclude(post=F('thread__post'))
    ),
    Exists(
      # a followup has never been read
      FollowUp.objects.filter(
        thread__content=OuterRef('pk')
      ).exclude(followupentry__user=user)
    ),
    Exists(
      # a followup entry is not up-to-date
      FollowUpEntry.objects.filter(
        followup__thread__content=OuterRef('pk'),
        user=user,
      ).exclude(post=F('followup__post'))
    ),
  )
).filter(
  has_unread_posts=True,
).order_by(
  'course__uid',
  '-version__start',
).select_related(
  'course',
  'version',
)
}}}


`Course` and `Version` are other tables related to `Content`.

With this query, I want to know, for each content, whether there is something new in the corresponding forum for a given user. There is something new if any one of the following conditions is true:
1. there exists a thread for which the user has no thread entry (an entry is added when the thread is first read by the user);
2. there exists a user thread entry for which the last read post is not up to date with the current thread post (the thread owner has modified the post since);
3. there exists a followup for which the user has no followup entry (an entry is added when the followup is first read by the user);
4. there exists a user followup entry for which the last read post is not up to date with the followup post (the followup owner has modified the post since).
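
The four conditions above can be sketched in plain Python, independently of the ORM. This is only an illustration: the `Item` class and the `seen` mapping are hypothetical in-memory stand-ins for the `Thread`/`FollowUp` and `ThreadEntry`/`FollowUpEntry` tables, and the sketch assumes at most one entry per user and per thread or followup.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical stand-in for a Thread or FollowUp row.
    post: int                                  # id of the current post
    seen: dict = field(default_factory=dict)   # user -> id of the last seen post


def is_unread(item, user):
    # Conditions 1 and 3: the user has no entry for this thread/followup.
    if user not in item.seen:
        return True
    # Conditions 2 and 4: the entry exists but points to an outdated post.
    return item.seen[user] != item.post


def has_unread_posts(threads, followups, user):
    # A content is unread as soon as one of its threads or followups is unread.
    return any(is_unread(item, user) for item in list(threads) + list(followups))
```

With `Item(post=2, seen={'alice': 2})` as the only thread, the content is read for `'alice'` and unread for any user without an entry.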

On my machine, just by changing the Django version using pip, and nothing else, this query takes about 1 second of execution on Django 3.1.14, and a little more than 2 seconds on Django 3.2.15, so about a 2x increase. Here are the current table sizes for my forum app:
- `Thread`: ~30K
- `FollowUp`: ~46K
- `ThreadEntry`: ~1.3M
- `FollowUpEntry`: ~4.5M
- `Post`: ~103K

And there are 33 `Content` rows.
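
For anyone trying to reproduce such timings, a small helper along these lines could be used under each Django version. It is a minimal sketch, not the method used for the figures above; the name `median_runtime` and the default of 5 repeats are arbitrary choices.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    # Call fn() several times and return the median wall-clock time in seconds.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

A call such as `median_runtime(lambda: list(queryset.all()))` forces a fresh evaluation on every repeat, since `.all()` returns a new queryset without the cached results.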

Am I the only one observing such performance regressions with Django 3.2? On other, more complex queries that contain nested subqueries, I have seen execution times increase by up to **30x**.

Did something major happen in SQL generation from 3.1 to 3.2?
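
One way to investigate this question would be to save the output of `str(queryset.query)` under each version and diff the two strings. The following is a minimal sketch using only the standard library; the helper names and the keyword list are assumptions made for illustration, and the clause splitting is deliberately naive.

```python
import difflib
import re

# Major SQL keywords used to break a one-line query into clause-sized lines.
KEYWORDS = r'\b(SELECT|FROM|WHERE|INNER JOIN|LEFT OUTER JOIN|GROUP BY|ORDER BY|EXISTS)\b'


def split_sql(sql):
    # Put each major clause on its own line so the diff is readable.
    lines = re.sub(KEYWORDS, r'\n\1', sql).splitlines()
    return [line.strip() for line in lines if line.strip()]


def diff_sql(old_sql, new_sql):
    # Return the unified diff between two SQL strings as a list of lines.
    return list(difflib.unified_diff(
        split_sql(old_sql),
        split_sql(new_sql),
        fromfile='django-3.1',
        tofile='django-3.2',
        lineterm='',
    ))
```

An empty list means that both versions generate the same SQL, in which case the difference would have to come from elsewhere. Otherwise, the `+` and `-` lines show which clauses changed.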

Am I doing something wrong? How can this happen?

Any help on understanding what is going on with Django 3.2 would be much appreciated.

Best regards,"	Uncategorized	closed	Database layer (models, ORM)	3.2	Normal	needsinfo	performance regression	Simon Charette	Unreviewed	0	0	0	0	0	0
