Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#23830 closed Cleanup/optimization (worksforme)

Make expired session clearing on database backend scale better

Reported by: Adam (Chainz) Johnson
Owned by: Adam (Chainz) Johnson
Component: contrib.sessions
Version: master
Severity: Normal
Keywords:
Cc:
Triage Stage: Unreviewed
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: yes
UI/UX: no


In two large Django projects now, I've had to swap out the default clear_expired implementation on the database SessionStore for a chunked approach, because a single DELETE across the whole table became a bottleneck query. It would be better if the database backend that ships with Django already chunked its deletes, so Django apps scale better.
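The chunked approach described above can be sketched outside Django with plain SQL. This is a hypothetical illustration using sqlite3; the table and column names mirror Django's django_session schema, but this is not the actual patch:

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical sketch: delete expired sessions in fixed-size chunks
# instead of one table-wide DELETE, so each transaction stays short.
CHUNK_SIZE = 1000

def clear_expired_chunked(conn, now):
    total = 0
    while True:
        # Fetch one chunk of expired primary keys.
        rows = conn.execute(
            "SELECT session_key FROM django_session "
            "WHERE expire_date < ? LIMIT ?",
            (now, CHUNK_SIZE),
        ).fetchall()
        if not rows:
            break
        placeholders = ",".join("?" * len(rows))
        conn.execute(
            f"DELETE FROM django_session WHERE session_key IN ({placeholders})",
            [r[0] for r in rows],
        )
        conn.commit()  # locks are held only for the duration of one chunk
        total += len(rows)
    return total

# Demo with an in-memory database: 2500 sessions, half of them expired.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_session (session_key TEXT PRIMARY KEY, expire_date TEXT)"
)
now = datetime(2014, 11, 20)
for i in range(2500):
    exp = now - timedelta(days=1) if i % 2 == 0 else now + timedelta(days=1)
    conn.execute(
        "INSERT INTO django_session VALUES (?, ?)", (f"key{i}", exp.isoformat())
    )
deleted = clear_expired_chunked(conn, now.isoformat())
print(deleted)  # 1250 expired rows removed, in chunks of 1000
```

The trade-off is more round trips in exchange for never holding a lock across the whole table.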

Change History (7)

comment:1 Changed 3 years ago by Adam (Chainz) Johnson

Status: new → assigned

comment:2 Changed 3 years ago by Adam (Chainz) Johnson


Adapted from some production code.

Tests pass, but there is naturally no test for >10k sessions. I tried writing one and, unsurprisingly, it was very slow. We could make the chunk size an extra setting, which would make it testable?

comment:3 Changed 3 years ago by Ilja Maas

The patch is a partial improvement because it does not instantiate all the objects. However, using IN clauses with a huge number of ids is not exactly fast (to put it nicely) in most databases, so I suspect it won't be faster in general (not tested, though).

Actually, in this specific case a DELETE FROM session WHERE expire_date < '20XX-XX-XX' would deliver a huge performance boost: the expire_date field has a database index, and this would not create a long-running transaction in which the records are deleted one by one.

So I would propose a solution that constructs a DELETE FROM ... WHERE statement and executes it.
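At the SQL level, the proposed single-statement delete is straightforward. A minimal sqlite3 sketch (table name and dates are illustrative, not from the ticket):

```python
import sqlite3

# Minimal sketch of the proposed single-statement delete. An index on
# expire_date lets the database find expired rows without a full scan.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_session (session_key TEXT PRIMARY KEY, expire_date TEXT)"
)
conn.execute("CREATE INDEX idx_expire ON django_session (expire_date)")
conn.executemany(
    "INSERT INTO django_session VALUES (?, ?)",
    [("a", "2014-01-01"), ("b", "2014-06-01"), ("c", "2015-01-01")],
)

# One DELETE ... WHERE: no per-row Python work and no huge IN list.
cur = conn.execute(
    "DELETE FROM django_session WHERE expire_date < ?", ("2014-12-31",)
)
print(cur.rowcount)  # 2
```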

Last edited 3 years ago by Ilja Maas (previous) (diff)

comment:4 Changed 3 years ago by Ilja Maas

Giving the patch a second thought, it would break any pre_delete and post_delete signal handlers, as they are not triggered this way.

The same problem applies to my own proposal above.

So I don't think your patch is a solution. On the other hand, how do so many expired sessions accumulate in the first place? Can't you just run the session cleanup more often?
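The signal concern can be illustrated without Django: a bulk SQL delete never passes through the ORM layer, so any per-object hooks are skipped. This toy model only mimics the idea of pre_delete/post_delete; it is not Django's actual signal machinery:

```python
# Toy illustration of why bulk SQL deletes skip per-object delete hooks.
# Not Django code: "pre_delete"/"post_delete" here are plain functions.
fired = []

def pre_delete(obj):
    fired.append(("pre_delete", obj))

def post_delete(obj):
    fired.append(("post_delete", obj))

sessions = {"k1": "old", "k2": "old", "k3": "fresh"}

def orm_style_delete(keys):
    # Deletes one object at a time, firing hooks -- what signal users expect.
    for key in keys:
        pre_delete(key)
        del sessions[key]
        post_delete(key)

def bulk_sql_delete(keys):
    # Equivalent of a raw "DELETE FROM ... WHERE": rows vanish, no hooks fire.
    for key in keys:
        del sessions[key]

orm_style_delete(["k1"])
bulk_sql_delete(["k2"])
print(fired)  # hooks fired only for k1, never for k2
```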

comment:5 Changed 3 years ago by Tim Graham

I don't think the manual chunking solution proposed in the PR is acceptable; it makes the code quite scary.

Also, when I tested clearsessions, I do get a single query:

DELETE FROM "django_session" WHERE "django_session"."expire_date" < %s; args=(u'2014-11-20 16:13:19.188990',)

I wonder if you have a FK to Session or a signal handler that's preventing fast-path deletion; see the QuerySet.delete() docs.

comment:6 Changed 3 years ago by Tim Graham

Resolution: worksforme
Status: assigned → closed

comment:7 Changed 3 years ago by Adam (Chainz) Johnson

I can confirm there were no signal handlers or foreign keys causing the fast deletion path to be skipped. The problem (at my previous employer, Memrise) was simply the DELETE ... WHERE query touching too many rows in one operation and locking the index(es) while removing them. The session table was seeing relatively high write traffic (a memcached fleet handled reads), as a number of features mutated session data.

In fact, at one point clearsessions briefly took the site down. The daily cron task had not run for a couple of weeks due to misconfiguration, so when it finally ran, the DELETE query removed around half the table. This was all on MySQL, but I'm sure Postgres would behave similarly with such a large table.

If you don't want to introduce chunked deletion into Django core, I understand; this problem presumably only occurs on such high-traffic sites. I'm using tools like pt-archiver ( ) for these cleanup operations now anyway. At least I had my first go at submitting a Django patch, thanks!
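For reference, a pt-archiver run for this kind of cleanup might look like the sketch below. The connection details, table name, and chunk sizes are placeholders rather than anything from the ticket; check the Percona Toolkit documentation for the options available in your version:

```shell
# Hypothetical pt-archiver invocation: purge expired sessions in small
# batches so no single transaction locks a large part of the table.
pt-archiver \
  --source h=localhost,D=myapp,t=django_session \
  --where "expire_date < NOW()" \
  --purge \
  --limit 1000 \
  --txn-size 1000
```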
