Opened 9 years ago

Closed 9 years ago

Last modified 9 years ago

#23830 closed Cleanup/optimization (worksforme)

Make expired session clearing on database backend scale better

Reported by: Adam Johnson Owned by: Adam Johnson
Component: contrib.sessions Version: dev
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: yes UI/UX: no

Description

In two large django projects now I've had to swap out using the default clear_expired implementation on the database SessionStore to a chunked approach since a single delete across the whole table became a bottleneck query. It would be better if the database backend clear that shipped with django was already chunked so django apps scale better.

Change History (7)

comment:1 by Adam Johnson, 9 years ago

Status: newassigned

comment:2 by Adam Johnson, 9 years ago

PR: https://github.com/django/django/pull/3520

Adapted from some production code.

Tests pass but there is naturally no test for >10k sessions. I tried one out and of course it was very slow. We could make the chunk size an extra setting and that would make it testable?

comment:3 by Ilja Maas, 9 years ago

The patch is a partial improvement because it does not instantiate all the objects. However using 'IN' query statements is not quite fast (to say it nicely) in most DB's. As whereof I think it won't be faster in general. (not tested though)

Actually in this specific case a 'DELETE FROM session WHERE expire_date<'20XX-XX-XX' would deliver a huge performance boost, as the expire_date field is DB indexed and it will not create a long running transaction in which the records are deleted one by one.

So I would propose a solution which constructs a 'DELETE FROM.. WHERE' statement and executes it.

Version 0, edited 9 years ago by Ilja Maas (next)

comment:4 by Ilja Maas, 9 years ago

Giving the patch a second thought, it breaks possible signal handlers POST_DELETE and PRE_DELETE as they are not triggered in this way.

The same problem is with my own proposal above.

So I think your patch is not a solution.
On the other hand, how does it happen that there are so many expired sessions to remove. Can't you just run the session cleanup more often?

comment:5 by Tim Graham, 9 years ago

I don't think the manual chunking solution proposed on the PR is acceptable. That makes the code quite scary.

Also, when I tested clearsessions, I do get a single query:

DELETE FROM "django_session" WHERE "django_session"."expire_date" < %s' - PARAMS = (u'2014-11-20 16:13:19.188990',); args=(u'2014-11-20 16:13:19.188990',

I wonder if you have a FK to Session or signal handler that's preventing fast-path deletion; see the QuerySet.delete() docs.

comment:6 by Tim Graham, 9 years ago

Resolution: worksforme
Status: assignedclosed

comment:7 by Adam Johnson, 9 years ago

I can confirm there were no problems with signal handlers or foreign keys making the fast deletion path ignored. The problem (at my previous employer's, Memrise) was literally the DELETE FROM WHERE query touching too many rows in one operation and locking the index[es] whilst it removes them. The session table was seeing relatively high writes (memcached fleet for reads) as there were a number of features mutating session data.

Actually at one point clearsessions made the site go down briefly. The daily cron task had not run for a couple weeks due to misconfiguration, so when it did run the DELETE query was removing around half the table. This is all in MySQL but I'm sure postgres would behave similarly with such a large table.

If you don't want to introduce the chunked deletion into django core I understand, this problem must only occur with such high traffic sites. I'm using more tools like pt-archiver ( http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html ) for such cleanup operations now anyway. At least I had my first go at submitting a django patch though, thanks!

Note: See TracTickets for help on using tickets.
Back to Top