id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
32772	Database cache counts the DB size twice at a performance penalty	Mike Lissner	Mike Lissner	"We have a lot of entries in the DB cache, and I've noticed that the following query shows up in my slow query log kind of a lot (PostgreSQL is slow at counting things):

{{{
SELECT COUNT(*) FROM cache_table;
}}}

This query is run by the DB cache '''twice''' for every cache update to determine whether culling is needed. First, in the cache setting code, it runs:

{{{
            cursor.execute(""SELECT COUNT(*) FROM %s"" % table)
            num = cursor.fetchone()[0]
            now = timezone.now()
            now = now.replace(microsecond=0)
            if num > self._max_entries:
                self._cull(db, cursor, now)
}}}

(https://github.com/django/django/blob/d06c5b358149c02a62da8a5469264d05f29ac659/django/core/cache/backends/db.py#L120-L131)

Then in `self._cull` (called on the last line above) it runs:

{{{
            cursor.execute(""DELETE FROM %s WHERE expires < %%s"" % table,
                           [connection.ops.adapt_datetimefield_value(now)])
            cursor.execute(""SELECT COUNT(*) FROM %s"" % table)
            num = cursor.fetchone()[0]
            if num > self._max_entries:
                # Do culling routine here...
}}}

(https://github.com/django/django/blob/d06c5b358149c02a62da8a5469264d05f29ac659/django/core/cache/backends/db.py#L254-L260)

The idea is that if the MAX_ENTRIES setting is exceeded, it'll cull the DB cache down by some fraction so it doesn't grow forever (with the default CULL_FREQUENCY of 3, roughly a third of the entries get deleted).

I think that's fine, but given that the `SELECT COUNT(*)` query is slow, I wonder two things:

1. Would a refactor to remove the second query be a good idea? If you pass the count from the first query into the `_cull` method, you can then do:

{{{
        def _cull(self, db, cursor, now, count):
            ...
            cursor.execute(""DELETE FROM %s WHERE expires < %%s"" % table,
                           [connection.ops.adapt_datetimefield_value(now)])
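            # rowcount tells us how many expired rows the DELETE removed,
            # so the new size follows without a second COUNT(*).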
            deleted_count = cursor.rowcount
            num = count - deleted_count
            if num > self._max_entries:
                # Do culling routine here...
}}}

That seems like a simple win.
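
For illustration, the call site in the setting code would then just pass along the count it already has (same excerpt as above, with only the last line changed):

{{{
            cursor.execute(""SELECT COUNT(*) FROM %s"" % table)
            num = cursor.fetchone()[0]
            now = timezone.now()
            now = now.replace(microsecond=0)
            if num > self._max_entries:
                self._cull(db, cursor, now, num)
}}}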

2. Is it reasonable to not run the culling code ''every'' time that we set a value? Like, could we run it every 10th or 100th time or something?

If this is a good idea, does anybody have a proposal for how to track this? I'd be happy just keying it off a mod of the current millisecond, but there's probably a better way (randint?).
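
A purely random check, for example, would avoid tracking any counter state at all. A minimal sketch, where the standalone helper and the threshold name are hypothetical, not existing Django code:

{{{
import random

def should_check_cull(cull_every_x):
    # Hypothetical helper: returns True on roughly 1 out of every
    # cull_every_x calls, so the COUNT(*) check only runs that often.
    return random.randint(1, cull_every_x) == 1
}}}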

Would a setting be a good idea here? We already have MAX_ENTRIES and CULL_FREQUENCY. CULL_FREQUENCY is ""the fraction of entries that are culled when ``MAX_ENTRIES`` is reached."" That sounds more like it should have been named CULL_RATIO (regrets!), but maybe a new setting for this could be called ""CULL_EVERY_X""?
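
If we added it, it would presumably live alongside the existing knobs in the backend's OPTIONS. A hypothetical configuration (CULL_EVERY_X is not a real setting today):

{{{
CACHES = {
    ""default"": {
        ""BACKEND"": ""django.core.cache.backends.db.DatabaseCache"",
        ""LOCATION"": ""cache_table"",
        ""OPTIONS"": {
            ""MAX_ENTRIES"": 10000,   # existing
            ""CULL_FREQUENCY"": 3,    # existing: cull 1/3 of entries on overflow
            ""CULL_EVERY_X"": 100,    # hypothetical: only check on ~1 in 100 sets
        },
    }
}
}}}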

I think the first change is a no-brainer, but both changes seem like wins to me. Happy to implement either or both of these, but wanted buy-in first.
"	Cleanup/optimization	closed	Core (Cache system)	dev	Normal	fixed	cache, database		Ready for checkin	1	0	0	0	0	0
