Opened 3 years ago

Last modified 3 months ago

#32785 new Cleanup/optimization

Reduce culling frequency of database cache backend

Reported by: Mike Lissner
Owned by: nobody
Component: Core (Cache system)
Version: dev
Severity: Normal
Keywords: cache, database
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

(Splitting off from: https://code.djangoproject.com/ticket/32772)

Background

We currently have two settings related to culling caches:

MAX_ENTRIES : The maximum number of entries allowed in the cache before old values are deleted.

CULL_FREQUENCY : The fraction of entries that are culled when MAX_ENTRIES is reached.

(https://docs.djangoproject.com/en/3.2/topics/cache/#cache-arguments)
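
For reference, both knobs are passed through the OPTIONS dict of the cache definition in settings, something like this (the values and table name below are just illustrative):

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.db.DatabaseCache",
        "LOCATION": "my_cache_table",
        "OPTIONS": {
            "MAX_ENTRIES": 1000,   # cull once the table grows past this
            "CULL_FREQUENCY": 3,   # a cull deletes roughly 1/3 of entries
        },
    },
}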

I run a system with a very large DB cache and I've found that the cache culling code is a bit heavy-handed and slow. Currently, every time you set, add, or touch a cache entry, it runs a cull to make sure that the cache hasn't grown beyond MAX_ENTRIES.

The cull triggers a number of queries. Here's an example capturing queries while setting a key:

1. SELECT COUNT(*) FROM "test cache table"
2. DELETE FROM "test cache table" WHERE expires < '2021-05-25 00:21:04'
3. SELECT COUNT(*) FROM "test cache table"
4. SELECT cache_key FROM "test cache table" ORDER BY cache_key LIMIT 1 OFFSET 0
5. BEGIN
6. SELECT "cache_key", "expires" FROM "test cache table" WHERE "cache_key" = ':1:force_cull_delete'
7. INSERT INTO "test cache table" ("cache_key", "value", "expires") VALUES (':1:force_cull_delete', 'gAWVCQAAAAAAAACMBXZhbHVllC4=', '2021-05-25 00:37:44')

Queries 1, 2, 3, and 4 come from the culling code. #32772 should eliminate number 3, but there's another query after step 4 (not shown) that can do additional deletions in some cases. Some of these queries are pretty slow in a cache with a lot of entries (on PostgreSQL, at least); I noticed this issue because the COUNT queries were showing up in my slow query log.
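
For context, here's a rough paraphrase of where those queries come from in django/core/cache/backends/db.py (heavily simplified; the real code builds the SQL through connection.ops and handles more edge cases):

def approximate_cull_flow(cursor, table, now, max_entries, cull_frequency):
    # Paraphrase of the checks in _base_set()/_cull(); not the exact Django code.
    cursor.execute(f"SELECT COUNT(*) FROM {table}")                       # query 1
    num = cursor.fetchone()[0]
    if num > max_entries:
        cursor.execute(f"DELETE FROM {table} WHERE expires < %s", [now])  # query 2
        cursor.execute(f"SELECT COUNT(*) FROM {table}")                   # query 3, the one #32772 removes
        num = cursor.fetchone()[0]
        if num > max_entries:
            cull_num = num // cull_frequency
            cursor.execute(
                f"SELECT cache_key FROM {table} ORDER BY cache_key LIMIT 1 OFFSET %s",
                [cull_num],
            )                                                              # query 4
            last_key = cursor.fetchone()
            if last_key:
                # The extra DELETE mentioned above, run when an offset row exists.
                cursor.execute(f"DELETE FROM {table} WHERE cache_key < %s", [last_key[0]])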

Proposal

I propose that we don't cull the cache so often. There are two ways to go about it:

  1. We just change the functionality so that it culls every so often; we pick a frequency, and we hard-code it into the DatabaseCache backend. If people want to change it, they can override the backend.
  2. We make a new setting, pick a default for it, and run with that.

One frustrating thing about this is that the current setting for how *much* to cull is called CULL_FREQUENCY instead of something better like CULL_RATIO, so we can't use that name. If we went with option 2, we'd have to come up with a variable name like CULL_EVERY_X or something. If not for this, I think I'd go for option 2.

If we do option 1, we avoid that mess, but it splits up the settings for the DB cache between your settings file and your custom backend — but maybe that's fine! I'm happy to implement either approach.
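
For illustration, if option 1 hard-coded something like a cull_every_x attribute on the backend (the name and value here are purely hypothetical), changing it would just mean subclassing and pointing BACKEND at your class:

from django.core.cache.backends.db import DatabaseCache

class RarelyCullingDatabaseCache(DatabaseCache):
    # Hypothetical attribute, only meaningful if option 1 above lands.
    cull_every_x = 50  # only cull once the table is ~2% past MAX_ENTRIES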

Other questions

  1. I'm agnostic about how to count queries. We could draw a random number on each write and only cull when it lands on a particular value. Something like:
from random import randint

# Culls roughly once every CULL_EVERY_X writes, on average.
if randint(1, CULL_EVERY_X) == CULL_EVERY_X:
    cull()

Or, sort of equivalently, use a mod on the current time.

Or we could do some sort of counter in the Python code, but that's probably asking for trouble. There's probably a better way here, but I don't know what it is off the top of my head.

  2. How do we think about this setting? It could just be a counter: we cull every 20 adds, sets, or touches. Fine. But maybe it makes more sense as a fraction of your MAX_ENTRIES? That'd make it work similarly to CULL_FREQUENCY, which controls the fraction of entries that are culled whenever a cull runs (there's a small sketch of this arithmetic after the list). So imagine:

MAX_ENTRIES = 1000
CULL_FREQUENCY = 3
CULL_EVERY_X = 5

With those settings, you'd cull down to 667 entries (MAX_ENTRIES - MAX_ENTRIES/CULL_FREQUENCY) when you got to 1,200 entries (MAX_ENTRIES + MAX_ENTRIES/CULL_EVERY_X).

  3. Finally, what's a good default for this? We could take the safe route and keep culling on every query, or move the cheese by changing things in the next release. I vote for the latter, and suggest culling every 1% of MAX_ENTRIES or so by default. I haven't thought about this carefully, but it seems like a good place to begin.
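
To make the second question concrete, here's a tiny sketch of the arithmetic (CULL_EVERY_X is still hypothetical; nothing here exists in Django today):

MAX_ENTRIES = 1000
CULL_FREQUENCY = 3   # existing setting: a cull deletes ~1/3 of entries
CULL_EVERY_X = 5     # hypothetical setting: tolerate 1/5 overshoot before culling

# A cull would be triggered once the table exceeds this many rows...
cull_trigger = MAX_ENTRIES + MAX_ENTRIES // CULL_EVERY_X      # 1200
# ...and would delete down to roughly this many rows.
post_cull_size = MAX_ENTRIES - MAX_ENTRIES // CULL_FREQUENCY  # 667

print(cull_trigger, post_cull_size)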

Change History (3)

comment:1 by Carlton Gibson, 3 years ago

Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

Simon on #32772:

... both of these changes are no brainers.

So 👍 🙂

I prefer Option 1 above: "...we pick a frequency..." — I'm not convinced we do anyone a favour by exposing API and documenting this as a separate decision folks need to think through.

comment:2 by Mike Lissner, 3 years ago

That works for me. In that case, I'll make this more concrete: Unless further discussion here says otherwise, I intend to create a draft PR on GitHub that adds a new property to the cache DB class. The property will set CULL_EVERY_X (or maybe a better-named var) to 20, meaning that it'll cull every time the cache gets to be 5% bigger than MAX_ENTRIES. (This default is currently arbitrary, but seems fairly reasonable to me?)
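
To illustrate the shape of that plan (the should_cull helper below is just a placeholder for discussion, not actual Django code):

def should_cull(num_rows, max_entries, cull_every_x=20):
    # Only cull once the table is 1/cull_every_x (here 5%) past MAX_ENTRIES.
    return num_rows > max_entries + max_entries // cull_every_x

# With MAX_ENTRIES = 1000, culling would start at 1051 rows instead of 1001.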

I don't have any good sense of how to do the counter though. Any suggestions that are better than a mod of a random number?

comment:3 by regiov, 3 months ago

I'm also interested in this kind of improvement, but my use case has a much simpler solution. Since I'm erasing the whole cache externally depending on certain conditions, both my TIMEOUT and MAX_ENTRIES can be None, meaning the DELETE statement is never run. However, SELECT count(*) still gets called unnecessarily on every addition to the cache. A simple condition to check if MAX_ENTRIES is not None before counting records would be enough for my use case. I just don't know if you guys prefer a more general solution that could address a broader set of use cases.
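
A minimal sketch of that guard (paraphrasing the count-then-cull step rather than quoting the actual _base_set() code):

def maybe_cull(cursor, table, max_entries):
    # Guard suggested above: with MAX_ENTRIES set to None, skip the COUNT(*) and the cull entirely.
    if max_entries is None:
        return
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    num = cursor.fetchone()[0]
    if num > max_entries:
        ...  # existing cull logic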
