Opened 3 years ago
Closed 3 years ago
#32772 closed Cleanup/optimization (fixed)
Database cache counts the DB size twice at a performance penalty
Reported by: | Mike Lissner | Owned by: | Mike Lissner |
---|---|---|---|
Component: | Core (Cache system) | Version: | dev |
Severity: | Normal | Keywords: | cache, database |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
We have a lot of entries in the DB cache, and I've noticed that the following query shows up in my slow query log kind of a lot (Postgresql is slow at counting things):
SELECT COUNT(*) FROM cache_table;
This query is being run by the DB cache twice for every cache update in order to determine if culling is needed. First, in the cache setting code, it runs:
cursor.execute("SELECT COUNT(*) FROM %s" % table) num = cursor.fetchone()[0] now = timezone.now() now = now.replace(microsecond=0) if num > self._max_entries: self._cull(db, cursor, now)
Then in self._cull (the last line above) it runs:
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table, [connection.ops.adapt_datetimefield_value(now)]) cursor.execute("SELECT COUNT(*) FROM %s" % table) num = cursor.fetchone()[0] if num > self._max_entries: # Do culling routine here...
The idea is that if the MAX_ENTRIES setting is exceeded, it'll cull the DB cache down by some percentage so it doesn't grow forever.
I think that's fine, but given that the SELECT COUNT(*) query is slow, I wonder two things:
- Would a refactor to remove the second query be a good idea? If you pass the count from the first query into the
_cull
method, you can then do:
def _cull(self, db, cursor, now, count): ... cursor.execute("DELETE FROM %s WHERE expires < %%s" % table, [connection.ops.adapt_datetimefield_value(now)]) deleted_count = cursor.rowcount num = count - deleted_count if num > self._max_entries: # Do culling routine here...
That seems like a simple win.
- Is it reasonable to not run the culling code *every* time that we set a value? Like, could we run it every tenth time or every 100th time or something?
If this is a good idea, does anybody have a proposal for how to count this? I'd be happy just doing it on a mod of the current millisecond, but there's probably a better way (randint?).
Would a setting be a good idea here? We already have MAX_ENTRIES and CULL_FREQUENCY. CULL_FREQUENCY is "the fraction of entries that are culled when MAX_ENTRIES
is reached." That sounds more like it should have been named CULL_RATIO (regrets!), but maybe a new setting for this could be called "CULL_EVERY_X"?
I think the first change is a no-brainer, but both changes seem like wins to me. Happy to implement either or both of these, but wanted buy-in first.
Change History (6)
comment:1 by , 3 years ago
Component: | Uncategorized → Core (Cache system) |
---|---|
Keywords: | cache database added |
Version: | 3.2 → dev |
comment:2 by , 3 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 3 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
I put together a PR for this here: https://github.com/django/django/pull/14447
(I'm setting the has_patch flag. Hopefully this is right.)
comment:4 by , 3 years ago
That's looking great, thanks! It's now showing up in the review queue so it will eventually be picked up.
comment:5 by , 3 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
I'm not sure how widely used this backend is for large dataset but I agree with you that both of these changes are no brainers.
I like the idea of having a cache option control the culling frequency and I don't have a strong opinion on the implementation itself. Could we possibly focus this ticket on the
COUNT
reduction offcursor.rowcount
and have another one track the culling frequency changes though?FWIW this backend also suffers from other limitations that could be worth fixing (#23326) if you're interested in doing that.