Opened 4 years ago
Closed 4 years ago
#32772 closed Cleanup/optimization (fixed)
Database cache counts the DB size twice at a performance penalty
| Reported by: | Mike Lissner | Owned by: | Mike Lissner |
|---|---|---|---|
| Component: | Core (Cache system) | Version: | dev |
| Severity: | Normal | Keywords: | cache, database |
| Cc: | Triage Stage: | Ready for checkin | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
We have a lot of entries in the DB cache, and I've noticed that the following query shows up in my slow query log kind of a lot (Postgresql is slow at counting things):
SELECT COUNT(*) FROM cache_table;
This query is being run by the DB cache twice for every cache update in order to determine if culling is needed. First, in the cache setting code, it runs:
cursor.execute("SELECT COUNT(*) FROM %s" % table)
num = cursor.fetchone()[0]
now = timezone.now()
now = now.replace(microsecond=0)
if num > self._max_entries:
self._cull(db, cursor, now)
Then in self._cull (the last line above) it runs:
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table,
[connection.ops.adapt_datetimefield_value(now)])
cursor.execute("SELECT COUNT(*) FROM %s" % table)
num = cursor.fetchone()[0]
if num > self._max_entries:
# Do culling routine here...
The idea is that if the MAX_ENTRIES setting is exceeded, it'll cull the DB cache down by some percentage so it doesn't grow forever.
I think that's fine, but given that the SELECT COUNT(*) query is slow, I wonder two things:
- Would a refactor to remove the second query be a good idea? If you pass the count from the first query into the
_cullmethod, you can then do:
def _cull(self, db, cursor, now, count):
...
cursor.execute("DELETE FROM %s WHERE expires < %%s" % table,
[connection.ops.adapt_datetimefield_value(now)])
deleted_count = cursor.rowcount
num = count - deleted_count
if num > self._max_entries:
# Do culling routine here...
That seems like a simple win.
- Is it reasonable to not run the culling code *every* time that we set a value? Like, could we run it every tenth time or every 100th time or something?
If this is a good idea, does anybody have a proposal for how to count this? I'd be happy just doing it on a mod of the current millisecond, but there's probably a better way (randint?).
Would a setting be a good idea here? We already have MAX_ENTRIES and CULL_FREQUENCY. CULL_FREQUENCY is "the fraction of entries that are culled when MAX_ENTRIES is reached." That sounds more like it should have been named CULL_RATIO (regrets!), but maybe a new setting for this could be called "CULL_EVERY_X"?
I think the first change is a no-brainer, but both changes seem like wins to me. Happy to implement either or both of these, but wanted buy-in first.
Change History (6)
comment:1 by , 4 years ago
| Component: | Uncategorized → Core (Cache system) |
|---|---|
| Keywords: | cache database added |
| Version: | 3.2 → dev |
comment:2 by , 4 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Type: | Uncategorized → Cleanup/optimization |
comment:3 by , 4 years ago
| Has patch: | set |
|---|---|
| Owner: | changed from to |
| Status: | new → assigned |
I put together a PR for this here: https://github.com/django/django/pull/14447
(I'm setting the has_patch flag. Hopefully this is right.)
comment:4 by , 4 years ago
That's looking great, thanks! It's now showing up in the review queue so it will eventually be picked up.
comment:5 by , 4 years ago
| Triage Stage: | Accepted → Ready for checkin |
|---|
I'm not sure how widely used this backend is for large dataset but I agree with you that both of these changes are no brainers.
I like the idea of having a cache option control the culling frequency and I don't have a strong opinion on the implementation itself. Could we possibly focus this ticket on the
COUNTreduction offcursor.rowcountand have another one track the culling frequency changes though?FWIW this backend also suffers from other limitations that could be worth fixing (#23326) if you're interested in doing that.