#11260 closed Uncategorized (wontfix)
File based cache not very efficient with large amounts of cached files
| Reported by: | anteater_sa | Owned by: | josh |
|---|---|---|---|
| Component: | Core (Cache system) | Version: | 1.1 |
| Severity: | Normal | Keywords: | filebased file cache |
| Cc: | john@… | Triage Stage: | Design decision needed |
| Has patch: | yes | Needs documentation: | yes |
| Needs tests: | yes | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
When using the file-based cache, a large number of cached pages (in my case over 100,000) makes the system inefficient, because the _get_num_entries function in django.core.cache.backends.filebased actually walks the whole cache directory structure counting files.
Maybe setting max_entries to 0 in the settings file could mean unlimited cached files; the _get_num_entries function could then be as follows:
# Method of the file-based cache class; os must be imported at module level.
def _get_num_entries(self):
    count = 0
    if self._max_entries == 0:
        return count  # Don't count files if max_entries is set to 0 (unlimited)
    for _, _, files in os.walk(self._dir):
        count += len(files)
    return count
_num_entries = property(_get_num_entries)
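The proposal can be tried outside Django with a small stand-alone sketch. The function name count_cache_files is hypothetical; it mirrors the proposed _get_num_entries by returning 0 without touching the disk when max_entries is 0, and otherwise walking the directory tree with os.walk.

```python
import os


def count_cache_files(cache_dir, max_entries):
    """Hypothetical stand-in for the proposed _get_num_entries.

    max_entries == 0 is treated as "unlimited", so the (potentially slow)
    directory walk is skipped entirely and 0 is returned.
    """
    if max_entries == 0:
        return 0  # unlimited cache: never count files
    count = 0
    # os.walk visits every subdirectory, so the cost grows with the
    # number of cached files; this is the slowdown described above.
    for _dirpath, _dirnames, filenames in os.walk(cache_dir):
        count += len(filenames)
    return count
```

With 100,000 files the second branch touches every directory entry on each call, while the max_entries == 0 branch costs nothing regardless of cache size.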
Attachments (2)
Change History (14)
comment:1 by , 16 years ago
comment:2 by , 16 years ago
| milestone: | 1.1 → 1.2 |
|---|---|
| Needs documentation: | set |
| Needs tests: | set |
1.1 is in feature freeze right now, so moving to 1.2.
comment:3 by , 16 years ago
| Triage Stage: | Unreviewed → Design decision needed |
|---|
comment:4 by , 16 years ago
| Cc: | added |
|---|---|
| Has patch: | unset |
I've been bitten by this also.
My fix is to return count while it is still 0 (so the cache directory is never walked) and to manage the cache myself with a daily cron job.
It would be nice if Django allowed me to disable the filebased cache management feature using settings.py.
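A daily cleanup job of the kind described here might look like the following sketch. It assumes, beyond what the comment says, that deleting files by age is acceptable: it uses each file's modification time as a proxy for staleness rather than reading the expiry timestamp Django stores inside each cache file. The function name purge_old_cache_files and the one-day default are hypothetical.

```python
import os
import time


def purge_old_cache_files(cache_dir, max_age_seconds=24 * 60 * 60, now=None):
    """Delete files under cache_dir whose mtime is older than max_age_seconds.

    Returns the number of files removed. Meant to be called from a script
    run by cron, instead of culling inside the request/response cycle.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except FileNotFoundError:
                # Another process (or Django itself) deleted it first.
                continue
    return removed
```

Because the walk happens once a day in a separate process, its cost no longer lands on every cache write.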
comment:5 by , 16 years ago
| Has patch: | set |
|---|---|
| Owner: | changed from to |
| Status: | new → assigned |
comment:6 by , 16 years ago
Just to explain the patch...
If you pass 'max_entries=0', no culling of the cache will ever occur.
Otherwise, culling is performed once the number of entries reaches 'max_entries'.
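The rule described above can be written as a small decision function. This is a hedged sketch, not the attached patch (which is not reproduced here): should_cull and pick_files_to_cull are hypothetical names, and deleting every cull_frequency-th file in sorted order is an assumption modeled on the file backend's cull strategy of that era.

```python
def should_cull(num_entries, max_entries):
    """Return True when the cache should be culled.

    max_entries == 0 means "unlimited": never cull, as the patch proposes.
    Otherwise cull once the entry count reaches the limit.
    """
    if max_entries == 0:
        return False
    return num_entries >= max_entries


def pick_files_to_cull(filenames, cull_frequency):
    """Choose roughly 1/cull_frequency of the files for deletion.

    cull_frequency == 0 means "delete everything".
    """
    ordered = sorted(filenames)
    if cull_frequency == 0:
        return ordered
    return [name for i, name in enumerate(ordered) if i % cull_frequency == 0]
```

For example, pick_files_to_cull(["c", "a", "b", "d"], 3) selects ["a", "d"], the entries at sorted positions 0 and 3.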
comment:7 by , 16 years ago
| Resolution: | → wontfix |
|---|---|
| Status: | assigned → closed |
I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.
If you need a cache capable of holding 100000 items, I strongly recommend you look at memcache. If you insist on using the filesystem as a cache, it isn't hard to subclass and extend the existing cache.
comment:8 by , 16 years ago
Are you saying that the file-based cache is not suitable for production? For some scenarios, a file-based cache is more suitable than memcache.
I use file caching to cache processed JPGs, a scenario memcache is less suited to.
comment:9 by , 16 years ago
| Version: | 1.0 → 1.1 |
|---|
I have Django sites of tens of thousands of pages that have been running for over 2 years with the above patches, so your statement that filesystem caching is not a serious strategy is irrelevant. Also, filesystem caching is not comparable to memcaching; they solve two completely different problems.
comment:11 by , 10 years ago
| Easy pickings: | unset |
|---|---|
| Severity: | → Normal |
| Type: | → Uncategorized |
| UI/UX: | unset |
I had the same problem but a couple more requirements. For future readers, I want to mention the DiskCache (http://www.grantjenks.com/docs/diskcache/) project. DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django. There are no dependencies outside the standard library (no managing other processes) and it's efficient enough to handle cache evictions during the request/response cycle (no cron job necessary).
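For reference, DiskCache ships a Django cache backend, diskcache.DjangoCache, configured through the standard CACHES setting. The directory path and the 1 GiB size_limit below are placeholder values, not recommendations; the DiskCache documentation lists the full set of options.

```python
# settings.py (sketch): use DiskCache's Django backend instead of the
# built-in file-based cache. Path and size limit are placeholders.
CACHES = {
    "default": {
        "BACKEND": "diskcache.DjangoCache",
        "LOCATION": "/var/tmp/django_diskcache",
        "TIMEOUT": 300,
        "OPTIONS": {
            "size_limit": 2 ** 30,  # 1 GiB; DiskCache culls entries past this size
        },
    },
}
```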
Sorry, it should be as follows:
def _get_num_entries(self):
    count = 0
    if self._max_entries == 0:
        return count  # Don't count files if max_entries is set to 0
    for _, _, files in os.walk(self._dir):
        count += len(files)
    return count
_num_entries = property(_get_num_entries)