#11260 closed Uncategorized (wontfix)
File based cache not very efficient with large amounts of cached files
Reported by: | anteater_sa | Owned by: | josh |
---|---|---|---|
Component: | Core (Cache system) | Version: | 1.1 |
Severity: | Normal | Keywords: | filebased file cache |
Cc: | john@… | Triage Stage: | Design decision needed |
Has patch: | yes | Needs documentation: | yes |
Needs tests: | yes | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
When using the file-based cache with a large number of cached pages (in my case over 100,000), the system becomes inefficient because django.core.cache.backends.filebased contains a function called _get_num_entries which walks the entire cache directory structure counting files.
Maybe setting max_entries to 0 in the settings file could mean unlimited cached files; then _get_num_entries could be as follows:

    def _get_num_entries(self):
        count = 0
        if self._max_entries == 0:
            # Don't count files if max_entries is set to 0 (unlimited).
            return count
        for _, _, files in os.walk(self._dir):
            count += len(files)
        return count
    _num_entries = property(_get_num_entries)
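The short-circuit the patch proposes can be demonstrated standalone; the function below is an illustrative stand-in for the backend method, not the actual Django API:

```python
import os
import tempfile

def num_entries(cache_dir, max_entries):
    """Count cached files, skipping the walk entirely when max_entries == 0."""
    if max_entries == 0:
        # Unlimited cache: never pay the cost of walking the directory tree.
        return 0
    count = 0
    for _, _, files in os.walk(cache_dir):
        count += len(files)
    return count

# Demonstration in a throwaway directory with five cache files.
cache_dir = tempfile.mkdtemp()
for i in range(5):
    with open(os.path.join(cache_dir, "entry%d.djcache" % i), "w") as f:
        f.write("data")

print(num_entries(cache_dir, max_entries=300))  # walks the tree -> 5
print(num_entries(cache_dir, max_entries=0))    # short-circuits -> 0
```

With 100,000 files the walk dominates every write, so skipping it when the cache is unbounded is the entire point of the patch.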
Attachments (2)
Change History (14)
comment:1 by , 15 years ago
comment:2 by , 15 years ago
milestone: | 1.1 → 1.2 |
---|---|
Needs documentation: | set |
Needs tests: | set |
1.1 is at a feature freeze right now so moving to 1.2.
comment:3 by , 15 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:4 by , 15 years ago
Cc: | added |
---|---|
Has patch: | unset |
I've been bitten by this also.
My fix is to return count while it is still 0 and manage the cache myself using a daily cron job.
It would be nice if Django allowed me to disable the filebased cache management feature using settings.py.
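A daily cron job like the one described might look like this minimal sketch; the .djcache suffix matches the file cache's naming, but the function name and age threshold are illustrative choices, not part of Django:

```python
import os
import tempfile
import time

def cull_old_entries(cache_dir, max_age_seconds):
    """Delete cache files whose modification time is older than max_age_seconds."""
    cutoff = time.time() - max_age_seconds
    removed = 0
    for root, _, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed += 1
    return removed

# Demonstration: one stale file (mtime pushed 2 hours back) and one fresh file.
base = tempfile.mkdtemp()
stale = os.path.join(base, "stale.djcache")
with open(stale, "w") as f:
    f.write("x")
os.utime(stale, (time.time() - 7200, time.time() - 7200))
with open(os.path.join(base, "fresh.djcache"), "w") as f:
    f.write("x")
removed = cull_old_entries(base, max_age_seconds=3600)
print(removed)  # -> 1
```

Scheduled from cron, this keeps disk usage bounded without Django ever walking the tree during a request.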
comment:5 by , 15 years ago
Has patch: | set |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:6 by , 15 years ago
Just to explain the patch...
If you pass 'max_entries=0', no culling of the cache will ever occur.
Otherwise, culling is performed once the number of entries reaches 'max_entries'.
comment:7 by , 15 years ago
Resolution: | → wontfix |
---|---|
Status: | assigned → closed |
I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.
If you need a cache capable of holding 100000 items, I strongly recommend you look at memcache. If you insist on using the filesystem as a cache, it isn't hard to subclass and extend the existing cache.
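The subclassing route suggested here can be sketched as follows. The base class below is a toy stand-in for Django's file backend (the real class and its internal _cull hook vary by version), so treat this as the pattern, not a drop-in:

```python
import os
import tempfile

class ToyFileCache:
    """Minimal stand-in for a file-based cache backend that culls on write."""
    def __init__(self, directory, max_entries=300):
        self._dir = directory
        self._max_entries = max_entries
        os.makedirs(directory, exist_ok=True)

    def _num_entries(self):
        count = 0
        for _, _, files in os.walk(self._dir):
            count += len(files)
        return count

    def set(self, key, value):
        if self._num_entries() >= self._max_entries:
            self._cull()
        with open(os.path.join(self._dir, key), "w") as f:
            f.write(value)

    def _cull(self):
        # Naive cull: drop everything (the real backend is more selective).
        for root, _, files in os.walk(self._dir):
            for name in files:
                os.remove(os.path.join(root, name))

class UnculledFileCache(ToyFileCache):
    """Subclass that disables cache management entirely."""
    def _num_entries(self):
        return 0  # skip the expensive directory walk on every write

    def _cull(self):
        pass  # never cull; an external job manages disk usage instead

# Demonstration: max_entries=2 would normally trigger culling, but never does here.
cache = UnculledFileCache(tempfile.mkdtemp(), max_entries=2)
for i in range(5):
    cache.set("k%d" % i, "value")
survivors = sum(len(files) for _, _, files in os.walk(cache._dir))
print(survivors)  # -> 5
```

Overriding both hooks matters: disabling _cull alone still leaves the per-write directory walk that this ticket complains about.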
comment:8 by , 15 years ago
Are you saying that the file-based cache is not suitable for production? The file-based cache is more suitable than memcache for some scenarios.
I use file caching to cache processed JPGs; memcache is not well suited to that use case.
comment:9 by , 15 years ago
Version: | 1.0 → 1.1 |
---|
I have had Django sites of tens of thousands of pages running for over 2 years using the above patches, so the claim that filesystem caching is not a serious strategy does not hold. Also, filesystem caching is not comparable to memcached; they solve two completely different problems.
comment:11 by , 9 years ago
Easy pickings: | unset |
---|---|
Severity: | → Normal |
Type: | → Uncategorized |
UI/UX: | unset |
I had the same problem but a couple more requirements. For future readers, I want to mention the DiskCache (http://www.grantjenks.com/docs/diskcache/) project. DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django. There are no dependencies outside the standard library (no managing other processes) and it's efficient enough to handle cache evictions during the request/response cycle (no cron job necessary).
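For readers who go this route, wiring DiskCache into Django is roughly the following settings fragment (the backend dotted path follows the DiskCache documentation; the LOCATION path and timeout are placeholders):

```python
# settings.py -- sketch, not a drop-in configuration
CACHES = {
    "default": {
        "BACKEND": "diskcache.DjangoCache",
        "LOCATION": "/var/tmp/django_cache",  # placeholder path
        "TIMEOUT": 300,  # standard Django cache timeout, in seconds
        # DiskCache evicts during normal operation, so no cron job is needed.
    }
}
```

Unlike the built-in file backend, DiskCache tracks its size in SQLite rather than walking the directory tree, which is what makes in-request eviction cheap.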
comment:12 by , 6 years ago
https://code.djangoproject.com/ticket/29994#ticket is a related issue that proposed explicitly documenting these performance issues
sorry, should be as follows: