Opened 15 years ago

Closed 15 years ago

Last modified 6 years ago

#11260 closed Uncategorized (wontfix)

File based cache not very efficient with large amounts of cached files

Reported by: anteater_sa
Owned by: josh
Component: Core (Cache system)
Version: 1.1
Severity: Normal
Keywords: filebased file cache
Cc: john@…
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: yes
Needs tests: yes
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

When using the file-based cache, having a large number of cached pages (in my case over 100,000) makes the system inefficient, because django.core.cache.backends.filebased contains a function called _get_num_entries which walks the entire cache directory structure counting files.

Maybe setting max_entries to 0 in the settings file could mean unlimited cached files, then the _get_num_entries function could be as follows:

    def _get_num_entries(self):
        count = 0
        if max_entries == 0: return count # Don't count files if max_entries is set to 0
        return count
        for _,_,files in os.walk(self._dir):
            count += len(files)
        return count
    _num_entries = property(_get_num_entries)

Attachments (2)

patch.diff (475 bytes ) - added by josh 15 years ago.
Patch that fixes issue
patch.2.diff (478 bytes ) - added by josh 15 years ago.
Oops, made a mistake with the first patch.


Change History (14)

comment:1 by anteater_sa, 15 years ago

sorry, should be as follows:

    def _get_num_entries(self):
        count = 0
        # Don't count files if max_entries is set to 0 (treated as unlimited).
        if self._max_entries == 0:
            return count
        # Otherwise walk the whole cache directory tree and count every file.
        for _, _, files in os.walk(self._dir):
            count += len(files)
        return count
    _num_entries = property(_get_num_entries)

comment:2 by dc, 15 years ago

milestone: 1.1 → 1.2
Needs documentation: set
Needs tests: set

1.1 is at a feature freeze right now so moving to 1.2.

comment:3 by Alex Gaynor, 15 years ago

Triage Stage: Unreviewed → Design decision needed

comment:4 by John Moylan, 15 years ago

Cc: john@… added
Has patch: unset

I've been bitten by this also.
My fix is to return count while it is still 0 and manage the cache myself using a daily cron job.

It would be nice if Django allowed me to disable the filebased cache management feature using settings.py.
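The cron-based approach described above could look roughly like the following. This is only a sketch of the idea, not the script actually used in this comment: the function name `prune_cache_dir` and the age-based policy are assumptions made for illustration.

```python
import os
import time


def prune_cache_dir(cache_dir, max_age_seconds):
    """Delete files under cache_dir older than max_age_seconds.

    Hypothetical helper meant to be run from cron, outside the
    request/response cycle. Returns the number of files removed.
    """
    cutoff = time.time() - max_age_seconds
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except OSError:
                # The file vanished or could not be removed, for example
                # because a concurrent request replaced it. Skip it.
                continue
    return removed
```

The directory walk still costs time proportional to the number of files, but it happens once a day rather than on cache writes.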

by josh, 15 years ago

Attachment: patch.diff added

Patch that fixes issue

comment:5 by josh, 15 years ago

Has patch: set
Owner: changed from nobody to josh
Status: new → assigned

by josh, 15 years ago

Attachment: patch.2.diff added

Oops, made a mistake with the first patch.

comment:6 by josh, 15 years ago

Just to explain the patch...

If you pass 'max_entries=0' no culling of the cache will ever occur.
If the number of entries reaches 'max_entries', culling will be performed.
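A standalone sketch of that rule, for readers without the attachment to hand. The patch itself is not reproduced on this page, so the helper names `count_entries` and `should_cull` are hypothetical, and culling once the count reaches the limit is an assumption.

```python
import os


def count_entries(cache_dir):
    """Count cached files by walking the directory tree (the slow part)."""
    return sum(len(files) for _, _, files in os.walk(cache_dir))


def should_cull(cache_dir, max_entries, count=count_entries):
    """Return True when a cull pass is due.

    max_entries == 0 is treated as "unlimited": the function returns
    early and the directory is never walked.
    """
    if max_entries == 0:
        return False
    # Assumption: cull once the count reaches the configured limit.
    return count(cache_dir) >= max_entries
```

The point of the early return is that the expensive walk is skipped entirely, rather than performed and then ignored.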

comment:7 by Russell Keith-Magee, 15 years ago

Resolution: wontfix
Status: assigned → closed

I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.

If you need a cache capable of holding 100000 items, I strongly recommend you look at memcache. If you insist on using the filesystem as a cache, it isn't hard to subclass and extend the existing cache.

comment:8 by John Moylan, 15 years ago

Are you saying that the file-based cache is not suitable for production? A file-based cache is more suitable for some scenarios than memcache.

I use file caching to cache processed JPGs. Memcache is not as capable for such a scenario.

comment:9 by anteater_sa, 15 years ago

Version: 1.0 → 1.1

I have Django sites with tens of thousands of pages that have been running for over 2 years using the above patches, so your statements about filesystem caching not being a serious strategy are irrelevant. Also, filesystem caching is not comparable to memcaching; they solve two completely different problems.

comment:10 by Jacob, 13 years ago

milestone: 1.2

Milestone 1.2 deleted

comment:11 by Grant Jenks, 9 years ago

Easy pickings: unset
Severity: Normal
Type: Uncategorized
UI/UX: unset

I had the same problem but a couple more requirements. For future readers, I want to mention the DiskCache (http://www.grantjenks.com/docs/diskcache/) project. DiskCache is an Apache2 licensed disk and file backed cache library, written in pure-Python, and compatible with Django. There are no dependencies outside the standard library (no managing other processes) and it's efficient enough to handle cache evictions during the request/response cycle (no cron job necessary).
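For reference, wiring DiskCache into a Django project is a settings change along these lines. This is a minimal sketch based on the `diskcache.DjangoCache` backend. The location is a placeholder path, and the timeout and size limit are illustrative values rather than recommendations.

```python
# settings.py fragment (sketch). Requires the third-party "diskcache" package.
CACHES = {
    "default": {
        "BACKEND": "diskcache.DjangoCache",
        # Placeholder directory; point this at a writable path.
        "LOCATION": "/path/to/cache/directory",
        # Default per-key timeout in seconds (illustrative value).
        "TIMEOUT": 300,
        "OPTIONS": {
            # Upper bound on the on-disk cache size, here 1 GiB (illustrative).
            "size_limit": 2 ** 30,
        },
    },
}
```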

comment:12 by Mateusz Konieczny, 6 years ago

#29994 proposes documenting this performance issue.

Last edited 6 years ago by Tim Graham