Opened 5 years ago

Closed 4 years ago

Last modified 3 years ago

#11260 closed (wontfix)

File based cache not very efficient with large amounts of cached files

Reported by: anteater_sa
Owned by: josh
Component: Core (Cache system)
Version: 1.1
Severity:
Keywords: filebased file cache
Cc: john@…
Triage Stage: Design decision needed
Has patch: yes
Needs documentation: yes
Needs tests: yes
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

When using the file-based cache, having a large number of cached pages (in my case over 100,000) makes the system inefficient, because django.core.cache.backends.filebased has a function called _get_num_entries which walks the entire cache directory structure counting files.

Maybe setting max_entries to 0 in the settings file could mean unlimited cached files, then the _get_num_entries function could be as follows:

    def _get_num_entries(self):
        count = 0
        if max_entries == 0: return count # Don't count files if max_entries is set to 0
        return count
        for _,_,files in os.walk(self._dir):
            count += len(files)
        return count
    _num_entries = property(_get_num_entries)

Attachments (2)

patch.diff (475 bytes) - added by josh 4 years ago.
Patch that fixes issue
patch.2.diff (478 bytes) - added by josh 4 years ago.
Oops, made a mistake with the first patch.

Change History (12)

comment:1 Changed 5 years ago by anteater_sa

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

sorry, should be as follows:

    def _get_num_entries(self):
        # Don't walk the cache directory at all if max_entries is set to 0
        if self._max_entries == 0:
            return 0
        count = 0
        for _, _, files in os.walk(self._dir):
            count += len(files)
        return count
    _num_entries = property(_get_num_entries)

comment:2 Changed 5 years ago by dc

  • milestone changed from 1.1 to 1.2
  • Needs documentation set
  • Needs tests set

1.1 is at a feature freeze right now so moving to 1.2.

comment:3 Changed 5 years ago by Alex

  • Triage Stage changed from Unreviewed to Design decision needed

comment:4 Changed 4 years ago by JohnMoylan

  • Cc john@… added
  • Has patch unset

I've been bitten by this also.
My fix is to return count while it is still 0 and manage the cache myself using a daily cron job.

It would be nice if Django allowed me to disable the filebased cache management feature using settings.py.
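
For illustration, the cron-driven cleanup mentioned above might look roughly like the following. This is only a sketch under stated assumptions: the function name purge_old_cache_files is hypothetical, and it purges by file modification time rather than by whatever expiry information the cache backend stores inside each file.

```python
import os
import time


def purge_old_cache_files(cache_dir, max_age_seconds, now=None):
    """Delete files under cache_dir not modified within max_age_seconds.

    Hypothetical helper: age is judged by mtime only, which is an
    assumption and not how the cache backend itself decides expiry.
    Returns the number of files removed.
    """
    if now is None:
        now = time.time()
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.path.getmtime(path) > max_age_seconds:
                    os.remove(path)
                    removed += 1
            except OSError:
                # The file may have vanished or be unreadable; skip it.
                continue
    return removed
```

A daily cron entry could then call something like purge_old_cache_files("/path/to/cache", 24 * 60 * 60), where the path is a placeholder for the configured cache directory.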

Changed 4 years ago by josh

Patch that fixes issue

comment:5 Changed 4 years ago by josh

  • Has patch set
  • Owner changed from nobody to josh
  • Status changed from new to assigned

Changed 4 years ago by josh

Oops, made a mistake with the first patch.

comment:6 Changed 4 years ago by josh

Just to explain the patch...

If you pass 'max_entries=0' no culling of the cache will ever occur.
Otherwise, culling is performed once the number of entries reaches or exceeds 'max_entries'.
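
A minimal sketch of that rule, not the attached patch itself: it assumes culling is meant to trigger once the entry count reaches max_entries, and that 0 means unlimited. The name should_cull and the count_entries callable are hypothetical. Passing the count as a callable shows that the expensive directory walk can be skipped when max_entries is 0.

```python
def should_cull(count_entries, max_entries):
    """Decide whether the cache should be culled.

    count_entries is a zero-argument callable that returns the current
    number of cached files (hypothetical; it stands in for the walk).
    """
    if max_entries == 0:
        # Unlimited cache: never count files and never cull.
        return False
    return count_entries() >= max_entries
```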

comment:7 Changed 4 years ago by russellm

  • Resolution set to wontfix
  • Status changed from assigned to closed

I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.

If you need a cache capable of holding 100000 items, I strongly recommend you look at memcache. If you insist on using the filesystem as a cache, it isn't hard to subclass and extend the existing cache.
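
As a rough sketch of the subclassing approach: FileCacheStandIn below is a stand-in written for this example, not Django's actual backend class, and UnlimitedFileCache is a hypothetical name. The attribute names _dir, _max_entries and _num_entries follow the snippet in the ticket description.

```python
import os


class FileCacheStandIn(object):
    """Stand-in for a file-based cache backend (NOT Django's real class)."""

    def __init__(self, directory, max_entries=300):
        self._dir = directory
        self._max_entries = max_entries

    def _get_num_entries(self):
        # Walk the whole cache directory and count every file.
        count = 0
        for _, _, files in os.walk(self._dir):
            count += len(files)
        return count
    _num_entries = property(_get_num_entries)


class UnlimitedFileCache(FileCacheStandIn):
    """Skips the directory walk when max_entries is 0 (unlimited)."""

    def _get_num_entries(self):
        if self._max_entries == 0:
            return 0
        return super(UnlimitedFileCache, self)._get_num_entries()
    # The property must be redeclared: the base class property is bound
    # to the base class function and would ignore the override.
    _num_entries = property(_get_num_entries)
```

With a real backend the same pattern would apply only if its counting logic lives in an overridable method or property, which is an assumption to verify against the installed version.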

comment:8 Changed 4 years ago by JohnMoylan

Are you saying that file based cache is not suitable for production? File based cache is more suitable for some scenarios than memcache.

I use file caching to cache processed JPGs. Memcache is not as capable for such a scenario.

comment:9 Changed 4 years ago by anteater_sa

  • Version changed from 1.0 to 1.1

I have had Django sites of tens of thousands of pages running for over 2 years using the above patches, so your statements about filesystem caching not being a serious strategy are irrelevant. Also, filesystem caching is not comparable to memcaching; they solve two completely different problems.

comment:10 Changed 3 years ago by jacob

  • milestone 1.2 deleted

Milestone 1.2 deleted
