Opened 7 months ago

Last modified 6 months ago

#29994 new Cleanup/optimization

Document performance issues in FileBasedCache

Reported by: Mateusz Konieczny
Owned by: nobody
Component: Documentation
Version: 2.1
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

https://docs.djangoproject.com/en/2.1/topics/cache/#filesystem-caching does not mention any problems with FileBasedCache.

According to https://github.com/grantjenks/python-diskcache#diskcache-disk-backed-cache "Unfortunately the file-based cache in Django is essentially broken. The culling method is random and large caches repeatedly scan a cache directory which slows linearly with growth. Can you really allow it to take sixty milliseconds to store a key in a cache with a thousand items?"
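The behaviour the quote describes can be illustrated with a toy model. The sketch below is not Django's code: `ToyFileCache`, `max_entries`, and `cull_frequency` are hypothetical names, and the class only mimics the general idea of scanning the cache directory on every write and deleting a random sample of files once a limit is reached.

```python
import hashlib
import os
import random
import tempfile


class ToyFileCache:
    """Toy model only: one file per key, full directory scan on every set()."""

    def __init__(self, directory, max_entries=300, cull_frequency=3):
        self.directory = directory
        self.max_entries = max_entries
        self.cull_frequency = cull_frequency

    def _path(self, key):
        # Hash the key so that any string maps to a safe file name.
        name = hashlib.sha256(key.encode()).hexdigest() + ".cache"
        return os.path.join(self.directory, name)

    def _list_files(self):
        # O(n) in the number of cached entries, and it runs on every write.
        return [
            os.path.join(self.directory, name)
            for name in os.listdir(self.directory)
            if name.endswith(".cache")
        ]

    def _cull(self):
        files = self._list_files()
        if len(files) < self.max_entries:
            return
        # Victims are picked at random, with no regard to age or usage.
        for path in random.sample(files, len(files) // self.cull_frequency):
            os.remove(path)

    def set(self, key, value):
        self._cull()
        with open(self._path(key), "w") as f:
            f.write(value)

    def get(self, key, default=None):
        try:
            with open(self._path(key)) as f:
                return f.read()
        except FileNotFoundError:
            return default


with tempfile.TemporaryDirectory() as tmp:
    cache = ToyFileCache(tmp, max_entries=10, cull_frequency=3)
    for i in range(50):
        cache.set(f"key-{i}", str(i))
    remaining = len(cache._list_files())  # never exceeds max_entries
    latest = cache.get("key-49")  # written after the last cull, so still present
```

In this model, which older keys survive is a matter of chance, and the cost of each `set()` grows with the number of files in the directory, which are the two complaints raised in the quote.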

As far as I can tell, this has not been reported so far.

According to https://github.com/grantjenks/python-diskcache/issues/93#issuecomment-442580191 "The deficiencies of FileBasedCache are too well known to require enumeration", but it would be nice to also warn developers who are unaware of these problems.

Change History (6)

comment:1 Changed 7 months ago by Mateusz Konieczny

Needs documentation: set

comment:2 Changed 7 months ago by Grant Jenks

Related, and much older issue: https://code.djangoproject.com/ticket/11260

I agree that the file-based caching docs could be improved to describe the limitations. I would also welcome a link to the Disk Cache project (disclaimer: I'm the author).

For the Django developers, please don't think Disk Cache is a criticism of Django's file-based cache. I love Django! The Disk Cache project is thousands of lines of code with many non-Django users. I don't think the code or design is appropriate for inclusion in Django.

comment:3 Changed 7 months ago by Tim Graham

Needs documentation: unset
Triage Stage: Unreviewed → Accepted
Type: Uncategorized → Cleanup/optimization

The Django documentation avoids endorsing third-party packages because of the burden of curation this would present. I think we would need a consensus on the DevelopersMailingList to do that.

comment:4 Changed 7 months ago by Mateusz Konieczny

From #11260: "I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious."

Would it be OK to submit a patch that, for a start, adds something like "Note that this caching strategy is useful only for development and should not be used in production due to poor performance"?


comment:5 Changed 7 months ago by Tim Graham

I'm not sure. I think ticket:11260#comment:9 is interesting: "I have Django sites of tens of thousands of pages running for over 2 years using the above patches, so your statements about filesystem caching not a serious strategy are irrelevant." The patch for #11260 is very simple and seems worth considering if it makes file-based caching usable in production. Do you have an opinion on that, Grant?

comment:6 Changed 6 months ago by Grant Jenks

I looked at the patch in #11260, but as far as I can tell it only adds a scenario where, when _max_entries is set to 0 or None, no culling ever takes place. I'm not sure how anteater_sa handles cache size in that case. I suppose if you can guarantee that the filesystem cache will never grow beyond a certain size then that's a reasonable strategy. It's more of a persistent dictionary though.
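A minimal sketch of the behaviour described above, assuming that a falsy limit (0 or None) simply disables culling. This is not the code of the patch in #11260; `should_cull` is a hypothetical helper used only for illustration.

```python
def should_cull(num_entries, max_entries):
    """Return True when a cull pass should run (hypothetical helper)."""
    if not max_entries:
        # 0 or None: culling is disabled, so the cache can grow without bound.
        return False
    return num_entries >= max_entries


# With no limit, even a very large cache is never culled.
unbounded = [should_cull(n, None) for n in (0, 10**6)]
# With a limit of 300, culling starts once 300 entries exist.
bounded = [should_cull(n, 300) for n in (299, 300)]
```

Under this assumption, keeping the cache directory at a manageable size becomes the responsibility of whoever deploys the site.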

I certainly think filesystem caching has serious use cases. I myself have used diskcache for an ecommerce site for several years with tens of thousands of pages. I started with the built-in file-based cache but later chose sqlite3 (used by diskcache) for better performance.
