Opened 5 years ago

Closed 15 months ago

Last modified 15 months ago

#29994 closed Cleanup/optimization (fixed)

Document performance issues in FileBasedCache

Reported by: Mateusz Konieczny Owned by: Sarah Boyce
Component: Documentation Version: 2.1
Severity: Normal Keywords:
Cc: Sarah Boyce Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

https://docs.djangoproject.com/en/2.1/topics/cache/#filesystem-caching is not mentioning any problems with FileBasedCache

According to https://github.com/grantjenks/python-diskcache#diskcache-disk-backed-cache "Unfortunately the file-based cache in Django is essentially broken. The culling method is random and large caches repeatedly scan a cache directory which slows linearly with growth. Can you really allow it to take sixty milliseconds to store a key in a cache with a thousand items?"

From my checking it seems to not be reported so far.

According to https://github.com/grantjenks/python-diskcache/issues/93#issuecomment-442580191 "The deficiencies of FileBasedCache are too well known to require enumeration" but it would be nice to warn also developers unaware of this problems.

Change History (17)

comment:1 by Mateusz Konieczny, 5 years ago

Needs documentation: set

comment:2 by Grant Jenks, 5 years ago

Related, and much older issue: https://code.djangoproject.com/ticket/11260

I agree that the file-based caching docs could be improved to described the limitations. I would also welcome a link to the Disk Cache project (disclaimer: I'm the author.)

For the Django developers, please don't think Disk Cache is a criticism of Django's file-based cache. I love Django! The Disk Cache project is thousands of lines of code with many non-Django users. I don't think the code nor design is appropriate for inclusion in Django.

comment:3 by Tim Graham, 5 years ago

Needs documentation: unset
Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

The Django documentation avoids endorsing third-party packages because of the burden of curation this would present. I think we would need a consensus on the DevelopersMailingList to do that.

comment:4 by Mateusz Konieczny, 5 years ago

From #11260 - ""I'm going to wontfix, on the grounds that the filesystem cache is intended as an easy way to test caching, not as a serious caching strategy. The default cache size and the cull strategy implemented by the file cache should make that obvious.""

Would it be OK to submit patch that for start will add something like "Note that this caching strategy is useful only for development and should not be used in production due to a poor performance" ?

Last edited 5 years ago by Mateusz Konieczny (previous) (diff)

comment:5 by Tim Graham, 5 years ago

I'm not sure. I think ticket:11260#comment:9 is interesting: "I have Django sites of tens of thousands of pages running for over 2 years using the above patches, so your statements about filesystem caching not a serious strategy are irrelevant." The patch for #11260 is very simple and seems worth considering if it makes file-based caching usable in production. Do you have an opinion on that, Grant?

comment:6 by Grant Jenks, 5 years ago

I looked at the patch in #11260 but as far as I can tell it only adds a scenario where when _max_enries is set to 0 or None then no culling ever takes place. I'm not sure how anteater_sa handles cache size in that case. I suppose if you can guarantee that the filesystem cache will never grow beyond a certain size then that's a reasonable strategy. It's more of a persistent dictionary though.

I certainly think filesystem caching has serious use cases. I myself have used diskcache for an ecommerce site for several years with tens of thousands of pages. I started with the built-in file-based cached but later chose sqlite3 (used by diskcache) for better performance.

comment:7 by sheenarbw, 18 months ago

Owner: changed from nobody to sheenarbw
Status: newassigned

comment:8 by sheenarbw, 18 months ago

The _cull function in Filesystem caching does still slow down with a large number of files. I added a warning to the docs.

https://github.com/django/django/pull/16218

comment:9 by Mariusz Felisiak, 18 months ago

Has patch: set
Patch needs improvement: set

comment:10 by Mariusz Felisiak, 18 months ago

Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:11 by Mariusz Felisiak, 18 months ago

Triage Stage: Ready for checkinAccepted

comment:12 by Mariusz Felisiak, 18 months ago

Patch needs improvement: set

comment:13 by Mariusz Felisiak, 15 months ago

Cc: Sarah Boyce added
Owner: sheenarbw removed
Status: assignednew

comment:14 by Sarah Boyce, 15 months ago

Owner: set to Sarah Boyce
Patch needs improvement: unset
Status: newassigned

comment:15 by Mariusz Felisiak, 15 months ago

Triage Stage: AcceptedReady for checkin

comment:16 by GitHub <noreply@…>, 15 months ago

Resolution: fixed
Status: assignedclosed

In 1eb94bc:

Fixed #29994 -- Added warning about performance of FileBasedCache with a large number of files.

Co-authored-by: sheenarbw <699166+sheenarbw@…>
Co-authored-by: Mariusz Felisiak <felisiak.mariusz@…>

comment:17 by Mariusz Felisiak <felisiak.mariusz@…>, 15 months ago

In de42d513:

[4.2.x] Fixed #29994 -- Added warning about performance of FileBasedCache with a large number of files.

Co-authored-by: sheenarbw <699166+sheenarbw@…>
Co-authored-by: Mariusz Felisiak <felisiak.mariusz@…>
Backport of 1eb94bc8dab46dfa117d21ef4f3b52aebb593615 from main

Note: See TracTickets for help on using tickets.
Back to Top