Opened 3 years ago

Closed 3 years ago

Last modified 10 months ago

#20536 closed Bug (fixed)

File based cache is not safe for concurrent processes

Reported by: jaap3 Owned by: jaap3
Component: Core (Cache system) Version: master
Severity: Normal Keywords: filebasedcache concurrency unicodeerror
Cc: k@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no


Django's FileBasedCache is not safe for concurrent processes (as mentioned in the thread Is file based cache is safe for concurrent process? on django-developers)

I've attached a simple python script that demonstrates the issue consistently (on my PC) and also provides an alternative implementation of the set method based on the werkzeug's filesystem cache.

I'm wondering if the approach I used to demonstrate the issue is suitable for use within Django's testsuite and if the tempfile write through approach is an acceptable solution (it makes writing to the cache slightly slower).

P.S. "The file based cache is primarily there as a proof of concept"

This is not well documented and I suspect a fair amount of people are using it in production scenarios.

Attachments (1) (2.2 KB) - added by jaap3 3 years ago.

Download all attachments as: .zip

Change History (10)

Changed 3 years ago by jaap3

comment:1 Changed 3 years ago by gerdemb

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I've been running something very similar in production for a long time with no problems. Here's my code:

    def _cull(self):

    def set(self, key, value, timeout=None, version=None):
        key = self.make_key(key, version=version)

        fname = self._key_to_file(key)
        dirname = os.path.dirname(fname)
        if not os.path.exists(dirname):
            except OSError, e:
                if e.errno == errno.EEXIST:
        tmpfile = tempfile.NamedTemporaryFile(dir=self._dir, delete=False)

        if timeout is None:
            timeout = self.default_timeout


        now = time.time()
        pickle.dump(now + timeout, tmpfile, pickle.HIGHEST_PROTOCOL)
        pickle.dump(value, tmpfile, pickle.HIGHEST_PROTOCOL)

        # tmpfile.flush()
        # os.fsync(tmpfile.fileno())
        os.rename(, fname)

A couple of notes:

  1. The only exception I catch is when makedirs() fails due to an already existing directory. This can happen when two cache keys that point to the same directory are added at the same time. I have not seen any other exceptions raised in production.
  2. Somewhere I read that to be truly atomic the temporary file must be flushed and the file system fsynced. I have commented out those two lines for better performance and have not had any problems, but I am not an expert here.
  3. I moved the cull() function to a management command that I run periodically with cron. This improves performance, BUT I suspect there could be conflicts when there are cache writes or reads during a cull operation. Since I am only culling once a day, my chances of having a conflict are quite small and I have not seen any errors in production.

comment:2 Changed 3 years ago by jaap3

I've created a pull request that should fix this issue (

I borrowed some code from django.contrib.sessions.backends.file as that one has already been made safer.

Haven't done anything about _cull: but if anyone is interested there's also ticket #15825 (File-based caching doesn't cull truly randomly)

comment:3 Changed 3 years ago by jaap3

Created a new patch: also addresses #15825

comment:4 Changed 3 years ago by marfire

  • Cc k@… added
  • Has patch set
  • Needs tests set
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

comment:5 Changed 3 years ago by marfire

  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

comment:6 Changed 3 years ago by jaap3

  • Owner changed from nobody to jaap3
  • Status changed from new to assigned

comment:7 Changed 3 years ago by akaariai

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

The error handling has changed, and I don't see a mention why it has to change. So, marking patch needs improvement based on that.

In addition I need a confirmation that this is tested on Windows.

Sorry for not dealing with this sooner, we should be doing better with ready-for-checkin patches.

comment:8 Changed 3 years ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 7be638390e18fcbfaaed638f9908673360c280d3:

Fixed #20536 -- rewrite of the file based cache backend

  • Safer for use in multiprocess environments
  • Better random culling
  • Cache files use less disk space
  • Safer delete behavior

Also fixed #15806, fixed #15825.

comment:9 Changed 10 months ago by Tim Graham <timograham@…>

In 1aba0e4c:

Refs #25501 -- Fixed a typo in django/core/cache/backends/

The original intent in refs #20536 was to use the highest protocol.
Calling zlib.compress() with a compression level of -1 seems to
fall back to the default level of 6.

Note: See TracTickets for help on using tickets.
Back to Top