Opened 3 years ago

Closed 3 years ago

Last modified 11 months ago

#20536 closed Bug (fixed)

File based cache is not safe for concurrent processes

Reported by: Jaap Roes Owned by: Jaap Roes
Component: Core (Cache system) Version: master
Severity: Normal Keywords: filebasedcache concurrency unicodeerror
Cc: k@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Django's FileBasedCache is not safe for concurrent processes (as mentioned in the thread Is file based cache is safe for concurrent process? on django-developers)

I've attached a simple python script that demonstrates the issue consistently (on my PC) and also provides an alternative implementation of the set method based on the werkzeug's filesystem cache.

I'm wondering if the approach I used to demonstrate the issue is suitable for use within Django's testsuite and if the tempfile write through approach is an acceptable solution (it makes writing to the cache slightly slower).

P.S. "The file based cache is primarily there as a proof of concept"

This is not well documented and I suspect a fair amount of people are using it in production scenarios.

Attachments (1)

test_filebasedcache.py (2.2 KB) - added by Jaap Roes 3 years ago.

Download all attachments as: .zip

Change History (10)

Changed 3 years ago by Jaap Roes

Attachment: test_filebasedcache.py added

comment:1 Changed 3 years ago by gerdemb

Needs documentation: unset
Needs tests: unset
Patch needs improvement: unset

I've been running something very similar in production for a long time with no problems. Here's my code:

    def _cull(self):
        return

    def set(self, key, value, timeout=None, version=None):
        key = self.make_key(key, version=version)
        self.validate_key(key)

        fname = self._key_to_file(key)
        dirname = os.path.dirname(fname)
        if not os.path.exists(dirname):
            try:
                os.makedirs(dirname)
            except OSError, e:
                if e.errno == errno.EEXIST:
                    pass
                else:
                    raise
        tmpfile = tempfile.NamedTemporaryFile(dir=self._dir, delete=False)

        if timeout is None:
            timeout = self.default_timeout

        self._cull()

        now = time.time()
        pickle.dump(now + timeout, tmpfile, pickle.HIGHEST_PROTOCOL)
        pickle.dump(value, tmpfile, pickle.HIGHEST_PROTOCOL)

        # tmpfile.flush()
        # os.fsync(tmpfile.fileno())
        tmpfile.close()
        os.rename(tmpfile.name, fname)

A couple of notes:

  1. The only exception I catch is when makedirs() fails due to an already existing directory. This can happen when two cache keys that point to the same directory are added at the same time. I have not seen any other exceptions raised in production.
  2. Somewhere I read that to be truly atomic the temporary file must be flushed and the file system fsynced. I have commented out those two lines for better performance and have not had any problems, but I am not an expert here.
  3. I moved the cull() function to a management command that I run periodically with cron. This improves performance, BUT I suspect there could be conflicts when there are cache writes or reads during a cull operation. Since I am only culling once a day, my chances of having a conflict are quite small and I have not seen any errors in production.

comment:2 Changed 3 years ago by Jaap Roes

I've created a pull request that should fix this issue (https://github.com/django/django/pull/1360)

I borrowed some code from django.contrib.sessions.backends.file as that one has already been made safer.

Haven't done anything about _cull: but if anyone is interested there's also ticket #15825 (File-based caching doesn't cull truly randomly)

comment:3 Changed 3 years ago by Jaap Roes

Created a new patch: https://github.com/django/django/pull/1514 also addresses #15825

comment:4 Changed 3 years ago by Kevin Christopher Henry

Cc: k@… added
Has patch: set
Needs tests: set
Patch needs improvement: set
Triage Stage: UnreviewedAccepted

comment:5 Changed 3 years ago by Kevin Christopher Henry

Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:6 Changed 3 years ago by Jaap Roes

Owner: changed from nobody to Jaap Roes
Status: newassigned

comment:7 Changed 3 years ago by Anssi Kääriäinen

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

The error handling has changed, and I don't see a mention why it has to change. So, marking patch needs improvement based on that.

In addition I need a confirmation that this is tested on Windows.

Sorry for not dealing with this sooner, we should be doing better with ready-for-checkin patches.

comment:8 Changed 3 years ago by Anssi Kääriäinen <akaariai@…>

Resolution: fixed
Status: assignedclosed

In 7be638390e18fcbfaaed638f9908673360c280d3:

Fixed #20536 -- rewrite of the file based cache backend

  • Safer for use in multiprocess environments
  • Better random culling
  • Cache files use less disk space
  • Safer delete behavior

Also fixed #15806, fixed #15825.

comment:9 Changed 11 months ago by Tim Graham <timograham@…>

In 1aba0e4c:

Refs #25501 -- Fixed a typo in django/core/cache/backends/filebased.py

The original intent in refs #20536 was to use the highest protocol.
Calling zlib.compress() with a compression level of -1 seems to
fall back to the default level of 6.

Note: See TracTickets for help on using tickets.
Back to Top