Opened 11 years ago

Closed 11 years ago

Last modified 9 years ago

#20536 closed Bug (fixed)

File based cache is not safe for concurrent processes

Reported by: Jaap Roes Owned by: Jaap Roes
Component: Core (Cache system) Version: dev
Severity: Normal Keywords: filebasedcache concurrency unicodeerror
Cc: k@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Django's FileBasedCache is not safe for concurrent processes (as mentioned in the thread Is file based cache is safe for concurrent process? on django-developers)

I've attached a simple python script that demonstrates the issue consistently (on my PC) and also provides an alternative implementation of the set method based on the werkzeug's filesystem cache.

I'm wondering if the approach I used to demonstrate the issue is suitable for use within Django's testsuite and if the tempfile write through approach is an acceptable solution (it makes writing to the cache slightly slower).

P.S. "The file based cache is primarily there as a proof of concept"

This is not well documented and I suspect a fair amount of people are using it in production scenarios.

Attachments (1)

test_filebasedcache.py (2.2 KB ) - added by Jaap Roes 11 years ago.

Download all attachments as: .zip

Change History (10)

by Jaap Roes, 11 years ago

Attachment: test_filebasedcache.py added

comment:1 by gerdemb, 11 years ago

I've been running something very similar in production for a long time with no problems. Here's my code:

    def _cull(self):
        return

    def set(self, key, value, timeout=None, version=None):
        key = self.make_key(key, version=version)
        self.validate_key(key)

        fname = self._key_to_file(key)
        dirname = os.path.dirname(fname)
        if not os.path.exists(dirname):
            try:
                os.makedirs(dirname)
            except OSError, e:
                if e.errno == errno.EEXIST:
                    pass
                else:
                    raise
        tmpfile = tempfile.NamedTemporaryFile(dir=self._dir, delete=False)

        if timeout is None:
            timeout = self.default_timeout

        self._cull()

        now = time.time()
        pickle.dump(now + timeout, tmpfile, pickle.HIGHEST_PROTOCOL)
        pickle.dump(value, tmpfile, pickle.HIGHEST_PROTOCOL)

        # tmpfile.flush()
        # os.fsync(tmpfile.fileno())
        tmpfile.close()
        os.rename(tmpfile.name, fname)

A couple of notes:

  1. The only exception I catch is when makedirs() fails due to an already existing directory. This can happen when two cache keys that point to the same directory are added at the same time. I have not seen any other exceptions raised in production.
  2. Somewhere I read that to be truly atomic the temporary file must be flushed and the file system fsynced. I have commented out those two lines for better performance and have not had any problems, but I am not an expert here.
  3. I moved the cull() function to a management command that I run periodically with cron. This improves performance, BUT I suspect there could be conflicts when there are cache writes or reads during a cull operation. Since I am only culling once a day, my chances of having a conflict are quite small and I have not seen any errors in production.

comment:2 by Jaap Roes, 11 years ago

I've created a pull request that should fix this issue (https://github.com/django/django/pull/1360)

I borrowed some code from django.contrib.sessions.backends.file as that one has already been made safer.

Haven't done anything about _cull: but if anyone is interested there's also ticket #15825 (File-based caching doesn't cull truly randomly)

comment:3 by Jaap Roes, 11 years ago

Created a new patch: https://github.com/django/django/pull/1514 also addresses #15825

comment:4 by Kevin Christopher Henry, 11 years ago

Cc: k@… added
Has patch: set
Needs tests: set
Patch needs improvement: set
Triage Stage: UnreviewedAccepted

comment:5 by Kevin Christopher Henry, 11 years ago

Needs tests: unset
Patch needs improvement: unset
Triage Stage: AcceptedReady for checkin

comment:6 by Jaap Roes, 11 years ago

Owner: changed from nobody to Jaap Roes
Status: newassigned

comment:7 by Anssi Kääriäinen, 11 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

The error handling has changed, and I don't see a mention why it has to change. So, marking patch needs improvement based on that.

In addition I need a confirmation that this is tested on Windows.

Sorry for not dealing with this sooner, we should be doing better with ready-for-checkin patches.

comment:8 by Anssi Kääriäinen <akaariai@…>, 11 years ago

Resolution: fixed
Status: assignedclosed

In 7be638390e18fcbfaaed638f9908673360c280d3:

Fixed #20536 -- rewrite of the file based cache backend

  • Safer for use in multiprocess environments
  • Better random culling
  • Cache files use less disk space
  • Safer delete behavior

Also fixed #15806, fixed #15825.

comment:9 by Tim Graham <timograham@…>, 9 years ago

In 1aba0e4c:

Refs #25501 -- Fixed a typo in django/core/cache/backends/filebased.py

The original intent in refs #20536 was to use the highest protocol.
Calling zlib.compress() with a compression level of -1 seems to
fall back to the default level of 6.

Note: See TracTickets for help on using tickets.
Back to Top