Code

Opened 14 months ago

Closed 8 months ago

#20536 closed Bug (fixed)

File based cache is not safe for concurrent processes

Reported by: jaap3 Owned by: jaap3
Component: Core (Cache system) Version: master
Severity: Normal Keywords: filebasedcache concurrency unicodeerror
Cc: k@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: yes
Easy pickings: no UI/UX: no

Description

Django's FileBasedCache is not safe for concurrent processes (as mentioned in the thread Is file based cache is safe for concurrent process? on django-developers)

I've attached a simple python script that demonstrates the issue consistently (on my PC) and also provides an alternative implementation of the set method based on the werkzeug's filesystem cache.

I'm wondering if the approach I used to demonstrate the issue is suitable for use within Django's testsuite and if the tempfile write through approach is an acceptable solution (it makes writing to the cache slightly slower).

P.S. "The file based cache is primarily there as a proof of concept"

This is not well documented and I suspect a fair amount of people are using it in production scenarios.

Attachments (1)

test_filebasedcache.py (2.2 KB) - added by jaap3 14 months ago.

Download all attachments as: .zip

Change History (9)

Changed 14 months ago by jaap3

comment:1 Changed 13 months ago by gerdemb

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

I've been running something very similar in production for a long time with no problems. Here's my code:

    def _cull(self):
        return

    def set(self, key, value, timeout=None, version=None):
        key = self.make_key(key, version=version)
        self.validate_key(key)

        fname = self._key_to_file(key)
        dirname = os.path.dirname(fname)
        if not os.path.exists(dirname):
            try:
                os.makedirs(dirname)
            except OSError, e:
                if e.errno == errno.EEXIST:
                    pass
                else:
                    raise
        tmpfile = tempfile.NamedTemporaryFile(dir=self._dir, delete=False)

        if timeout is None:
            timeout = self.default_timeout

        self._cull()

        now = time.time()
        pickle.dump(now + timeout, tmpfile, pickle.HIGHEST_PROTOCOL)
        pickle.dump(value, tmpfile, pickle.HIGHEST_PROTOCOL)

        # tmpfile.flush()
        # os.fsync(tmpfile.fileno())
        tmpfile.close()
        os.rename(tmpfile.name, fname)

A couple of notes:

  1. The only exception I catch is when makedirs() fails due to an already existing directory. This can happen when two cache keys that point to the same directory are added at the same time. I have not seen any other exceptions raised in production.
  2. Somewhere I read that to be truly atomic the temporary file must be flushed and the file system fsynced. I have commented out those two lines for better performance and have not had any problems, but I am not an expert here.
  3. I moved the cull() function to a management command that I run periodically with cron. This improves performance, BUT I suspect there could be conflicts when there are cache writes or reads during a cull operation. Since I am only culling once a day, my chances of having a conflict are quite small and I have not seen any errors in production.

comment:2 Changed 12 months ago by jaap3

I've created a pull request that should fix this issue (https://github.com/django/django/pull/1360)

I borrowed some code from django.contrib.sessions.backends.file as that one has already been made safer.

Haven't done anything about _cull: but if anyone is interested there's also ticket #15825 (File-based caching doesn't cull truly randomly)

comment:3 Changed 11 months ago by jaap3

Created a new patch: https://github.com/django/django/pull/1514 also addresses #15825

comment:4 Changed 10 months ago by marfire

  • Cc k@… added
  • Has patch set
  • Needs tests set
  • Patch needs improvement set
  • Triage Stage changed from Unreviewed to Accepted

comment:5 Changed 10 months ago by marfire

  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Accepted to Ready for checkin

comment:6 Changed 10 months ago by jaap3

  • Owner changed from nobody to jaap3
  • Status changed from new to assigned

comment:7 Changed 8 months ago by akaariai

  • Patch needs improvement set
  • Triage Stage changed from Ready for checkin to Accepted

The error handling has changed, and I don't see a mention why it has to change. So, marking patch needs improvement based on that.

In addition I need a confirmation that this is tested on Windows.

Sorry for not dealing with this sooner, we should be doing better with ready-for-checkin patches.

comment:8 Changed 8 months ago by Anssi Kääriäinen <akaariai@…>

  • Resolution set to fixed
  • Status changed from assigned to closed

In 7be638390e18fcbfaaed638f9908673360c280d3:

Fixed #20536 -- rewrite of the file based cache backend

  • Safer for use in multiprocess environments
  • Better random culling
  • Cache files use less disk space
  • Safer delete behavior

Also fixed #15806, fixed #15825.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.