#9400 closed Bug (worksforme)
flock causes problems when writing to an NFS share
Reported by: | mikeh | Owned by: | nobody |
---|---|---|---|
Component: | File uploads/storage | Version: | 1.0 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Design decision needed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
This seems to be the same behaviour as reported in #8403, but since that ticket was closed with a request not to reopen it, here's a new ticket.
We have a media directory mounted over NFS. Our system is RHEL5.2, Python 2.4, Django-1.0. Saving a file through the standard FileField mechanisms (we're not using any custom storage backends, just the out-of-the-box Django setup) results in the following:
  File "/usr/lib64/python2.4/site-packages/mod_python/apache.py", line 299, in HandlerDispatch
    result = object(req)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 222, in handler
    return ModPythonHandler()(req)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 195, in __call__
    response = self.get_response(request)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 128, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)
  File "./../apps/dave_common/init.py", line 20, in new
  File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 86, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/sites.py", line 158, in root
    return self.model_page(request, *url.split('/', 2))
  File "/usr/lib/python2.4/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/sites.py", line 177, in model_page
    return admin_obj(request, rest_of_url)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 191, in __call__
    return self.add_view(request)
  File "/usr/lib/python2.4/site-packages/django/db/transaction.py", line 238, in _commit_on_success
    res = func(*args, **kw)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 492, in add_view
    new_object = self.save_form(request, form, change=False)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 370, in save_form
    return form.save(commit=False)
  File "/usr/lib/python2.4/site-packages/django/forms/models.py", line 302, in save
    return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
  File "/usr/lib/python2.4/site-packages/django/forms/models.py", line 47, in save_instance
    f.save_form_data(instance, cleaned_data[f.name])
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 192, in save_form_data
    getattr(instance, self.name).save(data.name, data, save=False)
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 217, in save
    super(ImageFieldFile, self).save(name, content, save)
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 74, in save
    self._name = self.storage.save(name, content)
  File "/usr/lib/python2.4/site-packages/django/core/files/storage.py", line 45, in save
    name = self._save(name, content)
  File "/usr/lib/python2.4/site-packages/django/core/files/storage.py", line 159, in _save
    locks.lock(fd, locks.LOCK_EX)
  File "/usr/lib/python2.4/site-packages/django/core/files/locks.py", line 57, in lock
    fcntl.lockf(fd(file), flags)
IOError: [Errno 37] No locks available
The default with RHEL5.2 is NFSv3, and that's what we're using.
Cheers,
Mike
Attachments (1)
Change History (13)
follow-up: 2 comment:1 by , 16 years ago
comment:2 by , 16 years ago
Replying to mtredinnick:
So we're in an impossible situation here then. lockf() doesn't work everywhere, flock() doesn't work everywhere. And there's no way to know which one works.
Since lockf() -- the way Django currently does things -- is the recommended approach to doing portable locking and it should work with NFS (I made sure and read the Python source before making the change), I'm inclined to leave the current behaviour in place until a more robust solution emerges.
Thus, we'll need more information and investigation from you on this one. For example, does changing the lockf() call to flock() also fail? Do you have statd running on the server (so that locking is available -- since that was one of the problems in a Debian case, for example)? What information can you track down about why one version works somewhere and the other version works (if it does) on other NFS servers? What's the differentiating feature?
Sorry to push the research back in your direction, but right now Django's doing the best it can as far as following recommended practices and the current code certainly avoided the problems that were reported earlier. Yours is the first case that's been reported of it not working on a reliable NFS setup with the current code, so you have the (only?) failing test case and will need to work out what's going on. I'm far beyond being able to guess.
#9433 points out a similar problem (although on afp mounts).
The patch it provides (<http://code.djangoproject.com/attachment/ticket/9433/not_supported_locks.diff>) may be adapted to also handle the "IOError: [Errno 37] No locks available" error.
comment:3 by , 16 years ago
Triage Stage: | Unreviewed → Design decision needed |
---|
comment:4 by , 16 years ago
Component: | Uncategorized → File uploads/storage |
---|
follow-up: 6 comment:5 by , 15 years ago
We are experiencing this same issue in our production environment, which uses NFS. I believe this started once we upgraded to Django 1.1, so we will likely roll back to Django 1.0 to avoid these fatal errors. Is there a possible stop-gap (patch) that could avoid this error without reverting to 1.0? We'll be happy to be a second test case to help design a proper solution to this problem.
comment:6 by , 15 years ago
Replying to worksology:
We are experiencing this same issue in our production environment, which uses NFS. I believe this started once we upgraded to Django 1.1, so we will likely roll back to Django 1.0 to avoid these fatal errors. Is there a possible stop-gap (patch) that could avoid this error without reverting to 1.0? We'll be happy to be a second test case to help design a proper solution to this problem.
There is no stopgap patch because, so far as I can see, no one with a failing system has answered Malcolm's questions in http://code.djangoproject.com/ticket/9400#comment:1. That comment lays out some things you could try and things you should check (i.e., that locking is in fact available on this filesystem). Without further information from people who actually experience this error, there is not much that Django can do to fix it.
comment:7 by , 15 years ago
Some more information for debugging:
Our environment uses a clustered NFS using nfs-utils-1.0.6-93.EL4 and mounting using nfs version 3 with options: rsize=32768,wsize=32768,tcp,nfsvers=3,hard,intr
I've patched our Django install to use flock() and it works again.
comment:8 by , 14 years ago
I was just bitten by this issue (Errno 37). In my case, it was caused by the NFS 3 client not having a running nfslock service (fixed with: $ sudo /sbin/service nfslock start).
My system and NFS:
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Linux 2.6.18-164.15.1.el5 #1 SMP Mon Mar 1 10:56:08 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
nfs-utils.x86_64 1:1.0.9-42.el5
As an FYI, NFS 2 and NFS 3 rely on a separate locking service (the Network Lock Manager, together with rpc.statd for status monitoring), whereas NFS 4 has locking built into the protocol.
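To see which mounts on a client fall into the NFS 2/3 category, one option is to read the mount table. The sketch below assumes a Linux client where /proc/mounts lists NFS entries with a "vers=" (or "nfsvers=") option and shows "nolock" when client-side locking is disabled. The function name is made up for this example.

```python
def nfs_mounts(mounts_text):
    """Yield (mount_point, nfs_version, uses_nolock) for each NFS entry.

    mounts_text is the content of a mount table in /proc/mounts format:
    device, mount point, filesystem type, comma-separated options, ...
    nfs_version is None when no version option is present.
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        options = fields[3].split(",")
        # An "nfs4" filesystem type implies version 4 unless an option refines it.
        version = "4" if fields[2] == "nfs4" else None
        for opt in options:
            key, _, value = opt.partition("=")
            if key in ("vers", "nfsvers"):
                version = value
        yield fields[1], version, "nolock" in options
```

Feeding it the text of /proc/mounts gives a quick list of mounts reported as version 2 or 3, which are the ones that need the lock services running on both client and server.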
I'll attach a small script which tests the locking behavior directly, so you can run the script while testing your NFS configuration. It's a cut and paste of the locks behavior as of 1.2.1.
by , 14 years ago
Attachment: | nfslocktest.py added |
---|
Small file to test locking as locks.py does, outside of Django.
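For readers without access to the attachment, here is a minimal sketch in the same spirit. It is not the attached nfslocktest.py, and the function name and probe file name are invented for this example. It tries both fcntl.lockf() and fcntl.flock() on a scratch file in a given directory and reports which call, if any, fails.

```python
import fcntl
import os


def probe_locks(directory):
    """Try lockf() and flock() on a scratch file inside `directory`.

    Returns a dict mapping "lockf" and "flock" to None on success or to
    an "[Errno N] message" string on failure.
    """
    path = os.path.join(directory, ".lock_probe_%d" % os.getpid())
    results = {}
    try:
        for name, call in (("lockf", fcntl.lockf), ("flock", fcntl.flock)):
            # lockf() needs a writable descriptor for an exclusive lock.
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            try:
                call(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                call(fd, fcntl.LOCK_UN)
                results[name] = None
            except OSError as exc:
                results[name] = "[Errno %s] %s" % (exc.errno, exc.strerror)
            finally:
                os.close(fd)
    finally:
        # Clean up the scratch file even if something went wrong.
        if os.path.exists(path):
            os.remove(path)
    return results
```

Running probe_locks() against the NFS-mounted media directory and against a local directory on the same client answers the question asked in comment:1, namely whether lockf(), flock(), both, or neither work on that mount.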
comment:9 by , 14 years ago
It appears the root of our problem with lockf() is that one of our machines was not running rpc.statd. Just posting in case this helps anyone else with the NFS file-locking problem.
comment:10 by , 14 years ago
Severity: | → Normal |
---|---|
Type: | → Bug |
comment:11 by , 13 years ago
Easy pickings: | unset |
---|---|
Resolution: | → worksforme |
Status: | new → closed |
UI/UX: | unset |
If you want to use NFS with locks, you need to run statd.
AFAICT Django is following the recommended best practice.
I'm successfully storing media files on an NFS share in production at $DAY_JOB.
comment:12 by , 8 years ago
Six years later, this does not work for me either, this time on a GlusterFS share.
I have to set STATIC_ROOT to a local path that is a symlink pointing to the GlusterFS path.
When STATIC_ROOT is set directly to the GlusterFS share, Django crashes during collectstatic:
Type 'yes' to continue, or 'no' to cancel: yes
Deleting 'fonts/FontAwesome.otf'
Copying '[...]static/fonts/FontAwesome.otf'
Traceback (most recent call last):
  File "bin/diagnostictool", line 39, in <module>
    sys.exit(djangorecipe.manage.main('diagnostictool.production'))
  File "[...]eggs/djangorecipe-1.11-py2.7.egg/djangorecipe/manage.py", line 9, in main
    management.execute_from_command_line(sys.argv)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/__init__.py", line 354, in execute_from_command_line
    utility.execute()
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/__init__.py", line 346, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/base.py", line 394, in run_from_argv
    self.execute(*args, **cmd_options)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/base.py", line 445, in execute
    output = self.handle(*args, **options)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/contrib/staticfiles/management/commands/collectstatic.py", line 168, in handle
    collected = self.collect()
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/contrib/staticfiles/management/commands/collectstatic.py", line 107, in collect
    handler(path, prefixed_path, storage)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/contrib/staticfiles/management/commands/collectstatic.py", line 315, in copy_file
    self.storage.save(prefixed_path, source_file)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/files/storage.py", line 63, in save
    name = self._save(name, content)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/files/storage.py", line 258, in _save
    locks.unlock(fd)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/files/locks.py", line 112, in unlock
    ret = fcntl.flock(_fd(f), fcntl.LOCK_UN)
IOError: [Errno 2] No such file or directory