Opened 16 years ago

Closed 13 years ago

Last modified 8 years ago

#9400 closed Bug (worksforme)

flock causes problems when writing to an NFS share

Reported by: mikeh
Owned by: nobody
Component: File uploads/storage
Version: 1.0
Severity: Normal
Keywords:
Cc:
Triage Stage: Design decision needed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Hi,

This seems to be the same behaviour as reported in #8403, but since that ticket was closed with a request not to reopen it, here's a new ticket.

We have a media directory mounted over NFS. Our system is RHEL5.2, Python 2.4, Django-1.0. Saving a file through the standard FileField mechanisms (we're not using any custom storage backends, just the out-of-the-box Django setup) results in the following traceback:

  File "/usr/lib64/python2.4/site-packages/mod_python/apache.py", line 299, in HandlerDispatch
    result = object(req)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 222, in handler
    return ModPythonHandler()(req)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 195, in __call__
    response = self.get_response(request)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 128, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)
  File "./../apps/dave_common/__init__.py", line 20, in new
  File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 86, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/sites.py", line 158, in root
    return self.model_page(request, *url.split('/', 2))
  File "/usr/lib/python2.4/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/sites.py", line 177, in model_page
    return admin_obj(request, rest_of_url)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 191, in __call__
    return self.add_view(request)
  File "/usr/lib/python2.4/site-packages/django/db/transaction.py", line 238, in _commit_on_success
    res = func(*args, **kw)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 492, in add_view
    new_object = self.save_form(request, form, change=False)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 370, in save_form
    return form.save(commit=False)
  File "/usr/lib/python2.4/site-packages/django/forms/models.py", line 302, in save
    return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
  File "/usr/lib/python2.4/site-packages/django/forms/models.py", line 47, in save_instance
    f.save_form_data(instance, cleaned_data[f.name])
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 192, in save_form_data
    getattr(instance, self.name).save(data.name, data, save=False)
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 217, in save
    super(ImageFieldFile, self).save(name, content, save)
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 74, in save
    self._name = self.storage.save(name, content)
  File "/usr/lib/python2.4/site-packages/django/core/files/storage.py", line 45, in save
    name = self._save(name, content)
  File "/usr/lib/python2.4/site-packages/django/core/files/storage.py", line 159, in _save
    locks.lock(fd, locks.LOCK_EX)
  File "/usr/lib/python2.4/site-packages/django/core/files/locks.py", line 57, in lock
    fcntl.lockf(fd(file), flags)

IOError: [Errno 37] No locks available

The default with RHEL5.2 is NFSv3, and that's what we're using.

Cheers,

Mike

Attachments (1)

nfslocktest.py (387 bytes ) - added by dougvanhorn 14 years ago.
Small file to test locking as locks.py does, outside of Django.

Change History (13)

comment:1 by Malcolm Tredinnick, 16 years ago

So we're in an impossible situation here then. lockf() doesn't work everywhere, flock() doesn't work everywhere. And there's no way to know which one works.

Since lockf() -- the way Django currently does things -- is the recommended approach to doing portable locking and it should work with NFS (I made sure and read the Python source before making the change), I'm inclined to leave the current behaviour in place until a more robust solution emerges.

Thus, we'll need more information and investigation from you on this one. For example, does changing the lockf() call to flock() also fail? Do you have statd running on the server (so that locking is available -- since that was one of the problems in a Debian case, for example)? What information can you track down about why one version works somewhere and the other version works (if it does) on other NFS servers? What's the differentiating feature?

Sorry to push the research back in your direction, but right now Django's doing the best it can as far as following recommended practices and the current code certainly avoided the problems that were reported earlier. Yours is the first case that's been reported of it not working on a reliable NFS setup with the current code, so you have the (only?) failing test case and will need to work out what's going on. I'm far beyond being able to guess.

in reply to:  1 comment:2 by rndblnch, 16 years ago

Replying to mtredinnick:

#9433 points out a similar problem (although on AFP mounts).
The patch it provides (<http://code.djangoproject.com/attachment/ticket/9433/not_supported_locks.diff>) could be adapted to also handle the "IOError: [Errno 37] No locks available" error.
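
To illustrate what such an adaptation might look like, here is a minimal sketch. It is not the #9433 patch itself: the helper name `lock_if_supported`, the `lock_func` parameter, and the choice of which errno values count as "locking unavailable" are all assumptions made for this example.

```python
import errno
import fcntl

# Errors taken to mean "locking is unavailable on this filesystem".
# ENOLCK is the "No locks available" error from this ticket; treating
# ENOTSUP/EOPNOTSUPP the same way is an assumption of this sketch.
UNSUPPORTED_LOCK_ERRNOS = frozenset(
    getattr(errno, name)
    for name in ("ENOLCK", "ENOTSUP", "EOPNOTSUPP")
    if hasattr(errno, name)
)


def lock_if_supported(fd, flags, lock_func=fcntl.lockf):
    """Hypothetical helper: lock `fd`, tolerating filesystems without locking.

    Returns True if the lock was taken and False if the filesystem reported
    that locking is unavailable. Any other error is re-raised.
    """
    try:
        lock_func(fd, flags)
    except (IOError, OSError) as exc:
        if exc.errno in UNSUPPORTED_LOCK_ERRNOS:
            return False
        raise
    return True
```

The trade-off is that a `False` return means the write proceeds without any protection against concurrent writers, so silently ignoring the error may not be acceptable for every deployment.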

comment:3 by Jacob, 16 years ago

Triage Stage: Unreviewed → Design decision needed

comment:4 by Thejaswi Puthraya, 16 years ago

Component: Uncategorized → File uploads/storage

comment:5 by worksology, 15 years ago

We are experiencing this same issue in our production environment, which uses NFS. I believe it started when we upgraded to Django 1.1, so we will likely roll back to Django 1.0 to avoid these fatal errors. Is there a stop-gap patch that could avoid this error without reverting to 1.0? We'll be happy to be a second test case to help design a proper solution to this problem.

in reply to:  5 comment:6 by Karen Tracey, 15 years ago

Replying to worksology:

There is no stopgap patch because, so far as I can see, no one with a failing system has answered Malcolm's questions in http://code.djangoproject.com/ticket/9400#comment:1. That comment lays out some things you could try and things you should check (i.e., that locking is in fact available on this filesystem). Without further information from people who actually experience this error, there is not much that Django can do to fix it.

comment:7 by worksology, 15 years ago

Some more information for debugging:

Our environment uses a clustered NFS setup with nfs-utils-1.0.6-93.EL4, mounted as NFS version 3 with these options: rsize=32768,wsize=32768,tcp,nfsvers=3,hard,intr

I've patched our Django install to use flock() and it works again.

comment:8 by dougvanhorn, 14 years ago

I was just bitten by this issue (Error 37). In my case, however, it was caused by the NFSv3 client not having a running nfslock service (fixed with: $ sudo /sbin/service nfslock start).

My system and NFS:

Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Linux 2.6.18-164.15.1.el5 #1 SMP Mon Mar 1 10:56:08 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
nfs-utils.x86_64 1:1.0.9-42.el5

As an FYI, NFS 2 and NFS 3 require the third party locking service, whereas NFS 4 has locking built into the protocol.

I'll attach a small script which tests the locking behavior directly, so you can run the script while testing your NFS configuration. It's a cut and paste of the locks behavior as of 1.2.1.
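
For readers without access to the attachment, a test along these lines could look like the sketch below. This is not the attached nfslocktest.py; the function names `try_lock` and `check_directory` are made up for this example. It tries both `fcntl.lockf()` and `fcntl.flock()` on a scratch file in a given directory, which also covers the lockf-versus-flock question raised in comment:1.

```python
import fcntl
import os
import tempfile


def try_lock(lock_func, directory):
    """Take and release an exclusive lock on a scratch file in `directory`.

    Returns None on success, or the exception that was raised on failure.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            lock_func(fd, fcntl.LOCK_EX)
            lock_func(fd, fcntl.LOCK_UN)
            return None
        except (IOError, OSError) as exc:
            return exc
    finally:
        # Always clean up the scratch file, whether or not locking worked.
        os.close(fd)
        os.remove(path)


def check_directory(directory):
    """Report, per locking call, whether it works in `directory`."""
    return {
        "lockf": try_lock(fcntl.lockf, directory),
        "flock": try_lock(fcntl.flock, directory),
    }
```

Calling `check_directory()` with the path of the mounted share returns a dict whose values are `None` for calls that worked and the raised exception (for example the Errno 37 IOError) for calls that did not.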

by dougvanhorn, 14 years ago

Attachment: nfslocktest.py added

Small file to test locking as locks.py does, outside of Django.

comment:9 by worksology, 14 years ago

It appears the root of our problem with lockf() is that one of our machines was not running rpc.statd. Just posting in case this helps anyone else with the NFS file-locking problem.
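
A quick way to check for this on each machine is to look for the daemon in the process table. The sketch below is an illustration only: the helper `find_process` is hypothetical, it assumes a Linux-style /proc filesystem, and it assumes the daemon's executable is named `rpc.statd` as in the comment above.

```python
import os


def find_process(name, proc_root="/proc"):
    """Return the sorted PIDs of processes whose executable is called `name`.

    Reads <proc_root>/<pid>/cmdline, so this assumes a Linux-style /proc.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as handle:
                # Arguments are NUL-separated; the first one is the executable.
                argv0 = handle.read().split(b"\0", 1)[0]
        except (IOError, OSError):
            continue  # process exited or is not readable
        if os.path.basename(argv0).decode("utf-8", "replace") == name:
            pids.append(int(entry))
    return sorted(pids)
```

An empty list from `find_process("rpc.statd")` on an NFSv3 client or server would be a hint that the locking service is not running there.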

comment:10 by Luke Plant, 14 years ago

Severity: Normal
Type: Bug

comment:11 by Aymeric Augustin, 13 years ago

Easy pickings: unset
Resolution: worksforme
Status: new → closed
UI/UX: unset

If you want to use NFS with locks, you need to run statd.

AFAICT Django is following the recommended best practice.

I'm successfully storing media files on an NFS share in production at $DAY_JOB.

comment:12 by Marcin Nowak, 8 years ago

Six years later, this does not work for me either, in my case on a GlusterFS share.
As a workaround, I have to set STATIC_ROOT to a local path that is a symlink pointing to the GlusterFS path.

When STATIC_ROOT is set directly to the GlusterFS share, Django crashes:

Type 'yes' to continue, or 'no' to cancel: yes
Deleting 'fonts/FontAwesome.otf'
Copying '[...]static/fonts/FontAwesome.otf'
Traceback (most recent call last):
  File "bin/diagnostictool", line 39, in <module>
    sys.exit(djangorecipe.manage.main('diagnostictool.production'))
  File "[...]eggs/djangorecipe-1.11-py2.7.egg/djangorecipe/manage.py", line 9, in main
    management.execute_from_command_line(sys.argv)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/__init__.py", line 354, in execute_from_command_line
    utility.execute()
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/__init__.py", line 346, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/base.py", line 394, in run_from_argv
    self.execute(*args, **cmd_options)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/management/base.py", line 445, in execute
    output = self.handle(*args, **options)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/contrib/staticfiles/management/commands/collectstatic.py", line 168, in handle
    collected = self.collect()
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/contrib/staticfiles/management/commands/collectstatic.py", line 107, in collect
    handler(path, prefixed_path, storage)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/contrib/staticfiles/management/commands/collectstatic.py", line 315, in copy_file
    self.storage.save(prefixed_path, source_file)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/files/storage.py", line 63, in save
    name = self._save(name, content)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/files/storage.py", line 258, in _save
    locks.unlock(fd)
  File "[...]eggs/Django-1.8.6-py2.7.egg/django/core/files/locks.py", line 112, in unlock
    ret = fcntl.flock(_fd(f), fcntl.LOCK_UN)
IOError: [Errno 2] No such file or directory