Opened 5 years ago

Closed 2 years ago

#9400 closed Bug (worksforme)

flock causes problems when writing to an NFS share

Reported by: mikeh
Owned by: nobody
Component: File uploads/storage
Version: 1.0
Severity: Normal
Keywords:
Cc:
Triage Stage: Design decision needed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description

Hi,

This seems to be the same behaviour as reported in #8403, but since that ticket was closed with a request not to reopen it, here's a new ticket.

We have a media directory mounted over NFS. Our system is RHEL5.2, Python 2.4, Django 1.0. Saving a file through the standard FileField mechanisms (we're not using any custom storage backends, just the out-of-the-box Django setup) results in the following:

  File "/usr/lib64/python2.4/site-packages/mod_python/apache.py", line 299, in HandlerDispatch
    result = object(req)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 222, in handler
    return ModPythonHandler()(req)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/modpython.py", line 195, in __call__
    response = self.get_response(request)
  File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 128, in get_response
    return self.handle_uncaught_exception(request, resolver, exc_info)
  File "./../apps/dave_common/__init__.py", line 20, in new
  File "/usr/lib/python2.4/site-packages/django/core/handlers/base.py", line 86, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/sites.py", line 158, in root
    return self.model_page(request, *url.split('/', 2))
  File "/usr/lib/python2.4/site-packages/django/views/decorators/cache.py", line 44, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/sites.py", line 177, in model_page
    return admin_obj(request, rest_of_url)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 191, in __call__
    return self.add_view(request)
  File "/usr/lib/python2.4/site-packages/django/db/transaction.py", line 238, in _commit_on_success
    res = func(*args, **kw)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 492, in add_view
    new_object = self.save_form(request, form, change=False)
  File "/usr/lib/python2.4/site-packages/django/contrib/admin/options.py", line 370, in save_form
    return form.save(commit=False)
  File "/usr/lib/python2.4/site-packages/django/forms/models.py", line 302, in save
    return save_instance(self, self.instance, self._meta.fields, fail_message, commit)
  File "/usr/lib/python2.4/site-packages/django/forms/models.py", line 47, in save_instance
    f.save_form_data(instance, cleaned_data[f.name])
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 192, in save_form_data
    getattr(instance, self.name).save(data.name, data, save=False)
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 217, in save
    super(ImageFieldFile, self).save(name, content, save)
  File "/usr/lib/python2.4/site-packages/django/db/models/fields/files.py", line 74, in save
    self._name = self.storage.save(name, content)
  File "/usr/lib/python2.4/site-packages/django/core/files/storage.py", line 45, in save
    name = self._save(name, content)
  File "/usr/lib/python2.4/site-packages/django/core/files/storage.py", line 159, in _save
    locks.lock(fd, locks.LOCK_EX)
  File "/usr/lib/python2.4/site-packages/django/core/files/locks.py", line 57, in lock
    fcntl.lockf(fd(file), flags)
IOError: [Errno 37] No locks available

The default with RHEL5.2 is NFSv3, and that's what we're using.
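On Linux, errno 37 is ENOLCK ("No locks available"). Below is a minimal sketch, assuming Python 3 on Linux, that makes the same call as the last frame of the traceback (`fcntl.lockf(fd, fcntl.LOCK_EX)`) outside Django, so the mount can be tested on its own. The helper name `try_lockf` and the lock-file path are hypothetical.

```python
import errno
import fcntl
import os


def try_lockf(path):
    """Take and release an exclusive lockf() lock on `path`.

    Returns None on success, or the errno name (e.g. "ENOLCK") on failure.
    """
    # LOCK_EX may require a descriptor opened for writing, so open read/write.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX)
        except OSError as exc:  # Python 2's IOError is an alias of OSError in Python 3
            return errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return None
    finally:
        os.close(fd)
```

For example, `try_lockf("/path/to/media/.locktest")` should return None where locking works. On a mount like the one above, it should return "ENOLCK".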

Cheers,

Mike

Attachments (1)

nfslocktest.py (387 bytes) - added by dougvanhorn 4 years ago.
Small file to test locking as locks.py does, outside of Django.

Change History (12)

comment:1 follow-up: Changed 5 years ago by mtredinnick

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

So we're in an impossible situation here then. lockf() doesn't work everywhere, flock() doesn't work everywhere. And there's no way to know which one works.

Since lockf() -- the way Django currently does things -- is the recommended approach to doing portable locking and it should work with NFS (I made sure and read the Python source before making the change), I'm inclined to leave the current behaviour in place until a more robust solution emerges.

Thus, we'll need more information and investigation from you on this one. For example, does changing the lockf() call to flock() also fail? Do you have statd running on the server (so that locking is available -- since that was one of the problems in a Debian case, for example)? What information can you track down about why one version works somewhere and the other version works (if it does) on other NFS servers? What's the differentiating feature?

Sorry to push the research back in your direction, but right now Django's doing the best it can as far as following recommended practices and the current code certainly avoided the problems that were reported earlier. Yours is the first case that's been reported of it not working on a reliable NFS setup with the current code, so you have the (only?) failing test case and will need to work out what's going on. I'm far beyond being able to guess.
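The first question above asks whether flock() also fails. A probe that tries both calls on the same file can answer it. This is a sketch that assumes Python 3 on Linux. `probe_locks` is a hypothetical name. LOCK_NB makes the probe report EAGAIN or EACCES instead of hanging when another process already holds the lock.

```python
import errno
import fcntl
import os


def _attempt(fd, lock_func):
    """Try a non-blocking exclusive lock with `lock_func`; return "ok" or an errno name."""
    try:
        lock_func(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    lock_func(fd, fcntl.LOCK_UN)  # release before trying the other mechanism
    return "ok"


def probe_locks(path):
    """Report whether lockf() and flock() each succeed on `path`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        return {"lockf": _attempt(fd, fcntl.lockf), "flock": _attempt(fd, fcntl.flock)}
    finally:
        os.close(fd)
```

Run it on the NFS client against a file inside the shared directory. Run it again on a local disk for comparison.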

comment:2 in reply to: ↑ 1 Changed 5 years ago by rndblnch

Replying to mtredinnick:

#9433 points out a similar problem (although on AFP mounts).
The patch it provides (<http://code.djangoproject.com/attachment/ticket/9433/not_supported_locks.diff>) may be adapted to also handle the "IOError: [Errno 37] No locks available" error.
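The wrapper below follows the spirit of that suggestion, but it is not the #9433 patch. It treats ENOLCK as "locking unavailable" and lets the caller carry on without a lock. This is a sketch that assumes Python 3 on Linux. The name `lock_or_skip` is hypothetical. Treating EOPNOTSUPP the same way is also an assumption.

```python
import errno
import fcntl

# errnos treated here as "this filesystem cannot lock" (an assumption, not Django's list)
_UNSUPPORTED = {errno.ENOLCK, errno.EOPNOTSUPP}


def lock_or_skip(fd, flags=fcntl.LOCK_EX):
    """Try lockf() on `fd`; return True if locked, False if locking is unsupported.

    Any other error is re-raised unchanged.
    """
    try:
        fcntl.lockf(fd, flags)
    except OSError as exc:
        if exc.errno in _UNSUPPORTED:
            return False
        raise
    return True
```

Returning False instead of raising keeps uploads working. The cost is that mutual exclusion is lost silently, so two concurrent saves can race. Logging the False case would make that degraded mode visible.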

comment:3 Changed 5 years ago by jacob

  • Triage Stage changed from Unreviewed to Design decision needed

comment:4 Changed 5 years ago by thejaswi_puthraya

  • Component changed from Uncategorized to File uploads/storage

comment:5 follow-up: Changed 4 years ago by worksology

We are experiencing this same issue in our production environment, which uses NFS. I believe this started once we upgraded to Django 1.1, so we will likely roll back to Django 1.0 to avoid these fatal errors. Is there a possible stop-gap (patch) that could avoid this error without reverting to 1.0? We'll be happy to be a second test case to help design a proper solution to this problem.

comment:6 in reply to: ↑ 5 Changed 4 years ago by kmtracey

Replying to worksology:

There is no stopgap patch since, so far as I can see, no one with a failing system has answered Malcolm's questions in http://code.djangoproject.com/ticket/9400#comment:1. That comment lays out some things you could try and things you should check (i.e., that locking is in fact available on this filesystem). Without further information from people who actually experience this error, there is not much that Django can do to fix it.

comment:7 Changed 4 years ago by worksology

Some more information for debugging:

Our environment uses clustered NFS (nfs-utils-1.0.6-93.EL4), mounted as NFS version 3 with the options rsize=32768,wsize=32768,tcp,nfsvers=3,hard,intr.

I've patched our Django install to use flock(), and it works again.
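A change like this could be written as a fallback: keep lockf() and switch to flock() only when the filesystem reports ENOLCK. The helper below (`lock_with_fallback`) is hypothetical and is not the patch that was applied here. One caution comes from the Linux flock(2) man page. On kernels up to 2.6.11, flock() on an NFS mount only locks against processes on the same machine. The nfs-utils version above suggests RHEL 4, which ships a 2.6.9 kernel. On such a system, switching to flock() can make the error disappear while no longer excluding writers on other hosts. From 2.6.12 on, the NFS client emulates flock() with fcntl() byte-range locks, so it depends on the same lock services as lockf().

```python
import errno
import fcntl


def lock_with_fallback(fd, flags=fcntl.LOCK_EX):
    """Lock `fd` with lockf(); if the filesystem reports ENOLCK, retry with flock().

    Returns "lockf" or "flock" so the caller can log which mechanism was used.
    """
    try:
        fcntl.lockf(fd, flags)
        return "lockf"
    except OSError as exc:
        if exc.errno != errno.ENOLCK:
            raise
    # Caveat: on Linux <= 2.6.11, flock() on NFS only excludes processes on this host.
    fcntl.flock(fd, flags)
    return "flock"
```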

comment:8 Changed 4 years ago by dougvanhorn

I was just bitten by this issue (Errno 37). In my case, it was caused by the NFS 3 client not running the nfslock service (it can be started with $ sudo /sbin/service nfslock start).

My system and NFS:

Red Hat Enterprise Linux Server release 5.4 (Tikanga)
Linux 2.6.18-164.15.1.el5 #1 SMP Mon Mar 1 10:56:08 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
nfs-utils.x86_64 1:1.0.9-42.el5

As an FYI, NFS 2 and NFS 3 rely on a separate locking service (the NLM protocol, served by lockd together with rpc.statd), whereas NFS 4 has locking built into the protocol.
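You can check which NFS version a media directory is actually mounted with by looking up its entry in /proc/mounts on Linux. The filesystem type there is nfs for NFS 2/3 mounts and nfs4 for NFS 4 mounts, and the options column typically includes vers=. This is a sketch that assumes Python 3 on Linux. `nfs_mount_info` and its `mounts_text` parameter are hypothetical. The function also ignores the octal escapes (such as \040 for a space) that /proc/mounts uses in paths.

```python
import os


def nfs_mount_info(path, mounts_text=None):
    """Return (mount_point, fstype, options) for the mount that contains `path`.

    Reads /proc/mounts unless `mounts_text` is supplied (useful for testing).
    """
    if mounts_text is None:
        with open("/proc/mounts") as fh:
            mounts_text = fh.read()
    target = os.path.realpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, fstype, options = fields[1], fields[2], fields[3].split(",")
        # The containing mount is the longest mount point that prefixes the path.
        if target == mount_point or target.startswith(mount_point.rstrip("/") + "/"):
            if best is None or len(mount_point) > len(best[0]):
                best = (mount_point, fstype, options)
    return best
```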

I'll attach a small script that tests the locking behavior directly, so you can run it while testing your NFS configuration. It's a cut-and-paste of the locking code in django/core/files/locks.py as of Django 1.2.1.

Changed 4 years ago by dougvanhorn (attachment nfslocktest.py added)

Small file to test locking as locks.py does, outside of Django.

comment:9 Changed 3 years ago by worksology

It appears the root of our problem with lockf() is that one of our machines was not running rpc.statd. Just posting in case this helps anyone else with the NFS file-locking problem.
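A missing rpc.statd produced the same errno 37 here, so a quick check on each machine involved with the share can rule it out. This is a sketch that assumes Python 3 on Linux. It scans /proc/<pid>/stat for a process whose short name is rpc.statd. `process_running` and its `proc_root` parameter are hypothetical. The parameter only exists to make the function testable.

```python
import os


def process_running(name, proc_root="/proc"):
    """Return True if a process whose short name equals `name` is running.

    Reads the name in parentheses from <proc_root>/<pid>/stat (truncated to 15 chars).
    """
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "mounts"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                data = fh.read()
        except OSError:
            continue  # the process exited during the scan, or no permission
        comm = data[data.find("(") + 1 : data.rfind(")")]
        if comm == name:
            return True
    return False
```

For example, `process_running("rpc.statd")` returns False on a host where statd is not running.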

comment:10 Changed 3 years ago by lukeplant

  • Severity set to Normal
  • Type set to Bug

comment:11 Changed 2 years ago by aaugustin

  • Easy pickings unset
  • Resolution set to worksforme
  • Status changed from new to closed
  • UI/UX unset

If you want to use NFS with locks, you need to run statd.

AFAICT Django is following the recommended best practice.

I'm successfully storing media files on an NFS share in production at $DAY_JOB.
