TJNII l33t
Joined: 09 Nov 2003 Posts: 648 Location: for(;;);
|
Posted: Thu Aug 23, 2007 4:49 am Post subject: [Solved] NFS server goes down like clockwork, every 15 mins |
|
|
Tonight I made the mistake of trying to upgrade my system. Now NFS (nfs-utils 1.0.12) doesn't work right. The server runs for exactly 15 minutes (I timed it) and then goes belly up. Mounts on client machines return the "Stale NFS file handle" error. Restarting the NFS server corrects this, with no action needed on the client.
I haven't changed anything from my old version (1.0.4, I think; the stable version from early this year) other than updating nfs-utils. I've tried no_subtree_check in my exports file and -o rsize=8192,wsize=8192,tcp,hard,intr on the clients. no_subtree_check made it last a little longer; the client options didn't have any effect (other than making the client apps wig out less).
I don't see anything useful in the logs, though I'm not sure exactly which logfile NFS dumps to. /usr/sbin/rpc.nfsd doesn't have a verbose option.
Help?
Last edited by TJNII on Fri Aug 24, 2007 8:11 pm; edited 1 time in total |
turtles Veteran
Joined: 31 Dec 2004 Posts: 1697
|
Posted: Thu Aug 23, 2007 6:14 pm Post subject: |
|
|
I am no expert on this, but have you recompiled your kernel? If yes, can you revert to your old kernel version?
EDIT: Post the output of Code: | grep 'NFS' /usr/src/linux/.config |
EDIT #2
Like I said, I don't use NFS, but I get Code: | dep -l nfs-utils
sed: read error on /packages: Is a directory
net-fs/nfs-utils-1.0.12:
!nonfsv4? >=dev-libs/libevent-1.0b dev-libs/libevent-1.3a
!nonfsv4? >=net-libs/libnfsidmap-0.16 net-libs/libnfsidmap-0.17
>=net-nds/portmap-5b-r6 net-nds/portmap-5b-r9
tcpd? sys-apps/tcp-wrappers sys-apps/tcp-wrappers-7.6-r8
|
You might try Code: | emerge -1ua dev-libs/libevent net-libs/libnfsidmap net-nds/portmap sys-apps/tcp-wrappers |
|
embobo Guru
Joined: 19 May 2003 Posts: 311
|
Posted: Thu Aug 23, 2007 6:32 pm Post subject: |
|
|
I've had similar problems. I traced the problem to kernel 2.6.22. Until today, I've had to use 2.6.21. Now I'm trying 2.6.22-gentoo-r4.
From the client try
Code: |
rpcinfo -t <server> nlockmgr
rpcinfo -u <server> nlockmgr
|
In my case it wouldn't respond.
Edit: Just found out 2.6.22-gentoo-r4 has the same problem. Back to 2.6.21 for me. |
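For anyone repeating the check above, here is a minimal sketch that runs both probes and prints one result line per transport. The helper name `check_rpc` and the server name in the usage line are placeholders of mine, not part of nfs-utils; the only real command it relies on is `rpcinfo` with its `-t` (TCP) and `-u` (UDP) options, exactly as in the post.

```shell
# Probe one RPC service on a server over TCP and UDP.
# check_rpc is a hypothetical helper; it only wraps the two rpcinfo calls.
check_rpc() {
    server=$1
    service=$2
    for proto in tcp udp; do
        # rpcinfo takes -t for TCP and -u for UDP
        case $proto in
            tcp) flag=-t ;;
            udp) flag=-u ;;
        esac
        if rpcinfo "$flag" "$server" "$service" >/dev/null 2>&1; then
            echo "$service over $proto: ok"
        else
            echo "$service over $proto: no response"
        fi
    done
}
```

Run from a client as `check_rpc <server> nlockmgr`. A "no response" line only means that the `rpcinfo` call failed; it does not say why.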
turtles Veteran
Joined: 31 Dec 2004 Posts: 1697
|
embobo Guru
Joined: 19 May 2003 Posts: 311
|
Posted: Thu Aug 23, 2007 6:47 pm Post subject: |
|
|
You call that a workaround? Lol! |
TJNII l33t
Joined: 09 Nov 2003 Posts: 648 Location: for(;;);
|
Posted: Thu Aug 23, 2007 11:44 pm Post subject: |
|
|
I haven't touched the kernel in months. Unless there is some quirk between nfs-utils 1.0.12 and the 2.6.16 kernels, I doubt that's the problem.
Everything else works OK on the system, and I can get the fault time to vary with different options, so I don't think it's failing hardware. (Though I've been wrong before.) Only NFS seems to have a problem, unlike that other linked thread. If it is some weird combination issue, it is going to be stupid hard to find.
I've turned off all the USE flags and I have conservative compiler flags. I don't know what else to try.
Code: | [ebuild R ] net-fs/nfs-utils-1.0.12 USE="-kerberos -nonfsv4 -tcpd" 0 kB
|
|
TJNII l33t
Joined: 09 Nov 2003 Posts: 648 Location: for(;;);
|
Posted: Fri Aug 24, 2007 7:51 pm Post subject: |
|
|
Last night I did the 15-minute restart and the client didn't start working again. It had done this before, but infrequently. So instead of /etc/init.d/nfs restart, which I had been doing, I did /etc/init.d/nfs stop. While the restarts had been reporting success, the stop reported a failure. A ps -ec revealed a couple of rpc processes that wouldn't stop, including rpc.gssd. So rpc.gssd had been running ever since I compiled out Kerberos, a few days and many NFS restarts earlier. I killed all the rpc processes still running, restarted portmap for good measure, and started NFS again. Now it seems to be working.
I have a feeling, though I can't prove it now, that the root cause was Kerberos. I had a kerberos USE flag set from when I fiddled with it a long time ago, but I never set up Kerberos properly. Until this update I don't think NFS tried to use Kerberos, so I never had a problem. |
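The leftover-process check described above can be sketched as a small filter. This is only an illustration: `leftover_rpc` is a made-up helper name, and it uses `ps -e -o comm=` (process names only) in place of the `ps -ec` from the post so the output is easy to match.

```shell
# Print any process names that look like NFS helper daemons (rpc.*).
# Reads one process name per line on stdin; leftover_rpc is a hypothetical name.
leftover_rpc() {
    # grep exits non-zero when nothing matches; treat that as "none left"
    grep '^rpc\.' || true
}
```

After `/etc/init.d/nfs stop`, running `ps -e -o comm= | leftover_rpc` should print nothing. Anything it does print (rpc.gssd in this case) survived the stop and may need to be killed by hand before starting NFS again.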