Gentoo Forums
[Solved] NFS server goes down like clockwork, every 15 mins
TJNII
l33t


Joined: 09 Nov 2003
Posts: 648
Location: for(;;);

Posted: Thu Aug 23, 2007 4:49 am    Post subject: [Solved] NFS server goes down like clockwork, every 15 mins

Tonight I made the mistake of trying to upgrade my system. Now NFS (nfs-utils 1.0.12) doesn't work right. The server will run for 15 minutes (exactly; I timed it) and then go belly up. Mounts on client machines then return the "Stale NFS file handle" error. Restarting the NFS server corrects this, with no action needed on the client.

I haven't changed anything from my old version (1.0.4, I think; the stable version from early this year), other than updating nfs-utils. I've tried no_subtree_check in my exports file and -o rsize=8192,wsize=8192,tcp,hard,intr on the clients. no_subtree_check made it last a little longer; the client options didn't have any effect (other than making the client apps wig out less).

I don't see anything useful in the logs, though I'm not sure exactly which logfile NFS dumps to. /usr/sbin/rpc.nfsd doesn't have a verbose option.
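On the logfile question, a rough starting point: the userspace daemons such as rpc.mountd and rpc.statd normally log through syslog, and the in-kernel nfsd reports through the kernel log. The sketch below just filters NFS-related lines out of whatever log text it is given. The helper name nfs_log_lines is made up for illustration, and the /var/log/messages path in the usage comment is an assumption; the real file depends on how syslog is configured.

```shell
# nfs_log_lines: hypothetical helper. Reads log text on stdin and prints
# only the lines that mention the NFS/RPC daemons or the kernel NFS server.
# "|| true" keeps the exit status at 0 when nothing matches.
nfs_log_lines() {
    grep -E 'nfsd|mountd|lockd|statd|portmap|rpc\.' || true
}

# Usage (paths are assumptions, adjust to the local syslog setup):
#   nfs_log_lines < /var/log/messages | tail -n 50
#   dmesg | nfs_log_lines
```

Comparing the timestamps of the last matching lines against the moment the clients start reporting stale handles may at least show which daemon goes quiet first.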

Help?


Last edited by TJNII on Fri Aug 24, 2007 8:11 pm; edited 1 time in total
turtles
Veteran


Joined: 31 Dec 2004
Posts: 1697

Posted: Thu Aug 23, 2007 6:14 pm

I am no expert on this, but have you recompiled your kernel? If so, can you revert to your old kernel version?
EDIT: Please post the output of:
Code:
grep 'NFS' /usr/src/linux/.config

EDIT #2
Like I said, I don't use NFS, but I get
Code:
dep -l nfs-utils
sed: read error on /packages: Is a directory
net-fs/nfs-utils-1.0.12:
    !nonfsv4?       >=dev-libs/libevent-1.0b dev-libs/libevent-1.3a
    !nonfsv4?       >=net-libs/libnfsidmap-0.16 net-libs/libnfsidmap-0.17
                    >=net-nds/portmap-5b-r6  net-nds/portmap-5b-r9
    tcpd?           sys-apps/tcp-wrappers    sys-apps/tcp-wrappers-7.6-r8


You could try re-emerging the dependencies:
Code:
emerge -1ua dev-libs/libevent net-libs/libnfsidmap net-nds/portmap sys-apps/tcp-wrappers
embobo
Guru


Joined: 19 May 2003
Posts: 311

Posted: Thu Aug 23, 2007 6:32 pm

I've had similar problems. I traced the problem to kernel 2.6.22. Until today, I've had to use 2.6.21. Now I'm trying 2.6.22-gentoo-r4.

From the client try

Code:

rpcinfo -t <server> nlockmgr
rpcinfo -u <server> nlockmgr


In my case it wouldn't respond.
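
To catch the moment the lock manager stops answering, the two rpcinfo probes above can be wrapped in a small function and polled. This is only a sketch: the function name check_lockd and the hostname in the usage comment are made up for illustration.

```shell
# check_lockd SERVER: hypothetical helper. Probes the NFS lock manager
# (nlockmgr) on SERVER over TCP (-t) and UDP (-u) with rpcinfo.
# Prints one line per transport and returns non-zero if either probe fails.
check_lockd() {
    server=$1
    status=0
    for proto in t u; do
        if rpcinfo -"$proto" "$server" nlockmgr >/dev/null 2>&1; then
            echo "nlockmgr ($proto): responding"
        else
            echo "nlockmgr ($proto): no response"
            status=1
        fi
    done
    return $status
}

# Usage ("nfsserver" is a placeholder hostname): poll once a minute and
# print the time at which the server stops responding.
#   while check_lockd nfsserver >/dev/null; do sleep 60; done; date
```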

Edit: Just found out 2.6.22-gentoo-r4 has the same problem. Back to 2.6.21 for me.
turtles
Veteran


Joined: 31 Dec 2004
Posts: 1697

Posted: Thu Aug 23, 2007 6:38 pm

Here is a similar problem with a work around
embobo
Guru


Joined: 19 May 2003
Posts: 311

Posted: Thu Aug 23, 2007 6:47 pm

turtles wrote:
Here is a similar problem with a work around


You call that a workaround? Lol!
TJNII
l33t


Joined: 09 Nov 2003
Posts: 648
Location: for(;;);

Posted: Thu Aug 23, 2007 11:44 pm

I haven't touched the kernel in months. Unless there is some quirk between nfs-utils 1.0.12 and the 2.6.16 kernels, I doubt that's the problem.

Everything else works OK on the system, and I can get the fault time to vary with different options, so I don't think it's failing hardware. (Though I've been wrong before.) Only NFS seems to have a problem, unlike in that other linked thread. If it is some weird combination issue, it is going to be stupid hard to find.

I've turned off all the USE flags and I have conservative compiler flags. I don't know what else to try.

Code:
[ebuild   R   ] net-fs/nfs-utils-1.0.12  USE="-kerberos -nonfsv4 -tcpd" 0 kB
TJNII
l33t


Joined: 09 Nov 2003
Posts: 648
Location: for(;;);

Posted: Fri Aug 24, 2007 7:51 pm

Last night I did the 15-minute restart and the client didn't start working again. It had done this before, but infrequently. So instead of /etc/init.d/nfs restart, which I had been doing, I ran /etc/init.d/nfs stop. While the restarts had been reporting success, the stop reported failure. A ps -ec revealed a couple of rpc processes that wouldn't stop, including rpc.gssd. So rpc.gssd had been running ever since I compiled out Kerberos, a few days and many nfs restarts earlier. I killed all the rpc processes still running, restarted portmap for good measure, and started nfs again. Now it seems to be working.
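
For anyone hitting the same thing, the recovery steps described above can be sketched roughly as follows. This is an outline under assumptions, not a tested fix: the helper names (leftover_rpc, run, recover_nfs) are made up, /etc/init.d/portmap is assumed to be the portmap init script, and ps -e -o pid= -o comm= is used in place of ps -ec because its output is easier to parse. By default the script only prints what it would do.

```shell
# leftover_rpc: reads "PID COMMAND" lines on stdin and prints those whose
# command name starts with "rpc." (rpc.gssd, rpc.statd, ...).
leftover_rpc() {
    awk '$2 ~ /^rpc\./ { print $1, $2 }'
}

# run: prints the command instead of executing it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# recover_nfs: stop nfs, kill any rpc.* daemons that survived the stop,
# restart portmap, then start nfs again.
recover_nfs() {
    run /etc/init.d/nfs stop
    ps -e -o pid= -o comm= | leftover_rpc | while read -r pid name; do
        echo "still running: $name ($pid)"
        run kill "$pid"
    done
    run /etc/init.d/portmap restart   # assumed init script path
    run /etc/init.d/nfs start
}
```

Calling recover_nfs as root prints the plan; DRY_RUN=0 recover_nfs carries it out. Anything matching rpc.* gets killed, so look over the dry-run output first.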


I have a feeling, though I can't prove it now, that the root cause was Kerberos. I had the kerberos USE flag set from when I fiddled with it a long time ago, but I never set up Kerberos properly. Until this update I don't think NFS tried to use Kerberos, so I never had a problem.
All times are GMT