Gentoo Forums
NFS Help: "server not responding" but connectivity is good
RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Sat Sep 13, 2008 3:43 am    Post subject: NFS Help: "server not responding" but connectivity is good

I'm having problems with some NFS mounts. I have one server with several exports and multiple clients mounting the same exports. Only one client has problems. Within a few hours of booting, I find the following message at the end of the dmesg output:
Code:
nfs: server 192.168.1.1 not responding, still trying
and it never recovers. I expect to see:
Code:
nfs: server 192.168.1.1 OK
but never do. While this is happening, I can SSH in and scp files back and forth with good throughput. Other clients have no problems with the same mount. So even if there is a connectivity issue, why isn't that client automatically restoring the connection? How can I go about debugging this?

I'm using NFSv3 on the server and on all clients.

VinzC
Watchman

Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Sun Sep 14, 2008 9:51 pm

Do you have a Marvell network card?
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Sun Sep 14, 2008 11:23 pm

Nope, unless one of the two below uses Marvell hardware:
Code:
$ lspci | grep -i net
00:05.0 Bridge: nVidia Corporation CK8S Ethernet Controller (rev a2)
02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
The active interface is using the r8169 kernel driver.

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Mon Sep 15, 2008 12:28 am

I've found the first post that sounds reasonably similar to mine. On the client that's having trouble, netstat shows a connection to port 2049 on the NFS server "stuck" in the FIN_WAIT2 state.

http://www.ussg.iu.edu/hypermail/linux/kernel/0808.3/0123.html

VinzC
Watchman

Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Mon Sep 15, 2008 7:47 am

I once had trouble like this with a Marvell adapter. I never managed to fix it, so I plugged in a Realtek 8139 instead. Unfortunately, I know nothing about the RTL-8169. Have you tried a good old, well-known network adapter such as an NE2000 or RTL-8139? I know for sure these work perfectly.
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Mon Sep 15, 2008 10:22 am

I don't believe it's a network adapter problem. I have no problem with SSH or SCP during the period NFS reports troubles. This implies to me it's a problem with NFS software.

VinzC
Watchman

Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Mon Sep 15, 2008 11:30 am

RosenSama wrote:
I don't believe it's a network adapter problem. I have no problem with SSH or SCP during the period NFS reports troubles. This implies to me it's a problem with NFS software.

Not necessarily. I experienced exactly the same symptoms as you. Neither SSH nor SCP failed; only NFS had major trouble with the network, until I realized it wasn't NFS but my network adapter that was causing it. You're free to believe me or not ;-) .
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!

jamapii
l33t

Joined: 16 Sep 2004
Posts: 637

Posted: Mon Sep 15, 2008 11:40 pm

Some ideas you can try:

Make sure portmap is running on both machines (ps ax|grep portmap).

Try mounting with the options "rsize=1024,wsize=1024".
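
A minimal sketch of those two checks, assuming `rpcinfo` is available on the client. The helper name `has_rpc_service`, the export path `/export`, and the mount point `/mnt/test` are placeholders chosen for illustration, not details from this thread. The helper only scans the table that `rpcinfo -p` prints, which also shows whether the NFS services are registered with portmap on the server.

```shell
# has_rpc_service NAME: succeed if `rpcinfo -p` output on stdin lists NAME
# in its last ("service") column. Hypothetical helper for illustration.
has_rpc_service() {
    awk -v svc="$1" '$NF == svc { found = 1 } END { exit !found }'
}

# Typical use against the server from the first post (needs network access):
#   rpcinfo -p 192.168.1.1 | has_rpc_service nfs    && echo "nfs is registered"
#   rpcinfo -p 192.168.1.1 | has_rpc_service mountd && echo "mountd is registered"
#
# Test mount with small transfer sizes (as root; both paths are placeholders):
#   mount -t nfs -o rsize=1024,wsize=1024 192.168.1.1:/export /mnt/test
```

If the hang goes away with the small sizes, that would point toward lost or fragmented packets rather than the NFS code itself.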

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Tue Sep 16, 2008 1:10 am

VinzC wrote:
RosenSama wrote:
I don't believe it's a network adapter problem. I have no problem with SSH or SCP during the period NFS reports troubles. This implies to me it's a problem with NFS software.

Not necessarily. I experienced exactly the same symptoms as you. Neither SSH nor SCP failed; only NFS had major trouble with the network, until I realized it wasn't NFS but my network adapter that was causing it. You're free to believe me or not ;-) .

I believe you; I just don't understand the mechanism by which a hardware or driver problem can have an adverse effect on one specific high-level protocol like NFS and not others.

jamapii
l33t

Joined: 16 Sep 2004
Posts: 637

Posted: Tue Sep 16, 2008 8:52 pm

RosenSama wrote:
I believe you; I just don't understand the mechanism by which a hardware or driver problem can have an adverse effect on one specific high-level protocol like NFS and not others.


It's the difference between TCP and UDP: TCP compensates for dropped packets, whereas with UDP the application is supposed to. NFS usually uses UDP. Try VoIP; a bad network adapter should be audible ;)
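
Whether a given mount actually runs over UDP or TCP depends on the mount options and on the defaults of the installed nfs-utils and kernel, so it may be worth checking rather than assuming. The sketch below reads the mount table and prints the transport for each NFS mount. The helper name `nfs_transports` is made up for illustration, and it assumes the kernel lists a `proto=` option in `/proc/mounts`, which may not hold on every version.

```shell
# nfs_transports [FILE]: print "mountpoint proto" for every NFS mount listed
# in FILE (default /proc/mounts). Hypothetical helper for illustration.
nfs_transports() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        proto = "unknown"              # proto= may be absent on some kernels
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^proto=/) proto = substr(opts[i], 7)
        print $2, proto
    }' "${1:-/proc/mounts}"
}

# Typical use on the client:
#   nfs_transports
# Client-side RPC counters, including retransmissions (from nfs-utils):
#   nfsstat -rc
```

A retransmission counter that keeps climbing on a UDP mount would fit the dropped-packet explanation above.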

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Wed Sep 17, 2008 2:35 am

So could I potentially test this by forcing NFS to use TCP somehow?

VinzC
Watchman

Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Wed Sep 17, 2008 7:55 am

Isn't it simpler to just try with another (well-known) network adapter?
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!

Janne Pikkarainen
Veteran

Joined: 29 Jul 2003
Posts: 1143
Location: Helsinki, Finland

Posted: Thu Sep 18, 2008 6:41 pm

RosenSama wrote:
So could I potentially test this by forcing NFS to use TCP somehow?


Add tcp to the options field of your NFS mount line in /etc/fstab.

If your NIC is connected to a 100 Mbps network, make sure it's using full duplex mode instead of half duplex. Even if you are sure it's running in full duplex, please double-check with ethtool eth0 or mii-tool eth0.
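
Both suggestions can be sketched as follows. The export path and mount point in the fstab comment are placeholders, not values from this thread, and `link_settings` is a hypothetical helper that simply picks the Speed and Duplex lines out of whatever `ethtool` prints.

```shell
# /etc/fstab line with TCP forced (export and mount point are placeholders):
#   192.168.1.1:/export  /mnt/export  nfs  rw,tcp  0 0

# link_settings: read `ethtool IFACE` output on stdin and print the
# negotiated speed and duplex. Hypothetical helper for illustration.
link_settings() {
    awk -F': *' '
        /Speed:/  { speed = $2 }
        /Duplex:/ { duplex = $2 }
        END { print speed, duplex }'
}

# Typical use (usually needs root):
#   ethtool eth0 | link_settings
```

A result such as "100Mb/s Half" on a switched network would suggest a duplex mismatch worth fixing before looking further at NFS.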
_________________
Yes, I'm the man. Now it's your turn to decide if I meant "Yes, I'm the male." or "Yes, I am the Unix Manual Page.".

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Sun Sep 21, 2008 1:36 am

Updating my NFS server kernel appears to have solved the issue. It was gentoo-sources 2.6.22 and is now gentoo-sources 2.6.25. Thanks for your help.

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Sun Sep 21, 2008 4:41 pm

I spoke too quickly. It just takes much longer for the problem to show up.

I'll try NFS via TCP now.

Janne Pikkarainen
Veteran

Joined: 29 Jul 2003
Posts: 1143
Location: Helsinki, Finland

Posted: Mon Sep 22, 2008 6:36 am

Some versions of nfs-utils have been problematic for me over time, too. Try upgrading or downgrading that package.
_________________
Yes, I'm the man. Now it's your turn to decide if I meant "Yes, I'm the male." or "Yes, I am the Unix Manual Page.".

Anarcho
Advocate

Joined: 06 Jun 2004
Posts: 2970
Location: Germany

Posted: Tue Sep 23, 2008 7:38 am

I had exactly the same issue: NFS drops, but SSH and everything else keep working, and other PCs don't lose the NFS mount.

I have an nVidia NIC in that machine, too, which I blame for these issues. I switched to an Intel Pro 1000 PCI-E card a few weeks ago and have had no problems since. So, for me, the fix was to get rid of the nVidia network card!
_________________
...it's only Rock'n'Roll, but I like it!

RosenSama
Tux's lil' helper

Joined: 28 Apr 2003
Posts: 99

Posted: Tue Sep 30, 2008 12:57 am

For me it was happening with both forcedeth and r8169 drivers on two different clients to the same server. It was just as defined in the mailing list post above. `netstat -natp` shows a "hung" connection from a client port < 1024 to server port 2049. On the client it's FIN_WAIT2 and on the server it's CLOSE_WAIT.
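
The check described above can be sketched as a small filter over netstat output. The helper name `stuck_nfs_conns` is hypothetical; it assumes the usual Linux `netstat -nat` column layout (protocol, queues, local address, foreign address, state) and simply prints TCP connections that involve port 2049 and sit in one of the two states mentioned.

```shell
# stuck_nfs_conns: read `netstat -nat` (or -natp) output on stdin and print
# TCP connections involving port 2049 that sit in FIN_WAIT2 or CLOSE_WAIT.
# Hypothetical helper for illustration.
stuck_nfs_conns() {
    awk '$1 ~ /^tcp/ &&
         ($4 ~ /:2049$/ || $5 ~ /:2049$/) &&
         ($6 == "FIN_WAIT2" || $6 == "CLOSE_WAIT")'
}

# Typical use on the client or the server:
#   netstat -nat | stuck_nfs_conns
```

A connection that shows up here briefly is normal during a clean close; one that stays in the list while the mount is hung matches the symptom in this post.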

Anyways, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.

Thanks for all the help.

VinzC
Watchman

Joined: 17 Apr 2004
Posts: 5098
Location: Dark side of the mood

Posted: Tue Sep 30, 2008 8:06 am

RosenSama wrote:
For me it was happening with both forcedeth and r8169 drivers on two different clients to the same server. It was just as defined in the mailing list post above. `netstat -natp` shows a "hung" connection from a client port < 1024 to server port 2049. On the client it's FIN_WAIT2 and on the server it's CLOSE_WAIT.

Anyways, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.

Thanks for all the help.

If both clients (those that fail with NFS) have the same problematic Ethernet hardware and brand, I'd expect the same trouble on both. This is just a guess, though.
_________________
Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739!

depontius
Advocate

Joined: 05 May 2004
Posts: 3526

Posted: Tue Sep 30, 2008 11:55 am

RosenSama wrote:
Anyways, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.


There has been a thread on LKML in the past month about nfs hangs beginning with 2.6.25-6 or so. I'm running 2.6.25-hardened-r7 on my server and 2.6.25-gentoo-r7 on my clients. I get occasional "stalls" for maybe 5 seconds at a time, but no out-and-out hangs. There are some hints that ACLs may be involved, and I'm not sure if the critical kernel level is on the server or client. I'm living with it for now, but next time I build a kernel I'm going to try leaving out the ACLs, since I've never gotten around to using them. I'll add them back in when the stall is resolved, and then never get around to using them for several more years.
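
To see whether a kernel was built with NFS ACL support before rebuilding without it, something like the following may help. It is only a sketch: `nfs_acl_options` is a made-up helper name, the default config path is the conventional Gentoo location, and the option names (CONFIG_NFS_V3_ACL, CONFIG_NFSD_V2_ACL, CONFIG_NFSD_V3_ACL, CONFIG_NFS_ACL_SUPPORT) should be verified against the Kconfig of the kernel version in question.

```shell
# nfs_acl_options [FILE]: list NFS ACL options that are enabled (=y or =m)
# in a kernel config FILE (default /usr/src/linux/.config).
# Hypothetical helper; option names should be checked against your kernel.
nfs_acl_options() {
    grep -E '^CONFIG_NFSD?_(V[23]_ACL|ACL_SUPPORT)=[ym]' "${1:-/usr/src/linux/.config}"
}

# Typical use:
#   nfs_acl_options                  # config of the kernel source tree
#   nfs_acl_options /boot/config     # placeholder path to a saved config
```

No output means none of those options is enabled in the file that was checked.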
_________________
.sigs waste space and bandwidth

All times are GMT