Author |
Message |
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Sat Sep 13, 2008 3:43 am Post subject: NFS Help: "server not responding" but connectivit |
|
|
I'm having problems with some NFS mounts. I have one server with several exports and multiple clients mounting the same exports. Only one client has problems. Within a few hours of booting, I find the following message at the end of the dmesg output: Code: | nfs: server 192.168.1.1 not responding, still trying | and it never recovers. I expect to see: Code: | nfs: server 192.168.1.1 OK | but never do. While this is happening, I can SSH in and scp files back and forth with good throughput. Other clients have no problems with the same mount. So even if there is a connectivity issue, why isn't that client automatically restoring the connection? How can I go about debugging this?
I'm using NFS v3 on the server and on all clients. |
|
VinzC Watchman
Joined: 17 Apr 2004 Posts: 5098 Location: Dark side of the mood
|
Posted: Sun Sep 14, 2008 9:51 pm Post subject: |
|
|
Do you have a Marvell network card? _________________ Gentoo addict: tomorrow I quit, I promise!... Just one more emerge...
1739! |
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Sun Sep 14, 2008 11:23 pm Post subject: |
|
|
Nope, unless one of the two below uses Marvell hardware: Code: | $ lspci | grep -i net
00:05.0 Bridge: nVidia Corporation CK8S Ethernet Controller (rev a2)
02:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10) | The active interface uses the r8169 kernel driver. |
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
|
VinzC Watchman
Joined: 17 Apr 2004 Posts: 5098 Location: Dark side of the mood
|
Posted: Mon Sep 15, 2008 7:47 am Post subject: |
|
|
I once had trouble like this with a Marvell adapter. I never managed to fix it, so I plugged in a Realtek 8139 instead. Unfortunately I know nothing about the RTL-8169. Have you tried a good old, well-known network adapter like an NE2000 or RTL-8139? I know for sure these work perfectly. |
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Mon Sep 15, 2008 10:22 am Post subject: |
|
|
I don't believe it's a network adapter problem. I have no problem with SSH or SCP during the period NFS reports troubles. This implies to me it's a problem with NFS software. |
|
VinzC Watchman
Joined: 17 Apr 2004 Posts: 5098 Location: Dark side of the mood
|
Posted: Mon Sep 15, 2008 11:30 am Post subject: |
|
|
RosenSama wrote: | I don't believe it's a network adapter problem. I have no problem with SSH or SCP during the period NFS reports troubles. This implies to me it's a problem with NFS software. |
Not necessarily. I experienced exactly the same symptoms as you. Neither SSH nor SCP failed. Only NFS caused major trouble on the network, until I realized it wasn't NFS but my network adapter that was causing so much trouble. You're free to believe me or not. |
|
jamapii l33t
Joined: 16 Sep 2004 Posts: 637
|
Posted: Mon Sep 15, 2008 11:40 pm Post subject: |
|
|
Some ideas you can try:
Make sure portmap is running on both machines (ps ax|grep portmap).
Try mounting with the options "rsize=1024,wsize=1024". |
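Both suggestions can be tried from a shell. The sketch below is only an illustration: the function names are invented here, and the export path `/export` and mount point `/mnt/nfstest` are placeholders that do not come from this thread. The first helper prints the mount command rather than running it, so nothing changes until the printed line is executed as root.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a mount command that uses the
# small transfer sizes suggested above.
# Arguments: server, export path, local mount point (all placeholders).
nfs_small_io_cmd() {
    server=$1
    export_path=$2
    mount_point=$3
    printf 'mount -t nfs -o rsize=1024,wsize=1024 %s:%s %s\n' \
        "$server" "$export_path" "$mount_point"
}

# Hypothetical helper: succeed if a portmap process is running.
# The [p] bracket keeps grep from matching its own command line.
portmap_running() {
    ps ax | grep -q '[p]ortmap'
}

# Show the command that would be run for the server in this thread.
nfs_small_io_cmd 192.168.1.1 /export /mnt/nfstest
```

If the hang disappears with the small sizes, that would point toward lost or fragmented packets rather than the NFS code itself, though it is not proof either way.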
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Tue Sep 16, 2008 1:10 am Post subject: |
|
|
VinzC wrote: | RosenSama wrote: | I don't believe it's a network adapter problem. I have no problem with SSH or SCP during the period NFS reports troubles. This implies to me it's a problem with NFS software. |
Not necessarily. I experienced exactly the same symptoms as you. Neither SSH nor SCP failed. Only NFS caused major trouble on the network, until I realized it wasn't NFS but my network adapter that was causing so much trouble. You're free to believe me or not. | I believe you; I just don't understand the mechanism by which hardware or a driver can have an adverse effect on one specific high-level protocol like NFS and not others. |
|
jamapii l33t
Joined: 16 Sep 2004 Posts: 637
|
Posted: Tue Sep 16, 2008 8:52 pm Post subject: |
|
|
RosenSama wrote: | I believe you; I just don't understand the mechanism by which hardware or a driver can have an adverse effect on one specific high-level protocol like NFS and not others. |
It's the difference between TCP and UDP: TCP compensates for dropped packets, whereas with UDP the application is supposed to. NFS usually uses UDP. Try VoIP; a bad network adapter should be audible ;) |
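To see which transport an existing mount actually uses, the mount options in `/proc/mounts` can be inspected. A minimal sketch, assuming the kernel lists a `proto=` entry among the NFS mount options (the function name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: read /proc/mounts-style lines on stdin and print
# "<mount point> <transport>" for each NFS mount.
# Assumes the options field (4th column) contains proto=tcp or proto=udp;
# mounts without a proto= entry are reported as "unknown".
nfs_transports() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        proto = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^proto=/)
                proto = substr(opts[i], 7)   # text after "proto="
        print $2, proto
    }'
}

# Usage on a live system:
#   nfs_transports < /proc/mounts
```

A mount reported as `udp` depends on the NFS client to retransmit lost requests, which is the situation described above.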
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Wed Sep 17, 2008 2:35 am Post subject: |
|
|
So could I potentially test this by forcing NFS to use TCP somehow? |
|
VinzC Watchman
Joined: 17 Apr 2004 Posts: 5098 Location: Dark side of the mood
|
Posted: Wed Sep 17, 2008 7:55 am Post subject: |
|
|
Isn't it simpler to just try another (well-known) network adapter? |
|
Janne Pikkarainen Veteran
Joined: 29 Jul 2003 Posts: 1143 Location: Helsinki, Finland
|
Posted: Thu Sep 18, 2008 6:41 pm Post subject: |
|
|
RosenSama wrote: | So could I potentially test this by forcing NFS to use TCP somehow? |
Add tcp to the options field of your NFS mount line in /etc/fstab.
If your NIC is connected to a 100 Mbps network, make sure it's using full duplex mode instead of half duplex. Even if you are sure it's running in full duplex, please double-check with ethtool eth0 or mii-tool eth0. _________________ Yes, I'm the man. Now it's your turn to decide if I meant "Yes, I'm the male." or "Yes, I am the Unix Manual Page.". |
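As a sketch of both checks: an fstab entry with the tcp option might look like the commented line below (the export path and mount point are placeholders, not taken from this thread), and the duplex setting can be pulled out of ethtool's report. This assumes ethtool prints a `Duplex:` line for the interface; the helper name is invented here.

```shell
#!/bin/sh
# Example /etc/fstab line forcing TCP (placeholder paths):
#   192.168.1.1:/export  /mnt/data  nfs  rw,hard,tcp  0 0

# Hypothetical helper: read `ethtool <iface>` output on stdin and print the
# duplex setting (for example "Full" or "Half").
duplex_of() {
    awk -F': *' '/Duplex:/ { print $2 }'
}

# Usage on a live system (usually needs root):
#   ethtool eth0 | duplex_of
```

After changing fstab the share has to be unmounted and mounted again for the new transport to take effect.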
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Sun Sep 21, 2008 1:36 am Post subject: |
|
|
Updating the kernel on my NFS server appears to have solved the issue. It was gentoo-sources 2.6.22 and is now gentoo-sources 2.6.25. Thanks for your help. |
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Sun Sep 21, 2008 4:41 pm Post subject: |
|
|
I spoke too quickly. It just takes much longer for the problem to show up.
I'll try NFS via TCP now. |
|
Janne Pikkarainen Veteran
Joined: 29 Jul 2003 Posts: 1143 Location: Helsinki, Finland
|
Posted: Mon Sep 22, 2008 6:36 am Post subject: |
|
|
Some versions of nfs-utils have been problematic for me over time, too; try upgrading or downgrading that package. |
|
Anarcho Advocate
Joined: 06 Jun 2004 Posts: 2970 Location: Germany
|
Posted: Tue Sep 23, 2008 7:38 am Post subject: |
|
|
I had exactly the same issues: NFS drops, but SSH and other things work fine, and other PCs don't lose the NFS mount.
I have an nVidia NIC in that machine too, which I blame for these issues. I switched to an Intel Pro 1000 PCI-E card a few weeks ago and have had no problems since. So, for me, the fix was to get rid of the nVidia network card! _________________ ...it's only Rock'n'Roll, but I like it! |
|
RosenSama Tux's lil' helper
Joined: 28 Apr 2003 Posts: 99
|
Posted: Tue Sep 30, 2008 12:57 am Post subject: |
|
|
For me it was happening with both the forcedeth and r8169 drivers, on two different clients of the same server. It was just as described in the mailing list post above. `netstat -natp` shows a "hung" connection from a client port < 1024 to server port 2049. On the client it's in FIN_WAIT2 and on the server it's in CLOSE_WAIT.
Anyway, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.
Thanks for all the help. |
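The hung-connection check described above can be scripted. A rough sketch, assuming netstat's usual column layout (local address in column 4, remote address in column 5, state in column 6); the function name is invented here:

```shell
#!/bin/sh
# Hypothetical helper: read `netstat -nat` (or -natp) output on stdin and
# print TCP connections involving the NFS port 2049 that sit in FIN_WAIT2
# or CLOSE_WAIT, the two states reported in this thread.
stuck_nfs_conns() {
    awk '$1 ~ /^tcp/ && ($4 ~ /:2049$/ || $5 ~ /:2049$/) &&
         ($6 == "FIN_WAIT2" || $6 == "CLOSE_WAIT") { print $4, $5, $6 }'
}

# Usage on a live system, on the client and on the server:
#   netstat -nat | stuck_nfs_conns
```

A line that is still listed several minutes later while the mount stays unresponsive would match the symptom described in this post.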
|
VinzC Watchman
Joined: 17 Apr 2004 Posts: 5098 Location: Dark side of the mood
|
Posted: Tue Sep 30, 2008 8:06 am Post subject: |
|
|
RosenSama wrote: | For me it was happening with both the forcedeth and r8169 drivers, on two different clients of the same server. It was just as described in the mailing list post above. `netstat -natp` shows a "hung" connection from a client port < 1024 to server port 2049. On the client it's in FIN_WAIT2 and on the server it's in CLOSE_WAIT.
Anyway, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now.
Thanks for all the help. |
If both clients (those that fail with NFS) have the same problematic Ethernet hardware and brand, I'd expect the same troubles on both sides. This is just a guess, though. |
|
depontius Advocate
Joined: 05 May 2004 Posts: 3526
|
Posted: Tue Sep 30, 2008 11:55 am Post subject: |
|
|
RosenSama wrote: | Anyway, as found in that thread, downgrading the kernels on the clients from 2.6.26 to 2.6.24 has cleared up the issue for several days now. |
There has been a thread on LKML in the past month about NFS hangs beginning with 2.6.25-6 or so. I'm running 2.6.25-hardened-r7 on my server and 2.6.25-gentoo-r7 on my clients. I get occasional "stalls" of maybe 5 seconds at a time, but no out-and-out hangs. There are some hints that ACLs may be involved, and I'm not sure whether the critical kernel level is on the server or the client. I'm living with it for now, but next time I build a kernel I'm going to try leaving out the ACLs, since I've never gotten around to using them. I'll add them back in when the stall is resolved, and then never get around to using them for several more years. _________________ .sigs waste space and bandwidth |
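Before rebuilding, it may help to see which NFS ACL options the current kernel was built with. A small sketch, assuming the option names `CONFIG_NFS_V3_ACL` (client) and `CONFIG_NFSD_V3_ACL` (server) used by 2.6-series kernels, and a kernel config readable at `/usr/src/linux/.config` or `/proc/config.gz`; the helper name is invented here:

```shell
#!/bin/sh
# Hypothetical helper: read a kernel .config on stdin and print the lines
# for the NFS ACL options, whether they are set or "is not set".
nfs_acl_options() {
    grep -E 'CONFIG_NFSD?_(V[0-9]+_ACL|ACL_SUPPORT)'
}

# Usage on a live system:
#   nfs_acl_options < /usr/src/linux/.config
#   zcat /proc/config.gz | nfs_acl_options   # needs CONFIG_IKCONFIG_PROC
```

Whether turning these options off actually avoids the stalls is untested here; the post above only mentions hints that ACLs may be involved.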
|
|