Gentoo Forums
eth0 'link down' every 2 - 48 hours
Cygon
Tux's lil' helper


Joined: 05 Feb 2006
Posts: 115
Location: Germany

Posted: Wed Mar 07, 2012 1:15 pm    Post subject: eth0 'link down' every 2 - 48 hours

Some weeks ago, my home server's network (which had run fine for >2 years) started to have outages. I don't know how to track down this issue. Maybe someone more knowledgeable than me could give me some advice?

Here's what I observed when the outages occur:
  • My server becomes completely unreachable (I can neither ping my server's IP from the outside nor can I ping another LAN IP from the server).
  • The LEDs on the network adapter and switch still show a connection.
  • Shutting down eth0 results in the switch no longer showing a connection. Upon restarting, the connection LED is green again, but eth0 remains bricked.
  • After rebooting, everything works fine again.
  • I have another network adapter in this server. Suspecting a hardware issue, I flipped the adapters via udev rules, but the outages still occurred.
  • It might be that higher bandwidth usage increases the likelihood of an outage.
  • The issue seems to resolve itself after a few hours.
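Observations like these are easier to correlate with log entries once the outages have exact timestamps. The following is a minimal sketch, not part of the original post: `ping_once` and `outage_intervals` are hypothetical helper names, and the probe assumes the Linux iputils `ping` (`-c` for count, `-W` for the reply timeout in seconds).

```python
import subprocess
from datetime import datetime, timedelta


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Return True if a single ICMP echo to `host` gets a reply.

    Assumes the Linux iputils `ping` binary is on PATH.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_intervals(samples):
    """Collapse (timestamp, reachable) samples into (start, end) outage spans.

    An outage starts at the first failed probe and ends at the next
    successful one. An outage still open when the data ends is reported
    with end=None.
    """
    intervals = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, None))
    return intervals
```

One way to use it would be to call `ping_once` against another LAN host every 30 seconds or so, append `(datetime.now(), result)` to a list, and run `outage_intervals` over that list afterwards to see whether the outages line up with load peaks or log messages.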
This morning, I had these lines in my syslog:
Code:
Mar  7 07:31:23 tiamat kernel: r8169 0000:03:00.0: eth0: link down
Mar  7 07:31:26 tiamat kernel: r8169 0000:03:00.0: eth0: link up
Mar  7 09:28:07 tiamat kernel: r8169 0000:03:00.0: eth0: link down
Mar  7 09:28:10 tiamat kernel: r8169 0000:03:00.0: eth0: link up

I don't see these messages for the other 2 outages I've had since, so I'm not sure whether they're related. I can't find anything else in the logs. Since using a different network adapter didn't have any effect, I now believe this is a software issue.
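To check whether link flaps coincide with outages, the syslog lines can be reduced to a list of flap times and durations. This is a rough sketch that assumes the exact message format shown above; the function names and the `year` parameter are illustrative (syslog timestamps carry no year), and `%b` parsing assumes an English or C locale.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Mar  7 07:31:23 tiamat kernel: r8169 0000:03:00.0: eth0: link down
LINK_EVENT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"r8169\s+\S+\s+(?P<iface>\w+):\s+link\s+(?P<state>up|down)\s*$"
)


def parse_link_events(lines, year=2012):
    """Return (timestamp, interface, state) tuples for r8169 link messages."""
    events = []
    for line in lines:
        m = LINK_EVENT.match(line)
        if not m:
            continue
        # Collapse the padded day ("Mar  7") to single spaces before parsing.
        stamp_text = " ".join(m.group("stamp").split())
        stamp = datetime.strptime(f"{year} {stamp_text}", "%Y %b %d %H:%M:%S")
        events.append((stamp, m.group("iface"), m.group("state")))
    return events


def flap_durations(events):
    """Pair each 'down' with the next 'up' on the same interface.

    Returns (interface, down_time, seconds_down) tuples.
    """
    pending = {}
    flaps = []
    for stamp, iface, state in events:
        if state == "down":
            pending[iface] = stamp
        elif iface in pending:
            down_at = pending.pop(iface)
            flaps.append((iface, down_at, (stamp - down_at).total_seconds()))
    return flaps
```

Applied to the four lines above, this reports two flaps on eth0 of 3 seconds each, which is far shorter than the outages described, so the flaps alone would not explain hours of lost connectivity.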

I'd be grateful for any help in finding out what's going on!
gentoo_ram
Guru


Joined: 25 Oct 2007
Posts: 513
Location: San Diego, California USA

Posted: Wed Mar 07, 2012 9:45 pm

I was having weird problems with my ethernet link going up and down to my cable modem. Tried all kinds of stuff, nothing worked... until I swapped the ethernet cable. Problem solved.
chiefbag
Guru


Joined: 01 Oct 2010
Posts: 542
Location: The Kingdom

Posted: Thu Mar 08, 2012 9:44 am

If it isn't a physical hardware issue like the one mentioned above, keep in mind that these Realtek chipsets are notorious for causing problems.

Have you recently performed any system/kernel upgrades?

See the thread below for just one example.

https://forums.gentoo.org/viewtopic-t-908102-highlight-r8169.html
Evileye
l33t


Joined: 06 Aug 2003
Posts: 782
Location: Toronto

Posted: Thu Mar 08, 2012 8:30 pm

I'm using Realtek chipsets on my server and am having a similar problem. I have had this server running for almost a year but only ran into this problem over the last little while. I reboot the server and everything works again. I checked my logs and found the following, same as what you are seeing...

Code:
Mar  8 13:55:36 penguin kernel: r8169 0000:03:00.0: eth1: link down
Mar  8 13:55:37 penguin kernel: r8169 0000:01:00.0: eth0: link up


I'll try different network cards and see if that makes a difference.
Cygon
Tux's lil' helper


Joined: 05 Feb 2006
Posts: 115
Location: Germany

Posted: Wed Mar 14, 2012 11:52 pm

Thanks for the tips. After my last post, it worked for almost 72 hours straight, so I held back on any changes to make sure I wasn't jumping to conclusions. Today, I had 4 outages in the last 6 hours again, so here goes:
  • The second outage was a kernel panic. I changed the cables while I checked it, but no joy.
  • During the third outage I was connected via SSH and noticed that responses got slower and slower (pings were lost or took >3 seconds, and the screen from 'top' was sent in two packets, so I had to look at half a console window for several seconds until the other half got through).
  • Before rebooting my home server, I tried rebooting my switch. No change.
  • The fourth time it happened, I took down eth0 and eth1 (eth1 has nothing connected to it). Unlike before, after bringing eth0 back up, pings got through again!
Here's ifconfig after eth0 recovered:
Code:
eth0      Link encap:Ethernet  HWaddr 00:1e:2a:d2:89:5e
          inet addr:192.168.124.1  Bcast:192.168.124.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:294470 errors:19 dropped:1360 overruns:0 frame:85
          TX packets:433366 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:124079469 (118.3 MiB)  TX bytes:453476682 (432.4 MiB)
          Interrupt:19 Base address:0xac00
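To put the RX counters above into proportion, they can be expressed relative to the number of received packets. This is only a sketch: it assumes the classic net-tools `ifconfig` layout shown above, and `parse_counters` and `ratios_percent` are illustrative names. Dividing by the packet count is a rough yardstick, not an exact loss rate.

```python
import re

# The two counter lines from the ifconfig output above.
SAMPLE = """\
          RX packets:294470 errors:19 dropped:1360 overruns:0 frame:85
          TX packets:433366 errors:0 dropped:0 overruns:0 carrier:0
"""


def parse_counters(ifconfig_text, direction="RX"):
    """Return the 'key:value' counters on the '<direction> packets:' line."""
    for line in ifconfig_text.splitlines():
        line = line.strip()
        if line.startswith(f"{direction} packets:"):
            return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}
    raise ValueError(f"no '{direction} packets:' line found")


def ratios_percent(counters):
    """Express every non-packet counter as a percentage of the packet count."""
    packets = counters["packets"]
    return {
        key: 100.0 * value / packets
        for key, value in counters.items()
        if key != "packets"
    }


if __name__ == "__main__":
    for key, pct in ratios_percent(parse_counters(SAMPLE)).items():
        print(f"RX {key}: {pct:.4f}%")
```

For the numbers above this works out to roughly 0.46% dropped and well under 0.1% errors and frame errors on the receive side, while the transmit side shows none.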


I'm still no smarter than before, but the fact that I've had two kernel panics on a server that ran rock solid for 2 years makes me think that maybe, just maybe, this could be a weird hardware issue after all. Maybe a bad capacitor is about to give out and is causing random issues, or something like that.
Cygon
Tux's lil' helper


Joined: 05 Feb 2006
Posts: 115
Location: Germany

Posted: Sat Mar 17, 2012 3:26 pm

I had another 2 kernel panics yesterday. Either the built-in r8169 driver is seriously messed up (doubtful) or my server is experiencing a hardware failure.

Just ordered a new mainboard, CPU and RAM. I'll report back whether this fixes the issue. Otherwise I've got no idea what I'll do.
Evileye
l33t


Joined: 06 Aug 2003
Posts: 782
Location: Toronto

Posted: Sat Mar 31, 2012 4:14 am

I tried 2 new network cards (Intel Pro 1000 GT) and it didn't make a difference; I still had the internet go down.

Did the new CPU/Mobo/RAM fix the problem for you?
Cygon
Tux's lil' helper


Joined: 05 Feb 2006
Posts: 115
Location: Germany

Posted: Thu Apr 19, 2012 9:24 am

I replaced my server's mainboard, CPU and RAM last week, leaving everything else the same. The server is now rock solid again.

So it was a hardware issue after all.

There were no visibly blasted caps on the old board, leaving me a bit in the dark as to what might have failed. My PSU was also fine (I checked it just in case it might have gone below the minimum voltage on any line -- had that experience with another PC ages ago).
Cygon
Tux's lil' helper


Joined: 05 Feb 2006
Posts: 115
Location: Germany

Posted: Sat May 05, 2012 11:10 pm

Looks like the story isn't over yet.

A few days ago the same issues began happening again. Absolutely nothing in the logs, but every few days networking just stops working with no packets going in or out of eth0 on my server. This really got me confused since I had replaced my server hardware, cabling and even bought a different network adapter for my workstation.

Completely out of ideas, I unplugged my entire home server including switch, router and modem to let it cool down a bit (since the last time it was off for a few hours was during the hardware replacement). Yep, pretty desperate, but I was running out of ideas and my only hope was to find some kind of system in this madness.

Upon powering up again, I observed my switch displaying a connection on port 2, then 3, then 4, then only 3, then it jumped between 2 and 4 a bit, then 2, 3 and 4 at the same time, then only 3 again... that didn't seem normal, especially when it kept going on and on with no sign of settling down. Two of those three devices are my Squeezebox and another switch; clearly they shouldn't come up and lose their connection again all the time. Nothing like that was happening before I powered down the switch, so this was the first time that switch was brought to my attention.

So now I took that switch out of the loop and I'm using my router's built-in switch. I don't know whether the issues will return, but removing that switch already had a very positive effect on another issue I was suffering from: before, there were lots of transmission errors visible in ifconfig and my workstation could upload files at no more than 150-200 KiB/s. Now I easily get 25 MiB/s upload and zero errors.
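A before/after comparison like this can also be made with numbers rather than by eyeballing ifconfig. The sketch below reads the per-interface counters that Linux exposes under `/sys/class/net/<iface>/statistics/` and computes how much they changed between two snapshots. The function names and the choice of counters are illustrative, and the `sysfs_root` parameter exists only so the code can be pointed at a test directory.

```python
from pathlib import Path

# Counter files found under /sys/class/net/<iface>/statistics/ on Linux.
COUNTERS = (
    "rx_errors", "tx_errors",
    "rx_dropped", "tx_dropped",
    "rx_bytes", "tx_bytes",
)


def read_counters(iface="eth0", sysfs_root="/sys/class/net"):
    """Return a snapshot of selected interface counters as integers."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text()) for name in COUNTERS}


def counter_delta(before, after):
    """Return how much each counter grew between two snapshots."""
    return {name: after[name] - before[name] for name in before}


def throughput_mib_s(byte_delta, seconds):
    """Convert a byte-counter delta over `seconds` into MiB/s."""
    return byte_delta / seconds / (1024 * 1024)
```

Taking one snapshot before a large upload and one after it, then passing the `tx_bytes` delta and the elapsed seconds to `throughput_mib_s`, gives a figure comparable to the 150-200 KiB/s and 25 MiB/s mentioned above, with the error deltas alongside it.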

I now believe it's likely that the switch was the culprit all along (and that the kernel panics were a separate, unrelated hardware issue). All I can do now is hope that this is the end of it. Unless my hopes are crushed once again, I'll keep things as they are right now and I'll try to remember to revisit this thread in a few weeks to report that everything is finally alright :)
Evileye
l33t


Joined: 06 Aug 2003
Posts: 782
Location: Toronto

Posted: Mon May 07, 2012 10:21 pm

I switched to using rp-pppoe to connect to the internet instead of ppp entries in /etc/conf.d/net and I have been up for 15 days without any problems and my box hasn't disconnected from the internet in all that time.
All times are GMT