Author |
Message |
Cygon Tux's lil' helper
Joined: 05 Feb 2006 Posts: 115 Location: Germany
|
Posted: Wed Mar 07, 2012 1:15 pm Post subject: eth0 'link down' every 2 - 48 hours |
|
|
Some weeks ago, my home server's network (which had run fine for >2 years) started to have outages. I don't know how I could track down this issue, maybe someone more knowledgeable than me could give me some advice?
Here's what I observed when the outages occur:
- My server becomes completely unreachable (I can neither ping my server's IP from the outside nor ping another LAN IP from the server).
- The LEDs on the network adapter and switch still show a connection
- Shutting down eth0 results in the switch no longer showing a connection. Upon restarting, the connection LED is green again, but eth0 remains bricked.
- After rebooting, everything works fine again
- I have another network adapter in this server. Suspecting a hardware issue, I flipped the adapters via udev rules, but the outages still occurred.
- It might be that higher bandwidth usage increases the likelihood of an outage
- The issue seems to resolve itself after a few hours
This morning, I had these lines in my syslog: Code: | Mar 7 07:31:23 tiamat kernel: r8169 0000:03:00.0: eth0: link down
Mar 7 07:31:26 tiamat kernel: r8169 0000:03:00.0: eth0: link up
Mar 7 09:28:07 tiamat kernel: r8169 0000:03:00.0: eth0: link down
Mar 7 09:28:10 tiamat kernel: r8169 0000:03:00.0: eth0: link up |
I don't see them for the other 2 outages I had since, so I'm not sure if it's related. I can't find anything else in the logs. Since using a different network adapter didn't have any effect, I now believe this is a software issue.
I'd be grateful for any help in finding out what's going on! |
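
To check whether the r8169 link messages line up with the outages, the syslog lines can be paired up programmatically. The following is only a minimal Python sketch, assuming the log format shown above; the function names `parse_link_events` and `flap_durations` are made up for illustration, and the year has to be supplied because syslog timestamps omit it. It also assumes an English/C locale for the month abbreviation.

```python
import re
from datetime import datetime

# Matches lines like:
#   Mar 7 07:31:23 tiamat kernel: r8169 0000:03:00.0: eth0: link down
LINK_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"r8169\s+\S+\s+(?P<iface>\w+):\s+link\s+(?P<state>up|down)"
)


def parse_link_events(lines, year=2012):
    """Return a list of (timestamp, interface, state) for r8169 link messages."""
    events = []
    for line in lines:
        m = LINK_RE.match(line.strip())
        if not m:
            continue  # not a link up/down message
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        events.append((ts, m.group("iface"), m.group("state")))
    return events


def flap_durations(events):
    """Pair each 'down' with the next 'up' on the same interface.

    Returns a list of (interface, time_of_link_down, seconds_until_link_up).
    """
    pending = {}
    flaps = []
    for ts, iface, state in events:
        if state == "down":
            pending[iface] = ts
        elif iface in pending:
            down_ts = pending.pop(iface)
            flaps.append((iface, down_ts, (ts - down_ts).total_seconds()))
    return flaps


if __name__ == "__main__":
    # Assumed log location; adjust to wherever your syslog daemon writes.
    with open("/var/log/messages") as fh:
        for iface, down_ts, secs in flap_durations(parse_link_events(fh)):
            print(f"{iface} went down at {down_ts} for {secs:.0f} s")
```

Comparing the printed timestamps against the times the server was actually unreachable would show whether the link flaps are a cause or just a coincidence.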
|
|
|
gentoo_ram Guru
Joined: 25 Oct 2007 Posts: 513 Location: San Diego, California USA
|
Posted: Wed Mar 07, 2012 9:45 pm Post subject: |
|
|
I was having weird problems with my ethernet link going up and down to my cable modem. Tried all kinds of stuff, nothing worked... until I swapped the ethernet cable. Problem solved. |
|
|
|
chiefbag Guru
Joined: 01 Oct 2010 Posts: 542 Location: The Kingdom
|
|
|
|
Evileye l33t
Joined: 06 Aug 2003 Posts: 782 Location: Toronto
|
Posted: Thu Mar 08, 2012 8:30 pm Post subject: |
|
|
I'm using Realtek chipsets on my server and am having a similar problem. I have had this server running for almost a year but only ran into this problem over the last little while. I reboot the server and everything works again. I checked my logs and found the following, same as what you are seeing...
Code: | Mar 8 13:55:36 penguin kernel: r8169 0000:03:00.0: eth1: link down
Mar 8 13:55:37 penguin kernel: r8169 0000:01:00.0: eth0: link up |
I'll try different network cards and see if that makes a difference. |
|
|
|
Cygon Tux's lil' helper
Joined: 05 Feb 2006 Posts: 115 Location: Germany
|
Posted: Wed Mar 14, 2012 11:52 pm Post subject: |
|
|
Thanks for the tips. After my last post, it worked for almost 72 hours straight, so I held back on any changes to make sure I wasn't jumping to conclusions. Today, I had 4 outages in the last 6 hours again, so here goes:
- The second outage was a kernel panic. I changed the cables while I checked it, but no joy.
- During the third outage I was connected via SSH and noticed that responses got slower and slower (pings were lost or took >3 seconds, and the screen from 'top' was sent in two packets, so I had to look at half a console window for several seconds until the other half got through).
- Before rebooting my home server, I tried rebooting my switch. No change.
- The fourth time it happened, I took down eth0 and eth1 (eth1 has nothing connected to it). Unlike before, after bringing eth0 back up, pings got through again!
Here's ifconfig after eth0 recovered: Code: | eth0 Link encap:Ethernet HWaddr 00:1e:2a:d2:89:5e
inet addr:192.168.124.1 Bcast:192.168.124.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:294470 errors:19 dropped:1360 overruns:0 frame:85
TX packets:433366 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:124079469 (118.3 MiB) TX bytes:453476682 (432.4 MiB)
Interrupt:19 Base address:0xac00 |
I'm still no smarter than before, but the fact that I've had two kernel panics on a server that ran rock solid for 2 years makes me think that maybe, just maybe, this could be a weird hardware issue after all. Maybe a bad capacitor about to give out and causing random issues or so. |
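
To put the ifconfig counters above into perspective, the error and drop counts can be turned into rates. This is a rough Python sketch, assuming the older net-tools ifconfig output format shown in the post; `parse_counters` and `rate` are hypothetical helper names, not part of any tool.

```python
import re

# Matches the RX/TX counter lines of legacy net-tools ifconfig output, e.g.
#   RX packets:294470 errors:19 dropped:1360 overruns:0 frame:85
COUNTER_RE = re.compile(
    r"(RX|TX) packets:(\d+) errors:(\d+) dropped:(\d+) "
    r"overruns:(\d+) (frame|carrier):(\d+)"
)


def parse_counters(text):
    """Return {'rx': {...}, 'tx': {...}} with the integer counters found in text."""
    stats = {}
    for m in COUNTER_RE.finditer(text):
        stats[m.group(1).lower()] = {
            "packets": int(m.group(2)),
            "errors": int(m.group(3)),
            "dropped": int(m.group(4)),
            "overruns": int(m.group(5)),
            m.group(6): int(m.group(7)),  # 'frame' for RX, 'carrier' for TX
        }
    return stats


def rate(stats, direction, field):
    """Fraction of packets in the given direction that hit the given counter."""
    packets = stats[direction]["packets"]
    return stats[direction][field] / packets if packets else 0.0


if __name__ == "__main__":
    sample = (
        "RX packets:294470 errors:19 dropped:1360 overruns:0 frame:85\n"
        "TX packets:433366 errors:0 dropped:0 overruns:0 carrier:0\n"
    )
    stats = parse_counters(sample)
    for field in ("errors", "dropped", "frame"):
        print(f"RX {field}: {rate(stats, 'rx', field):.4%}")
```

Sampling these counters before and after an outage, rather than looking at a single snapshot, would show whether the errors accumulate during the outage itself.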
|
|
|
Cygon Tux's lil' helper
Joined: 05 Feb 2006 Posts: 115 Location: Germany
|
Posted: Sat Mar 17, 2012 3:26 pm Post subject: |
|
|
I had another 2 kernel panics yesterday. Either the built-in r8169 driver is seriously messed up (doubtful) or my server is experiencing a hardware failure.
Just ordered a new mainboard, CPU and RAM. I'll report back whether this fixes the issue. Otherwise I've got no idea what I'll do. |
|
|
|
Evileye l33t
Joined: 06 Aug 2003 Posts: 782 Location: Toronto
|
Posted: Sat Mar 31, 2012 4:14 am Post subject: |
|
|
I tried 2 new network cards (Intel Pro 1000 GT) and it didn't make a difference; I still had the internet go down.
Did the new CPU/Mobo/RAM fix the problem for you? |
|
|
|
Cygon Tux's lil' helper
Joined: 05 Feb 2006 Posts: 115 Location: Germany
|
Posted: Thu Apr 19, 2012 9:24 am Post subject: |
|
|
I replaced my server's mainboard, CPU, RAM last week, leaving everything else the same. The server is now rock solid again.
So it was a hardware issue after all.
There were no visibly blasted caps on the old board, leaving me a bit in the dark as to what might have failed. My PSU was also fine (I checked it just in case it might have gone below the minimum voltage on any line -- had that experience with another PC ages ago). |
|
|
|
Cygon Tux's lil' helper
Joined: 05 Feb 2006 Posts: 115 Location: Germany
|
Posted: Sat May 05, 2012 11:10 pm Post subject: |
|
|
Looks like the story isn't over yet.
A few days ago the same issues began happening again. Absolutely nothing in the logs, but every few days networking just stops working with no packets going in or out of eth0 on my server. This really got me confused since I had replaced my server hardware, cabling and even bought a different network adapter for my workstation.
Completely out of ideas, I unplugged my entire home server including switch, router and modem to let it cool down a bit (since the last time it was off for a few hours was during the hardware replacement). Yep, pretty desperate, but I was running out of ideas and my only hope was to find some kind of system in this madness.
Upon powering up again, I observed my switch displaying a connection on port 2, then 3, then 4, then only 3, then jumping between 2 and 4 a bit, then 2, 3 and 4 at the same time, then only 3 again... that didn't seem normal. Especially when it kept going on and on with no sign of settling down. Two of those three devices are my Squeezebox and another switch; clearly they shouldn't come up and lose connection again all the time. Nothing like that was happening before I powered down the switch, so this was the first time that switch was brought to my attention.
So now I took that switch out of the loop and I'm using my router's built-in switch. I don't know whether the issues will return, but removing that switch already had a very positive effect on another issue I was suffering from: before, there were lots of transmission errors visible in ifconfig and my workstation could upload files at no more than 150-200 KiB/s. Now I easily get 25 MiB/s upload and zero errors.
I now believe it's likely that the switch was the culprit all along (with the kernel panics being a separate, unrelated hardware issue). All I can do now is hope that this is the end of it. Unless my hopes are crushed once again, I'll keep things as they are right now and I'll try to remember to revisit this thread in a few weeks to report that everything is finally alright. |
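
Since these outages left nothing in the logs, one way to at least get exact outage times is a small reachability logger running on the server. The sketch below is only an illustration in Python: `ping_once`, `detect_transitions` and `watch` are made-up names, the target host is whatever LAN address you choose, and the `-c`/`-W` options assume the Linux iputils version of ping.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2):
    """Send a single ICMP echo request; True if a reply arrived in time."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def detect_transitions(samples):
    """Reduce (timestamp, reachable) samples to the points where the state flips.

    Returns a list of (timestamp, 'down' | 'up').
    """
    transitions = []
    last = None
    for ts, reachable in samples:
        if last is not None and reachable != last:
            transitions.append((ts, "up" if reachable else "down"))
        last = reachable
    return transitions


def watch(host, interval_s=10, probe=ping_once):
    """Probe host forever and print a line whenever reachability changes."""
    last = None
    while True:
        reachable = probe(host)
        if last is not None and reachable != last:
            state = "up" if reachable else "down"
            print(f"{datetime.now().isoformat(timespec='seconds')} {host} {state}", flush=True)
        last = reachable
        time.sleep(interval_s)
```

Running something like `watch("192.168.124.254")` (a hypothetical router address on the LAN) with output redirected to a file would record when each outage starts and ends, which can then be compared with switch behaviour or syslog entries.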
|
|
|
Evileye l33t
Joined: 06 Aug 2003 Posts: 782 Location: Toronto
|
Posted: Mon May 07, 2012 10:21 pm Post subject: |
|
|
I switched to using rp-pppoe to connect to the internet instead of ppp entries in /etc/conf.d/net and I have been up for 15 days without any problems and my box hasn't disconnected from the internet in all that time. |
|
|
|
|