Gentoo Forums
Intermittent ICMP destination host unreachable
Irayo
n00b


Joined: 05 Aug 2006
Posts: 16

Posted: Thu Feb 13, 2014 11:12 pm    Post subject: Intermittent ICMP destination host unreachable

Hi,

I'm using Gentoo on a router/gateway box. I have latest stable gentoo-sources and pretty standard network settings for a router (DHCP on the WAN-side IP, static LAN-side IP, forwarding enabled).

Mostly, everything works as it should. But every so often on my client machines (all of them, including various OSes: Linux Mint, Gentoo, OS X, Windows ...), I receive a batch of ICMP "destination host unreachable" responses from the router machine for all types/ports/destinations of traffic. Each batch of ICMP errors seems to correspond to the preceding 5-10 seconds of packets that my client machines have attempted to send.

This happens at seemingly random intervals, ranging from 30 seconds to 1500 seconds apart, which has made it very difficult to diagnose so far.

Every time I try to search for this problem I receive reports of people having "destination unreachable" errors 100% of the time, but I've yet to find anyone who has this intermittently-occurring disconnect. It's almost as if my routing table is resetting and repopulating itself every so often, and "forgets" how to route while it does so -- but of course I can't catch anything happening because it happens intermittently, and there are no log messages in system logs or dmesg.

Right now I'm testing whether the router machine itself ever gets "destination unreachable" errors from itself/upstream, or if it is only reporting these errors to clients (this might tell me whether it's a routing issue, hardware issue, or upstream issue depending on the results). But so far, I have not encountered this issue on the router machine.

Does anyone have an idea what might be causing this issue?
smerf
l33t


Joined: 06 Nov 2004
Posts: 778
Location: Polska

Posted: Fri Feb 14, 2014 4:55 pm

Have you ruled out a hardware problem (cable, NIC, link negotiation)?
_________________
Microsoft is not the answer, Microsoft is the question, the answer is no.
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3500

Posted: Fri Feb 14, 2014 7:25 pm

When the destination host is unreachable, the error should also show the address of the host reporting the failure. For example, if you are 10.0.0.100, the router is 10.0.0.1, and you ping Google: a failure reported by 10.0.0.100 means you lost the connection to the router, a failure reported by 10.0.0.1 means the router has no path available, and a failure reported by something further away means someone at your ISP's facility tripped over a wire. It is better to use this than to ping separately from every device to work out where the problem is.
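
The rule of thumb above can be written down as a small helper. This is only a sketch of that reasoning, not a diagnostic tool: the function name `locate_failure` and the returned strings are made up for illustration, and the addresses are the example ones from this post.

```python
import ipaddress


def locate_failure(reporter, client, router):
    """Map the address that reported 'destination host unreachable'
    to the likely location of the fault (hypothetical helper)."""
    rep = ipaddress.ip_address(reporter)
    if rep == ipaddress.ip_address(client):
        # The client itself gave up: it could not reach its gateway.
        return "client cannot reach the router"
    if rep == ipaddress.ip_address(router):
        # The router answered, but it has nowhere to send the packet.
        return "router has no usable path upstream"
    # Any other reporter sits beyond the local router.
    return "problem is further upstream (ISP side)"


print(locate_failure("10.0.0.1", client="10.0.0.100", router="10.0.0.1"))
```

The reporter address is the source of the ICMP error, which `ping` prints as `From <address> ... Destination Host Unreachable`.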

I've run into this kind of issue when I assigned two NICs IP addresses from the very same subnet, so the machine sometimes tried to send data over the wrong connection. Yeah, I know, stupid mistake.
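
A quick way to check for that mistake is to compare the networks configured on each interface. The sketch below uses Python's standard `ipaddress` module; the interface names and addresses in `example` are invented for illustration and would have to be replaced with the router's real configuration.

```python
import ipaddress
from itertools import combinations


def overlapping_interfaces(addrs):
    """Return pairs of interface names whose configured subnets overlap.

    addrs maps an interface name to an address in CIDR form,
    e.g. {"eth0": "192.168.0.1/24"}.
    """
    nets = {name: ipaddress.ip_interface(cidr).network
            for name, cidr in addrs.items()}
    return [(a, b) for a, b in combinations(sorted(nets), 2)
            if nets[a].overlaps(nets[b])]


# Hypothetical configuration: eth0 and eth1 share 192.168.0.0/24.
example = {
    "eth0": "192.168.0.1/24",
    "eth1": "192.168.0.2/24",
    "eth2": "10.0.0.1/8",
}
print(overlapping_interfaces(example))
```

An empty list means no two interfaces share a subnet, so that particular cause can be ruled out.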
Irayo
n00b


Joined: 05 Aug 2006
Posts: 16

Posted: Fri Feb 14, 2014 7:30 pm

smerf wrote:
Have you ruled out a hardware problem (cable, NIC, link negotiation)?


Not yet. I don't think there are any "hard" disconnects, as those would be printed to the kernel log ("link down", "link is not ready"), and the devices are all properly negotiated at 1000 Mbps full-duplex, but there could still be issues with the NIC/cables. I've been waiting for a good opportunity to shut down. I don't have a replacement NIC to test, but my plan is to swap the interfaces (WAN<->LAN) and use new cables. If the problem remains the same, it seems likely to be a configuration or upstream problem; if the problem changes (if I start getting LAN connection issues or something), then it's probably a NIC issue and I'll get a replacement.
Irayo
n00b


Joined: 05 Aug 2006
Posts: 16

Posted: Fri Feb 14, 2014 7:40 pm

szatox wrote:
When the destination host is unreachable, the error should also show the address of the host reporting the failure. For example, if you are 10.0.0.100, the router is 10.0.0.1, and you ping Google: a failure reported by 10.0.0.100 means you lost the connection to the router, a failure reported by 10.0.0.1 means the router has no path available, and a failure reported by something further away means someone at your ISP's facility tripped over a wire.


Yeah. The clients see the failure as occurring at my router system (192.168.0.1), so I'm pretty sure I'm losing link/connection/route to the ISP.

From looking at network traffic dumps, it looks like around 2-3 seconds before the host-unreachables are sent, my router sends an ARP request to try to find the ISP's upstream router. This ARP request goes unanswered. Either this is a symptom (meaning I've already lost connection to the ISP and that is why the ARP is sent and why I get no response) or it is actually the problem (some misconfigured system doesn't respond to my ARP request as it should so I lose ability to route packets).

After all the destination-host-unreachable messages are sent, another ARP request is attempted and receives a valid response.

As a test, I've inserted a static ARP table entry to see if the problem goes away. If I haven't seen any problems in a few hours, I'll assume that did the trick and try to figure out why.
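
To see whether the gateway's ARP entry is resolved, incomplete, or static at any given moment, the kernel's ARP table can be read from `/proc/net/arp`. The sketch below parses that format; `SAMPLE` is made-up data in the same column layout, and the addresses in it (including 203.0.113.1 standing in for the ISP router) are placeholders, not values from this thread. On the router itself, the text would come from `open("/proc/net/arp").read()` instead.

```python
ATF_COM = 0x02   # entry is complete (MAC address resolved)
ATF_PERM = 0x04  # entry is permanent (static)

# Made-up sample in the layout of /proc/net/arp.
SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
203.0.113.1      0x1         0x6         00:11:22:33:44:55     *        eth0
192.168.0.50     0x1         0x2         66:77:88:99:aa:bb     *        eth1
192.168.0.51     0x1         0x0         00:00:00:00:00:00     *        eth1
"""


def parse_arp_table(text):
    """Parse /proc/net/arp-style text into {ip: {mac, device, flags}}."""
    entries = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, dev = fields[:6]
        entries[ip] = {"mac": mac, "device": dev, "flags": int(flags, 16)}
    return entries


def describe(entry):
    """Summarise one parsed entry as 'missing', 'static', 'resolved' or 'incomplete'."""
    if entry is None:
        return "missing"
    if entry["flags"] & ATF_PERM:
        return "static"
    if entry["flags"] & ATF_COM:
        return "resolved"
    return "incomplete"


table = parse_arp_table(SAMPLE)
for ip in ("203.0.113.1", "192.168.0.50", "192.168.0.51", "192.168.0.99"):
    print(ip, describe(table.get(ip)))
```

Logging this once a second alongside a packet capture would show whether the gateway entry drops to "incomplete" just before each batch of host-unreachable errors.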
smerf
l33t


Joined: 06 Nov 2004
Posts: 778
Location: Polska

Posted: Fri Feb 14, 2014 8:00 pm

Irayo wrote:
I don't think there are any "hard" disconnects, as those would be printed to the kernel log ("link down", "link is not ready"), and the devices are all properly negotiated at 1000 Mbps full-duplex, but there could still be issues with the NIC/cables.

Renegotiation of link speed does not make the link go down, and even a faulty cable does not always mean hard disconnects.
I once had a situation where a faulty NIC caused a 1000/100 negotiation/degradation cycle every few minutes, and I detected it by continuously monitoring the interface speed with ethtool. It is uncommon but theoretically possible.
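
The same kind of monitoring can be sketched in Python. Note that this is a substitute for the ethtool approach described above, not a reproduction of it: it reads the negotiated speed from the Linux sysfs file `/sys/class/net/<iface>/speed` instead. The function names and the one-second interval are arbitrary choices for illustration.

```python
import time
from pathlib import Path


def read_speed(iface, sysfs_root="/sys/class/net"):
    """Return the negotiated speed in Mb/s, or None if it cannot be read
    (for example when the link is down or the interface does not exist)."""
    try:
        return int((Path(sysfs_root) / iface / "speed").read_text().strip())
    except (OSError, ValueError):
        return None


def speed_changes(samples):
    """Given [(timestamp, speed), ...], return (timestamp, old, new) for
    every point where the speed differs from the previous sample."""
    changes = []
    for (_t0, s0), (t1, s1) in zip(samples, samples[1:]):
        if s0 != s1:
            changes.append((t1, s0, s1))
    return changes


def watch(iface, interval=1.0):
    """Poll forever and print a line whenever the speed changes."""
    last = read_speed(iface)
    while True:
        time.sleep(interval)
        cur = read_speed(iface)
        if cur != last:
            print(f"{time.strftime('%H:%M:%S')} {iface}: {last} -> {cur} Mb/s")
            last = cur


# Offline demonstration with invented samples: one drop to 100 and a recovery.
print(speed_changes([(0, 1000), (1, 1000), (2, 100), (3, 1000)]))
```

Running `watch("eth0")` on the router (with the real WAN interface name) for a few hours would show whether a speed flap lines up with the outages.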
Irayo
n00b


Joined: 05 Aug 2006
Posts: 16

Posted: Fri Feb 28, 2014 2:48 am

So my test of inserting a static ARP table entry for the upstream ISP router seems to have done the trick. At least until yesterday, when I rebooted... then I started having issues again until I put the static ARP entry back this evening. Now the problem has gone away again. So that's almost definitely the issue.

Any ideas why the upstream router is (sometimes) not responding to my ARP queries? There isn't any packet loss as far as I can tell...

Is there a way to increase ARP request frequency or retries or something? I'm not sure what would help in this situation.
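
For reference, Linux exposes per-interface neighbour (ARP) tunables under `/proc/sys/net/ipv4/neigh/<iface>/`, described in the arp(7) man page; `mcast_solicit`, `ucast_solicit` and `retrans_time_ms` control how many probes are sent and how far apart. The sketch below only reads the current values. Whether changing any of them would help here is untested, and the helper name is made up.

```python
from pathlib import Path

# Neighbour-table settings related to probe count and timing (see arp(7)).
TUNABLES = (
    "mcast_solicit",
    "ucast_solicit",
    "retrans_time_ms",
    "base_reachable_time_ms",
    "gc_stale_time",
)


def read_neigh_tunables(iface="default", root="/proc/sys/net/ipv4/neigh"):
    """Return {tunable: int value or None} for one interface.

    A value of None means the file could not be read, e.g. on a
    non-Linux system or for an interface that does not exist.
    """
    values = {}
    for name in TUNABLES:
        try:
            text = (Path(root) / iface / name).read_text()
            values[name] = int(text.split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


for name, value in read_neigh_tunables("default").items():
    print(f"{name} = {value}")
```

Passing the WAN interface name instead of `"default"` shows the values that actually apply to the link facing the ISP router.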
smerf
l33t


Joined: 06 Nov 2004
Posts: 778
Location: Polska

Posted: Fri Feb 28, 2014 7:16 am

Are you directly connected to this router, or are there other devices (like a switch) in between?
If there are, the problem might be on that switch. Maybe it is not passing Layer 2 broadcasts correctly?
smerf
l33t


Joined: 06 Nov 2004
Posts: 778
Location: Polska

Posted: Fri Feb 28, 2014 9:05 am

Maybe you are experiencing some form of this issue?
All times are GMT