number_nine Tux's lil' helper

Joined: 05 May 2005 Posts: 136
|
Posted: Mon Nov 27, 2006 8:25 pm Post subject: mysterious network performance problems |
|
|
We've got a server with multiple NICs, one of which is attached to a private network. Between our machine and the private network hub is a switch that we own, but our provider administers. One of our mission-critical applications communicates with this private network; delays on the order of milliseconds are very bad.
Recently, we've been seeing performance degradation on the order of 20ms. We rebooted the machine, and the situation improved. After some time (no more than a week), the performance problems returned. Again, a reboot fixed (or at least appeared to fix) the problem.
Unfortunately, none of us have the kind of network expertise required to troubleshoot a problem like this. I was hoping there are some network gurus on this forum that might be able to offer some suggestions, or some hints on where to start looking or even what questions we should be asking our network providers.
Any feedback is appreciated!
Thank you! |
|
erik258 Advocate


Joined: 12 Apr 2005 Posts: 2650 Location: Twin Cities, Minnesota, USA
|
Posted: Wed Nov 29, 2006 3:23 am Post subject: |
|
|
have you considered trying to find similar problems online concerning your exact variety of network card?
have you considered switching in another network card?
have you considered writing a cron job to simply unload the module for the card and reload it every 5 days or so? a workaround, but possibly a very effective one. of course, for it to work you'd need support for that card modularized, and all similar cards (same chip) in the system would go down when the problem nic went down. _________________ Configuring a Firewall? Try my iptables configuration
LinuxCommando.com is my blog for linux-related scraps and tidbits. Stop by for a visit! |
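A rough sketch of the module-reload workaround suggested above, assuming the driver is built as a module (e1000 is only used as an example name here). The function name `reload_nic_module` and the `RUNNER` variable are made up for illustration. By default it only prints the commands, so nothing goes down by accident:

```shell
#!/bin/sh
# Hypothetical helper: unload and reload a NIC driver module.
# RUNNER defaults to "echo", so by default the commands are only printed.
# Set RUNNER to an empty string (and run as root) to really execute them.
reload_nic_module() {
    module="$1"
    runner="${RUNNER-echo}"
    # Unloading the driver takes down every interface that uses it.
    $runner modprobe -r "$module" || return 1
    # Load the driver again.
    $runner modprobe "$module"
}
```

For a real run from cron, the script would end with a call such as `RUNNER= reload_nic_module e1000`. The interface will most likely need to be brought back up afterwards (for instance by restarting its init script), which this sketch does not do.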
|
number_nine Tux's lil' helper

Joined: 05 May 2005 Posts: 136
|
Posted: Wed Nov 29, 2006 2:43 pm Post subject: |
|
|
erik258 wrote: | have you considered trying to find similar problems online concerning your exact variety of network card? |
No, but that's a good idea. This machine has an Intel gigabit NIC (e1000), FWIW.
On the other hand, this machine (and our others) all have the same NICs (and drivers), and inter-machine communication does not suffer these problems.
erik258 wrote: | have you considered switching in another network card? |
Not yet, but if things don't improve, we might go that route.
erik258 wrote: | have you considered writing a cron job to simply unload the module for the card and reload it every 5 days or so? a workaround, but possibly a very effective one. of course, for it to work you'd need support for that card modularized, and all similar cards (same chip) in the system would go down when the problem nic went down. |
Another good idea.
Although, since my original post, rebooting has turned out to be less helpful than we first believed. We're starting to think that the problem lies outside of our domain (in which case there's not a whole lot we can do).
Anyway, thanks for the ideas and suggestions!
Matt |
|
erik258 Advocate


Joined: 12 Apr 2005 Posts: 2650 Location: Twin Cities, Minnesota, USA
|
Posted: Wed Nov 29, 2006 2:46 pm Post subject: |
|
|
maybe it isn't your card. that's a pretty reputable name in network cards :) _________________ Configuring a Firewall? Try my iptables configuration
LinuxCommando.com is my blog for linux-related scraps and tidbits. Stop by for a visit! |
|
think4urs11 Bodhisattva


Joined: 25 Jun 2003 Posts: 6659 Location: above the cloud
|
Posted: Wed Nov 29, 2006 7:41 pm Post subject: |
|
|
what about link speed and duplex settings for both the NIC and the switchport?
Check whether both 'think' the settings are in sync with each other.
If autonegotiation is used the problem might be easily fixed by setting the values on both sides to fixed values. _________________ Nothing is secure / Security is always a trade-off with usability / Do not assume anything / Trust no-one, nothing / Paranoia is your friend / Think for yourself |
|
gerdesj l33t


Joined: 29 Sep 2005 Posts: 622 Location: Yeovil, Somerset, UK
|
Posted: Wed Nov 29, 2006 9:26 pm Post subject: |
|
|
Think4UrS11 wrote: | what about link speed and duplex settings for both the NIC and the switchport?
Check whether both 'think' the settings are in sync with each other.
If autonegotiation is used the problem might be easily fixed by setting the values on both sides to fixed values. |
Fair point. Use ethtool to find out what is going on. If you see half duplex anywhere then almost certainly you have an autoneg vs hard strapping problem.
Example output:
rum ~ # ethtool eth0
Settings for eth0:
    Supported ports: [ MII ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
    Advertised auto-negotiation: No
    Speed: 100Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    Current message level: 0x000000ff (255)
    Link detected: yes
===========================================
Here you can see I am using autoneg (even though it isn't advertised): "Auto-negotiation: on", and I am running at 100/Full, which is correct. If you see that autoneg is on at your end but the link is 100Mb/s with Duplex: Half, then you have real problems. In that case you must either set both ends to 100 full or both ends to autoneg.
#ethtool -s eth0 speed 100 duplex full autoneg off
... will force 100 full (don't forget to turn off autoneg - the third setting above)
If ethtool doesn't work with your card, try mii-tool and, as a last resort, the output of:
#dmesg | grep eth
#modinfo e1000
...will give you some info on module load parameters to e1000 that you can use to hard strap speed etc if you can't use ethtool.
If ethtool is needed, stick the line in /etc/conf.d/local.start and it will be run on boot up. You could probably put it in a post script section in /etc/conf.d/net as well.
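The check described above (look for half duplex in the ethtool output) could be scripted roughly like this. It is only a sketch: the function name `check_duplex` is made up, and it assumes the output has the "Speed:", "Duplex:" and "Auto-negotiation:" fields shown in the example output.

```shell
#!/bin/sh
# Hypothetical helper: reads `ethtool <iface>` output on stdin and
# prints a one-line verdict based on the Duplex field.
check_duplex() {
    awk -F': *' '
        # Strip leading whitespace so indented fields still match.
        { sub(/^[ \t]+/, "") }
        $1 == "Speed"            { speed = $2 }
        $1 == "Duplex"           { duplex = $2 }
        $1 == "Auto-negotiation" { autoneg = $2 }
        END {
            if (duplex == "Half")
                printf "WARNING: %s %s duplex (autoneg %s)\n", speed, duplex, autoneg
            else
                printf "OK: %s %s duplex (autoneg %s)\n", speed, duplex, autoneg
        }'
}
```

Usage would be something like `ethtool eth0 | check_duplex`. Run from cron and filtered so that only the WARNING line is printed, the usual cron behaviour of mailing a job's output to its owner would act as the notification.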
Cheers
Jon |
|
number_nine Tux's lil' helper

Joined: 05 May 2005 Posts: 136
|
Posted: Wed Nov 29, 2006 10:38 pm Post subject: |
|
|
Think4UrS11 and gerdesj, thank you for your suggestions.
In fact, we did discover that our switch was hard coded to 100Mbps, full-duplex (i.e. auto negotiation disabled). However, our e1000 network card was set to auto negotiate.
I found out that, when this specific situation occurs, the e1000 driver will default to 100Mbps, half duplex. We discovered and remedied this problem a while ago, however.
Now we're afraid that someone upstream (i.e. out of our control) has made the same oversight. Unfortunately, I don't know how to test if that is the case or not.
Thanks again! |
|
gerdesj l33t


Joined: 29 Sep 2005 Posts: 622 Location: Yeovil, Somerset, UK
|
Posted: Thu Nov 30, 2006 10:37 am Post subject: |
|
|
>>In fact, we did discover that our switch was hard coded to 100Mbps, full-duplex (i.e. auto negotiation disabled). However, our e1000 network card was set to auto negotiate.
It's a classic!
>>I found out that, when this specific situation occurs, the e1000 driver will default to 100Mbps, half duplex. We discovered and remedied this problem a while ago, however.
>>Now we're afraid that someone upstream (i.e. out of our control) has made the same oversight. Unfortunately, I don't know how to test if that is the case or not.
No, it's not a card fault but the standard: in the event of autoneg failing, an interface will be able to work out the correct line speed but will not get the duplex and will default to half. So:
Switch forced at 100 full and a NIC left at autoneg. The NIC will get the 100 correct but default to half duplex.
As a result it is easy to determine when a mismatch occurs. Look for a NIC which is set to autoneg and is running at 100 half with dreadful performance. It's that simple, and you can use ethtool to diagnose it.
The simple rule is this: BOTH ENDS MUST MATCH - EITHER FORCED SPEED AND DUPLEX or AUTONEGOTIATION
The default of half duplex might seem silly, but it is a holdover from the days when 10/half was the common setup. Back then, defaulting to half when autoneg failed was sensible. Nowadays no one runs 100/half (that I know of), but the standard has stuck. With gigabit there is no such thing as forcing the card - they all run at 1000/full - but here you are plugged into a 100Mb/s port, so it uses the standard for 10/100/half/full/autoneg.
When running at the wrong duplex, you should get quite a few errors. So why not use Cacti/MRTG or whatever to monitor the cards' error counts and notify you? Or, really simply, write a script run from cron that parses the output of ethtool and mails/pages you in the event of 100 half.
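For the error-count side of this, a minimal sketch, assuming a kernel that exposes per-interface counters under /sys/class/net/<iface>/statistics. The function names and the `STATS_ROOT` override are invented for illustration, and which counters actually grow on a duplex mismatch depends on which end is running at half:

```shell
#!/bin/sh
# Hypothetical helper: print "name value" for a few error counters.
# STATS_ROOT is an illustrative override so the function can be
# pointed at a test directory instead of /sys/class/net.
nic_error_counts() {
    iface="$1"
    dir="${STATS_ROOT:-/sys/class/net}/$iface/statistics"
    for counter in rx_errors tx_errors rx_crc_errors collisions; do
        if [ -r "$dir/$counter" ]; then
            printf '%s %s\n' "$counter" "$(cat "$dir/$counter")"
        fi
    done
}

# Rough total of the counters above. rx_errors may already include
# CRC errors on some drivers, so treat this as an indicator only.
nic_error_total() {
    nic_error_counts "$1" | awk '{ total += $2 } END { print total + 0 }'
}
```

Calling `nic_error_total eth0` from cron and comparing the number with the value saved on the previous run would show whether errors are still climbing.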
Cheers
Jon |
|
number_nine Tux's lil' helper

Joined: 05 May 2005 Posts: 136
|
Posted: Thu Nov 30, 2006 5:32 pm Post subject: |
|
|
gerdesj wrote: | No, it's not a card fault but the standard: in the event of autoneg failing, an interface will be able to work out the correct line speed but will not get the duplex and will default to half. |
Understood. But what if someone else is unaware of this (as we once were)? Our situation looks like this:
Code: | server host -------- their switch -------- our switch -------- our machine |
In other words, we're part of someone's private network (no Internet involved). What if their switch is fixed at 100/Full, but server host is set to autonegotiate? Then server host would default to 100/Half. That's extremely bad for us.
Basically, what we're effectively seeing is consistent 20ms delays in communication between us and them. I'm not even sure if that symptom would be consistent with the problem we've been discussing.
Anyway... thanks again! |
|