Gentoo Forums
mysterious network performance problems
Gentoo Forums Forum Index :: Networking & Security
number_nine
Tux's lil' helper


Joined: 05 May 2005
Posts: 136

Posted: Mon Nov 27, 2006 8:25 pm    Post subject: mysterious network performance problems

We've got a server with multiple NICs, one of which is attached to a private network. Between our machine and the private network hub is a switch that we own, but our provider administers. One of our mission-critical applications communicates with this private network; delays on the order of milliseconds are very bad.

Recently, we've been seeing performance degradation on the order of 20ms. We rebooted the machine, and the situation improved. After some time (no more than a week), the performance problems returned. Again, a reboot fixed (or at least appeared to fix) the problem.

Unfortunately, none of us has the kind of network expertise required to troubleshoot a problem like this. I was hoping there are some network gurus on this forum who might be able to offer some suggestions, some hints on where to start looking, or even what questions we should be asking our network provider.

Any feedback is appreciated!
Thank you!
erik258
Advocate


Joined: 12 Apr 2005
Posts: 2650
Location: Twin Cities, Minnesota, USA

Posted: Wed Nov 29, 2006 3:23 am

have you considered trying to find similar problems online concerning your exact variety of network card?

have you considered switching in another network card?

have you considered writing a cron job to simply unload the module for the card and reload it every 5 days or so? a workaround, but possibly a very effective one. of course, for it to work you'd need support for that card modularized, and all similar cards (same chip) in the system would go down when the problem nic went down.
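The cron workaround above could look roughly like the following minimal Python sketch. It assumes the script runs as root, that e1000 is built as a module rather than compiled into the kernel, and that briefly losing every e1000-driven interface is acceptable. The `--apply` switch and the helper names are hypothetical conveniences, not anything from this thread.

```python
#!/usr/bin/env python3
"""Hypothetical cron helper: unload and reload a NIC driver module.

Assumes: run as root, driver built as a module (not into the kernel),
and all interfaces using that driver may go down briefly.
"""
import subprocess
import sys

MODULE = "e1000"  # driver named in this thread; change for other cards


def reload_commands(module: str) -> list[list[str]]:
    """Commands that remove a kernel module and then load it again."""
    return [
        ["modprobe", "-r", module],  # fails if the driver is compiled in
        ["modprobe", module],        # reload with its configured options
    ]


def run_all(commands: list[list[str]]) -> int:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print("failed:", " ".join(cmd), file=sys.stderr)
            return result.returncode
    return 0


if __name__ == "__main__":
    if "--apply" in sys.argv[1:]:
        sys.exit(run_all(reload_commands(MODULE)))
    # Default is a dry run: only show what would be executed.
    for cmd in reload_commands(MODULE):
        print(" ".join(cmd))
```

Once the dry run looks right, a crontab line such as `0 4 */5 * * /usr/local/sbin/reload-nic.py --apply` (hypothetical path) would run it at 04:00 on days 1, 6, 11, ... of each month; `*/5` in the day-of-month field restarts every month, so the gap is not always exactly five days. The interface still has to be brought back up and reconfigured after the reload (for example via its init script), which this sketch leaves out.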
_________________
Configuring a Firewall? Try my iptables configuration
LinuxCommando.com is my blog for linux-related scraps and tidbits. Stop by for a visit!
number_nine
Tux's lil' helper


Joined: 05 May 2005
Posts: 136

Posted: Wed Nov 29, 2006 2:43 pm

erik258 wrote:
have you considered trying to find similar problems online concerning your exact variety of network card?

No, but that's a good idea. This machine has an Intel gigabit NIC (e1000), FWIW.

On the other hand, this machine and our others all have the same NICs (and drivers), and inter-machine communication does not suffer from these problems.

erik258 wrote:
have you considered switching in another network card?

Not yet, but if things don't improve, we might go that route.

erik258 wrote:
have you considered writing a cron job to simply unload the module for the card and reload it every 5 days or so? a workaround, but possibly a very effective one. of course, for it to work you'd need support for that card modularized, and all similar cards (same chip) in the system would go down when the problem nic went down.

Another good idea.

Although, since my original post, it looks like rebooting isn't as helpful as we once believed. We're starting to think that the problem lies outside our domain (in which case there's not a whole lot we can do).

Anyway, thanks for the ideas and suggestions!
Matt
erik258
Advocate


Joined: 12 Apr 2005
Posts: 2650
Location: Twin Cities, Minnesota, USA

Posted: Wed Nov 29, 2006 2:46 pm

maybe it isn't your card. that's a pretty reputable name in network cards :)
think4urs11
Bodhisattva


Joined: 25 Jun 2003
Posts: 6659
Location: above the cloud

Posted: Wed Nov 29, 2006 7:41 pm

what about link speed and duplex settings for both the NIC and the switchport?
Check whether both sides 'think' their settings are in sync with each other.
If autonegotiation is used, the problem might be easily fixed by setting the values on both sides to fixed values.
_________________
Nothing is secure / Security is always a trade-off with usability / Do not assume anything / Trust no-one, nothing / Paranoia is your friend / Think for yourself
gerdesj
l33t


Joined: 29 Sep 2005
Posts: 622
Location: Yeovil, Somerset, UK

Posted: Wed Nov 29, 2006 9:26 pm

Think4UrS11 wrote:
what about link speed and duplex settings for both the NIC and the switchport?
Check whether both sides 'think' their settings are in sync with each other.
If autonegotiation is used, the problem might be easily fixed by setting the values on both sides to fixed values.


Fair point. Use ethtool to find out what is going on. If you see half duplex anywhere, then you almost certainly have an autoneg vs. hard strapping problem.

Example output:

rum ~ # ethtool eth0
Settings for eth0:
Supported ports: [ MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: No
Speed: 100Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
Current message level: 0x000000ff (255)
Link detected: yes
===========================================

Here you can see I am using autoneg (even though it isn't advertised): "Auto-negotiation: on", and I am running at 100/Full, which is correct. If you see autoneg on at your end but the link is at 100Mb/s with "Duplex: Half", then you have real problems. In that case you must either set both ends to 100/full or set both ends to autoneg.
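The check described above (autoneg on, but the link sitting at half duplex) can be automated by parsing the `ethtool <iface>` output. This is a minimal Python sketch that assumes the plain `Key: value` layout shown in the example output; other ethtool versions may word fields differently, and `parse_ethtool` and `duplex_warning` are hypothetical names.

```python
from typing import Optional


def parse_ethtool(text: str) -> dict[str, str]:
    """Collect 'Key: value' pairs from `ethtool <iface>` output.

    Wrapped continuation lines (e.g. extra link modes) have no colon and
    are skipped, as are header lines with an empty value.
    """
    settings: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def duplex_warning(settings: dict[str, str]) -> Optional[str]:
    """Return a warning for the 'autoneg on but half duplex' case, else None."""
    autoneg = settings.get("Auto-negotiation", "").lower()
    duplex = settings.get("Duplex", "").lower()
    if autoneg == "on" and duplex == "half":
        speed = settings.get("Speed", "unknown speed")
        return f"autoneg on but link is {speed} half duplex: check the other end"
    return None


# The example output from this post, used as test input.
SAMPLE = """\
Settings for eth0:
        Supported ports: [ MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
        Supports auto-negotiation: Yes
        Advertised auto-negotiation: No
        Speed: 100Mb/s
        Duplex: Full
        Auto-negotiation: on
        Link detected: yes
"""
```

On a live box the text could come from `subprocess.run(["ethtool", "eth0"], capture_output=True, text=True).stdout`; for the sample above `duplex_warning` returns `None`, since the link is at full duplex.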

#ethtool -s eth0 speed 100 duplex full autoneg off
... will force 100 full (don't forget to turn off autoneg - the third setting above)

If ethtool doesn't work with your card, try mii-tool, and as a last resort check the output of:

#dmesg | grep eth

#modinfo e1000
...will give you some info on the e1000 module load parameters, which you can use to hard strap speed etc. if you can't use ethtool.

If ethtool is needed, stick the line in /etc/conf.d/local.start and it will be run at boot. You could probably put it in a postup() function in /etc/conf.d/net as well.

Cheers
Jon
number_nine
Tux's lil' helper


Joined: 05 May 2005
Posts: 136

Posted: Wed Nov 29, 2006 10:38 pm

Think4UrS11 and gerdesj, thank you for your suggestions.

In fact, we did discover that our switch was hard coded to 100Mbps, full duplex (i.e. autonegotiation disabled). However, our e1000 network card was set to autonegotiate.

I found out that, by default, when this specific situation occurs, the e1000 driver will fall back to 100Mbps, half duplex. We discovered and remedied this problem a while ago, however.

Now we're afraid that someone upstream (i.e. out of our control) has made the same oversight. Unfortunately, I don't know how to test whether that is the case.

Thanks again!
gerdesj
l33t


Joined: 29 Sep 2005
Posts: 622
Location: Yeovil, Somerset, UK

Posted: Thu Nov 30, 2006 10:37 am

>>In fact, we did discover that our switch was hard coded to 100Mbps, full duplex (i.e. autonegotiation disabled). However, our e1000 network card was set to autonegotiate.

It's a classic!

>>I found out that, by default, when this specific situation occurs, the e1000 driver will fall back to 100Mbps, half duplex. We discovered and remedied this problem a while ago, however.

>>Now we're afraid that someone upstream (i.e. out of our control) has made the same oversight. Unfortunately, I don't know how to test if that is the case or not.

No, it's not a card fault but the standard: if autoneg fails, an interface can still work out the correct line speed, but it will not get the duplex and will default to half. So:

Switch forced at 100 full and a NIC left at autoneg. The NIC will get the 100 correct but default to half duplex.

As a result, it is easy to spot when a mismatch occurs: look for a NIC that is set to autoneg and is running at 100/half with dreadful performance. It's that simple; use ethtool to diagnose it.

The simple rule is this: BOTH ENDS MUST MATCH - EITHER FORCED SPEED AND DUPLEX or AUTONEGOTIATION

The default of half duplex might seem silly, but it is a hangover from the days when 10/half was the common setup. Back then, defaulting to half when autoneg failed was sensible. Nowadays no one runs 100/half (that I know of), but the standard has stuck. With gigabit there is no such thing as forcing the card (1000BASE-T requires autoneg) and they all run at 1000/full, but here you are plugged into a 100Mb port, so the usual 10/100 half/full/autoneg rules apply.
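The rule above can be modelled as a small decision table. This Python sketch is deliberately simplified: it covers only 10/100 copper ports, assumes that two autonegotiating ends both advertise 100/full, and follows the fallback described in this post (the autonegotiating end learns the speed but assumes half duplex). `PortConfig`, `link_outcome` and `is_mismatch` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PortConfig:
    autoneg: bool
    speed: Optional[int] = None   # Mb/s; only meaningful when forced
    duplex: Optional[str] = None  # "full" or "half"; only when forced


Outcome = Optional[tuple[int, str, str]]  # (speed, duplex at a, duplex at b)


def link_outcome(a: PortConfig, b: PortConfig) -> Outcome:
    """Predict what each end of a 10/100 link ends up running."""
    if a.autoneg and b.autoneg:
        return (100, "full", "full")  # assumption: both advertise 100/full
    if a.autoneg:
        # a only sees b's signalling: it gets the speed but not the duplex
        return (b.speed, "half", b.duplex)
    if b.autoneg:
        return (a.speed, a.duplex, "half")
    if a.speed != b.speed:
        return None  # both forced, different speeds: no link at all
    return (a.speed, a.duplex, b.duplex)  # both forced: whatever was set


def is_mismatch(outcome: Outcome) -> bool:
    """True when there is no link or the two ends disagree on duplex."""
    return outcome is None or outcome[1] != outcome[2]
```

For example, `link_outcome(PortConfig(autoneg=True), PortConfig(False, 100, "full"))` gives `(100, "half", "full")`: the NIC-on-autoneg, switch-forced case from this thread, and `is_mismatch` flags it.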

When running at the wrong duplex you should see quite a few errors, so why not use Cacti/MRTG or whatever to monitor the cards' error counts and notify you? Or, really simply, write a script run from cron that parses the output of ethtool and mails/pages you in the event of 100/half.
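As an alternative to parsing ethtool, current Linux kernels expose duplex, speed and error counters under /sys/class/net/<iface>/ (this is newer than the kernels of this thread's era, so treat it as an assumption for an older box). Below is a minimal Python sketch of the cron check suggested above; alerting is left as returned strings, and `read_link` and `link_problems` are hypothetical names. On a duplex mismatch the half-duplex end typically logs collisions while the full-duplex end logs receive (CRC) errors, so the sketch watches both kinds of counter.

```python
from pathlib import Path
from typing import Optional

SYSFS = Path("/sys/class/net")
COUNTERS = ("rx_errors", "tx_errors", "collisions")


def read_link(iface: str, root: Path = SYSFS) -> dict:
    """Snapshot duplex, speed and error counters for one interface.

    Reading 'speed' or 'duplex' can raise OSError while the link is down.
    """
    base = root / iface

    def read(rel: str) -> str:
        return (base / rel).read_text().strip()

    snapshot = {"duplex": read("duplex"), "speed": read("speed")}
    for name in COUNTERS:
        snapshot[name] = int(read(f"statistics/{name}"))
    return snapshot


def link_problems(prev: Optional[dict], cur: dict) -> list[str]:
    """Compare two snapshots and return human-readable warnings.

    Counters that went down (e.g. after a driver reload) are ignored.
    """
    problems = []
    if cur["duplex"] == "half":
        problems.append(f"running {cur['speed']} Mb/s half duplex")
    if prev is not None:
        for name in COUNTERS:
            delta = cur[name] - prev[name]
            if delta > 0:
                problems.append(f"{name} increased by {delta}")
    return problems
```

Run from cron, a wrapper would store the previous snapshot somewhere (a JSON file, say) and mail or page whenever `link_problems` returns anything; those parts are left out here.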

Cheers
Jon
number_nine
Tux's lil' helper


Joined: 05 May 2005
Posts: 136

Posted: Thu Nov 30, 2006 5:32 pm

gerdesj wrote:
No, it's not a card fault but the standard: In the event of autoneg failing then an interface will be able to work out the correct line speed but will not get the duplex and will default to half.


Understood. But what if someone else is unaware of this (as we once were)? Our situation looks like this:
Code:
server host -------- their switch -------- our switch -------- our machine

In other words, we're part of someone's private network (no Internet involved). What if their switch is fixed at 100/Full, but server host is set to autonegotiate? Then server host would default to 100/Half. That's extremely bad for us.

Basically, what we're seeing is a consistent 20ms delay in communication between us and them. I'm not even sure whether that symptom is consistent with the problem we've been discussing. :?

Anyway... thanks again!
All times are GMT