Gentoo Forums
Xen DomU Networking Stops working under load
Gentoo Forums Forum Index » Kernel & Hardware
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Sun Aug 24, 2008 1:10 pm    Post subject: Xen DomU Networking Stops working under load

Hello,

I have a 2.6.21 Xen domU running PV under a 2.6.21 dom0.

In general everything works well, with no problems. However, if I put load on the network card, it (almost) stops sending or receiving packets.

Usage is normally pretty light. I am running a couple of database servers, an SNMP server and a few other things in the domU, but the traffic is extremely low: one or two users at most.

If I start something like BitTorrent, it kills the networking almost immediately. I was running a Samba server on the domU for a while, but whenever I saved a file of significant size, it would kill the network.

If I connect with xm console, the network card appears fine, and a small number of bytes are ticking up on transmit and receive. Nothing appears in messages or dmesg.
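To put a number on "a small number of bytes ticking up", the interface byte counters can be sampled twice and turned into a rate. A minimal Python sketch, assuming a Linux guest that exposes counters under /sys/class/net/<iface>/statistics/; the interface name and sampling interval are placeholders, not values from the thread.

```python
from pathlib import Path

def read_counters(iface, base="/sys/class/net"):
    """Return (rx_bytes, tx_bytes) for iface, read from sysfs."""
    stats = Path(base) / iface / "statistics"
    rx = int((stats / "rx_bytes").read_text())  # int() tolerates the trailing newline
    tx = int((stats / "tx_bytes").read_text())
    return rx, tx

def byte_rates(before, after, seconds):
    """Average receive and transmit rates (bytes/s) between two samples."""
    return tuple((a - b) / seconds for b, a in zip(before, after))

# Example (inside the domU; "eth0" and 5 s are placeholders):
#   first = read_counters("eth0"); time.sleep(5); second = read_counters("eth0")
#   print(byte_rates(first, second, 5))
```

A near-zero rate while a large transfer is supposedly running would match the stall described above.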

The dom0 network is fine, and I am bridging to a single physical NIC. I have three other VMs (all HVM) running on the same Xen box, and they don't suffer any network issues.

I am not sure how to proceed with diagnosing this...

Cheers,

Paul
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Sun Aug 24, 2008 1:38 pm

The domU can ping itself, incidentally, and restarting the domU interface doesn't help.

Sometimes it seems to right itself; sometimes it needs a reboot. tcpdump on the dom0 doesn't show anything.

Also, I use this domU for Asterisk, and when the network is working there are no problems with calls. So the issue seems to be more about the volume of traffic through the virtual NIC than about packet frequency.
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Sun Aug 31, 2008 11:51 am

Well, ok.

I have changed the kernel in the domU from 2.6.21 to 2.6.25: same problem. In the dom0 I have swapped the network card, with no change.

The domU doesn't report anything wrong in messages or dmesg; it just cannot connect to anything. If I shut it down, it hangs, and I can't connect to it via xm console.

If I xm destroy it, it is destroyed. If I then try to xm create the domain again, I get:
Code:
Error: Device 0 (vif) could not be connected. Hotplug scripts not working.


Nothing appears in the xen-hotplug.log.

xend.log gives:

Code:

...
[2008-08-31 19:14:11 5531] DEBUG (DevController:595) hotplugStatusCallback /local/domain/0/backend/vif/11/0/hotplug-status.
[2008-08-31 19:15:51 5531] DEBUG (XendDomainInfo:1897) XendDomainInfo.destroy: domid=11
...


So it tries for a while and then destroys the VM.
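To see how long "a while" is, the timestamps in the xend.log excerpt can be compared. A minimal Python sketch, assuming every xend.log line starts with the `[date time pid]` prefix shown above; it only measures the gap and says nothing about why the hotplug status never arrived.

```python
import re
from datetime import datetime

# Matches the "[YYYY-MM-DD HH:MM:SS pid]" prefix seen in the xend.log excerpt.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\]")

def log_time(line):
    """Parse the timestamp at the start of an xend.log line."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no xend timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")

def seconds_between(first, second):
    """Seconds elapsed between two xend.log lines."""
    return (log_time(second) - log_time(first)).total_seconds()

hotplug = "[2008-08-31 19:14:11 5531] DEBUG (DevController:595) hotplugStatusCallback /local/domain/0/backend/vif/11/0/hotplug-status."
destroy = "[2008-08-31 19:15:51 5531] DEBUG (XendDomainInfo:1897) XendDomainInfo.destroy: domid=11"
print(seconds_between(hotplug, destroy))  # 100.0
```

For the two lines above, xend spends 100 seconds between registering the hotplug-status callback and destroying the domain.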

All the other VMs are working fine at this point, but if I shut any of them down and try to restart it, it fails to get a vif too.

Adding interfaces to the bridge seems to work fine, so I guess the problem must be in the creation of the vif interface. Or at least, the vif breaks, and then Xen is no longer able to create a new one.

I am not sure at what stage this fails; prior to the vif-script, by the look of it...
bbgermany
Veteran


Joined: 21 Feb 2005
Posts: 1844
Location: Oranienburg/Germany

Posted: Mon Sep 01, 2008 2:39 pm

How do you create the xenbr interface? I'm doing it this way:

/etc/conf.d/net
Code:

config_eth0=( "null" )
config_eth1=( "null" )
bridge_xenbr0="eth0 eth1"
config_xenbr0=( "192.168.23.252 netmask 255.255.255.0" )
routes_xenbr0=( "default via 192.168.23.1" )
dns_servers=( "192.168.23.20" )
dns_domain="xxx.xxx"
dns_search="xxx.xxx xxx.yyy"

brctl_xenbr0=(
        "setfd 0"
        "sethello 0"
        "stp off"
)


/etc/xen/xend-config.sxp
Code:

(network-script network-dummy)


This solved the script issues while creating the bridge.
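The `brctl_xenbr0` entries correspond to plain brctl subcommands applied to the bridge, with the bridge name going right after the subcommand. A minimal Python sketch that expands them into the equivalent command lines; it only builds and prints them, and it illustrates what the settings mean rather than reproducing the code Gentoo's net scripts actually run.

```python
def brctl_commands(bridge, options):
    """Turn 'subcommand [args]' entries into brctl argv lists.

    brctl takes the bridge name right after the subcommand,
    e.g. "setfd 0" becomes ["brctl", "setfd", "xenbr0", "0"].
    """
    commands = []
    for option in options:
        subcommand, *args = option.split()
        commands.append(["brctl", subcommand, bridge, *args])
    return commands

for argv in brctl_commands("xenbr0", ["setfd 0", "sethello 0", "stp off"]):
    print(" ".join(argv))
# brctl setfd xenbr0 0
# brctl sethello xenbr0 0
# brctl stp xenbr0 off
```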

bb
_________________
Desktop: Ryzen 5 5600G, 32GB, 2TB, RX7600
Notebook: Dell XPS 13 9370, 16GB, 1TB
Server #1: Ryzen 5 Pro 4650G, 64GB, 16.5TB
Server #2: Ryzen 4800H, 32GB, 22TB
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Mon Sep 01, 2008 11:48 pm

I am running the network-bridge script to create the bridge, with a slight modification, as I have multiple addresses on my physical NIC and these were not being set up correctly on the bridge.

Your config sets up the bridge normally, but then also does this:

Code:

        "setfd 0"
        "sethello 0"
        "stp off"


My bridge has forward delay set to zero and STP off. So the only difference is that you have the hello time set to zero, whereas mine is 2 seconds.

With STP off, I am not sure that hello time does anything, but googling it I have found a number of cases where setting the hello time to something other than zero fixed some Xen networking issues (high numbers of interrupts).

Can you remember why you have it set to zero?
bbgermany
Veteran


Joined: 21 Feb 2005
Posts: 1844
Location: Oranienburg/Germany

Posted: Tue Sep 02, 2008 6:17 am

IIRC, I used the Gentoo wiki entry to configure my Xen setup, and this was in it. I use multiple addresses on the bridge as well; iproute2 did the trick for me.

bb
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Tue Sep 02, 2008 2:20 pm

The forward delay and STP settings are the defaults in current Gentoo Xen installs. The hello interval controls how often BPDUs are sent, so it is unlikely to have any bearing on my issue.

I'll give it a test at some point, because right now I have exhausted the leads available to me, at least the ones I can think of. Well, I do have one more, and that is to build Xen with a stock kernel from another distribution to see if it helps. But that brings a whole new set of difficulties, as I couldn't find a stock kernel that did everything I wanted, which is why I came back to Gentoo...
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Thu Sep 04, 2008 10:17 pm

Hello Koan,
I have exactly the same issue with Xen. When starting the domU I noticed this message:
Code:

Bringing up eth0
 *     dhcp
 *       Running dhcpcd ...err, eth0: Failed to lookup hostname via DNS: Name or service not known
                                               [ ok ]
 *       eth0 received address 192.168.1.122/24

After logging in, everything is fine, but when I do something like "emerge -eauDN world", some packages are transferred to the domU, and after a while it looks like the bridge is down; then, after another while, the network interface is working again.

Below is my bonding setup for eth0 and eth1 in dom0:

Code:

# eth0 and eth1 get no address of their own; they are enslaved to bond0
config_eth0=( "null" )
config_eth1=( "null" )
RC_NEED_bond0=("net.eth0 net.eth1")
slaves_bond0="eth0 eth1"
config_bond0=( "null" )

# The Xen bridge is built on bond0 and configured by DHCP
RC_NEED_xenbr0="net.bond0"
bridge_xenbr0="bond0"
config_xenbr0=( "dhcp" )
brctl_xenbr0=(
        "setfd 0"
        "sethello 0"
        "stp off"
)


I am now testing whether it's caused by bonding or IPv6.

Any help will be appreciated.
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Thu Sep 04, 2008 10:35 pm

Hi,

I am not using bonded nics, or IPv6.

Xen 3.3 came into unstable a couple of days ago, so I upgraded, but the problem still remains.

Someone on the Xensource mailing list suggested lowering the NIC rate so that the domU never transfers at a speed that breaks networking, but the last time I tested, the break happened at 3.6 Mb/s. That is pretty slow!
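For reference, a cap of this kind would normally go on the domU's vif line. A minimal sketch of an xm domain config fragment (xm config files use Python syntax), assuming your Xen release accepts a `rate=` key in the vif specification; check xmdomain.cfg(5) for your version before relying on it. The 2Mb/s figure is an arbitrary example below the 3.6 Mb/s break point.

```python
# Hypothetical xm domain config fragment (Python syntax, as xm config files are).
# ASSUMPTION: this Xen release supports rate= in the vif spec; verify in xmdomain.cfg(5).
vif = ['bridge=xenbr0, rate=2Mb/s']  # cap traffic on this vif below the observed break point
```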

So the changes I have made are:

1) Change domU kernel (2.6.21, 2.6.24, 2.6.25)
2) Change domU userland (gentoo, ubuntu)
3) Change dom0 physical nic + driver (Realtek 8169 -> 8168)
4) Change Xen version (3.2.1 -> 3.3)

The only thing I haven't changed is the dom0 kernel. I am using a stock 2.6.21 Gentoo kernel, so it would be great if anyone watching this who has PV domUs working under a 2.6.21 kernel would post their .config so I can compare it to mine.

Paul
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Fri Sep 05, 2008 10:02 am

Hmm, so I switched back to an eth0 -> xenbr0 configuration, disabled IPv6, and everything is OK now.
I am going to try different bonding modes, and if the problem persists I think I will have to try NAT :(
bbgermany
Veteran


Joined: 21 Feb 2005
Posts: 1844
Location: Oranienburg/Germany

Posted: Fri Sep 05, 2008 10:09 am

What kind of bond do you use? Maybe your switch doesn't support the mode, so packets get lost in transfer.

bb
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Fri Sep 05, 2008 10:27 am

I was using mode=1, but when I tested plugging and unplugging cables, connections were not restored.
Then I tried mode=0, which worked fine from dom0, but the issue with the domU appeared.
bbgermany
Veteran


Joined: 21 Feb 2005
Posts: 1844
Location: Oranienburg/Germany

Posted: Fri Sep 05, 2008 10:57 am

Did you try mode 5 (balance-tlb) or 6 (balance-alb) as well? Mode 0 is round-robin (balance-rr) and 1 is active-backup. If your switch supports the Link Aggregation Control Protocol (LACP), you should consider mode 4 (802.3ad).
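Whether a mode needs switch cooperation is the crux of the "maybe your switch doesn't support the mode" question. A small Python lookup summarising the seven bonding modes; the names and switch requirements follow the Linux kernel bonding driver documentation, and the mode number is what goes in the module's `mode=` option.

```python
# Mode numbers and names from the Linux kernel bonding driver documentation.
BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}

# Per the same docs: 0, 2 and 3 generally need the switch ports grouped
# (static trunk/EtherChannel), and 4 needs an 802.3ad (LACP) aggregation.
# 1, 5 and 6 need no special switch configuration.
NEEDS_SWITCH_CONFIG = {0, 2, 3, 4}

def describe_mode(mode):
    """Return a one-line summary of a bonding mode number."""
    name = BOND_MODES[mode]
    if mode in NEEDS_SWITCH_CONFIG:
        return f"mode={mode} ({name}): needs switch-side configuration"
    return f"mode={mode} ({name}): no special switch configuration"

for mode in sorted(BOND_MODES):
    print(describe_mode(mode))
```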

bb
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Fri Sep 05, 2008 12:41 pm

I am going to test this this evening, because I don't have console access to the server, and try again and again and again 8)
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Sat Sep 06, 2008 10:20 am

Still no luck with bonding. I tried 6 bonding modes, but still no progress. I am going to downgrade the dom0 and domU kernels from 2.6.21 to 2.6.18-r12 and check if that helps. I also set "sethello 2", since I found it is a known bug with Xen, so you were right about this.
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Sat Sep 06, 2008 2:17 pm

Ok,

It looks like the bonding issue isn't related to the original issue reported in this thread, but that's OK, we can share ;)

Anyway, in an effort to eliminate the NIC as the source of the problem, I used another one with a different driver, but the problem remained. Both were Realtek, however, and it was pointed out that this would not necessarily rule out a driver issue.

I am testing with a 10/100 tulip card and it is looking promising: no lockups yet, and I have moved a couple of gigs across the link.
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Sat Sep 06, 2008 3:41 pm

I am sorry if I messed up your thread with my own problem :)
Anyway, it looks like I have reached a solution. My configuration is an HP DL380 G5, and the network cards are:

03:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet (rev 12)

After I recompiled the kernel with the NIC driver as a module, it reported a "Call trace" in the dmesg log.
I tried to compile and install the drivers from the Broadcom website, but I couldn't work out how to enable ZLIB_INFLATE in the kernel (all my attempts were unsuccessful), so I can't load that module.

Everything looks fine when I compiled it as a kernel module through 'make menuconfig' and set mode=5.

So these are my configs:
Code:

master ~ # cat /etc/conf.d/net
   config_eth0=( "null" )
   config_eth1=( "null" )
   RC_NEED_bond0=("net.eth0 net.eth1")
   slaves_bond0="eth0 eth1"
   config_bond0=( "null" )

   RC_NEED_xenbr0="net.bond0"

   bridge_xenbr0="bond0"

 config_xenbr0=("dhcp")
 brctl_xenbr0=(
        "setfd 0"
        "sethello 2"
        "stp off"
)

Code:

master ~ # cat /etc/modules.autoload.d/kernel-2.6
bnx2
bonding miimon=100 mode=5
loop max_loop=256
master ~ # uname -a
Linux 2.6.18-xen-r12 #9 SMP Sat Sep 6 14:41:34 CEST 2008 x86_64 Intel(R) Xeon(R) CPU E5420 @ 2.50GHz GenuineIntel GNU/Linux
master ~ # cat /xen/reference/gentoo.xen.cfg
kernel = "/boot/vmlinuz-2.6.21-xenU"
memory = 1024
name = "reference"
vif = [ 'mac=00:16:3E:6A:49:54, bridge=xenbr0'  ]
dhcp = "dhcp"
disk = ['file:/xen/reference/gentoo64.img,sda1,w', \
        'file:/xen/reference/gentoo64_lvm.img,sdc1,w', \
        'file:/xen/reference/swap_disk.img,sds1,w', ]
root = "/dev/sda1 ro"
extra = "gentoo=nodevfsi"
master ~ #

app-emulation/xen-3.3.0
app-emulation/xen-tools-3.3.0


To me it looks like something in /usr/src/linux/drivers/net/bonding/* isn't quite right when used with bnx2.
Hibbelharry
Tux's lil' helper


Joined: 27 May 2003
Posts: 88
Location: Bremen, Northern Germany

Posted: Sun Sep 07, 2008 7:49 pm

You might try disabling checksum offloading to hardware using ethtool. This solved some network-dying problems with Xen for me.
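A minimal Python sketch of the ethtool call this suggests: `ethtool -K <iface> rx off tx off` turns receive and transmit checksum offload off. The interface name is a placeholder, since the post doesn't say which interface (physical NIC, bridge, vif or the domU's eth0) it was applied to; by default the sketch only prints the command, and a setting made this way does not survive a reboot on its own.

```python
import shlex
import subprocess

def offload_off_argv(iface, features=("rx", "tx")):
    """Build an `ethtool -K` argv that switches the given offloads off."""
    argv = ["ethtool", "-K", iface]
    for feature in features:
        argv += [feature, "off"]
    return argv

def run(argv, dry_run=True):
    """Print the command; only execute it (needs root) when dry_run is False."""
    print(shlex.join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)

run(offload_off_argv("eth0"))  # prints: ethtool -K eth0 rx off tx off
```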

Greetz
Hibbelharry
maslo64
n00b


Joined: 04 Sep 2008
Posts: 14
Location: Slovakia

Posted: Mon Sep 08, 2008 6:33 am

Hello Hibbelharry,
As far as I can tell, my problem is solved. I can't add [solved] to the topic, as this isn't my thread and I have messed up koan's thread :)
BTW koan, did switching to different drivers help?
bbgermany
Veteran


Joined: 21 Feb 2005
Posts: 1844
Location: Oranienburg/Germany

Posted: Mon Sep 08, 2008 8:39 am

Hibbelharry wrote:
You might try disabling checksum offloading to hardware using ethtool. This solved some network-dying problems with Xen for me.

Greetz
Hibbelharry


I have checksum offload disabled for TX but not for RX. Did you disable both?

Code:

zeus ~ # ethtool -k eth1
Offload parameters for eth1:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
zeus ~ #
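When comparing offload settings across several interfaces, it helps to turn `ethtool -k` output into data. A minimal Python sketch that parses output in the format shown above; it skips header lines and "Cannot get ...: Operation not supported" lines, and assumes (as newer ethtool versions do) that any suffix such as "[fixed]" follows the on/off word.

```python
def parse_offloads(text):
    """Map each feature in `ethtool -k` output to True (on) or False (off)."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Only the first word matters; lines whose value isn't on/off are skipped.
        if sep and words and words[0] in ("on", "off"):
            settings[name.strip()] = words[0] == "on"
    return settings

sample = """Offload parameters for eth1:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off"""

enabled = [name for name, on in parse_offloads(sample).items() if on]
print(enabled)  # ['rx-checksumming']
```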


bb
BlackEye
l33t


Joined: 04 Dec 2002
Posts: 756
Location: Germany

Posted: Fri Oct 17, 2008 6:50 am

Is there a solution for this problem?
I have exactly the same problem as the original poster!

Instead of Samba, I discovered the problem using NFS. Copying large files (several MB) over NFS kills Xen's virtual network unrecoverably. I need to restart the whole dom0 to be able to restart the domU and use the network again.
By reducing the NFS rsize and wsize, I observed that the problem may not appear again. However, this could simply be because the transfer is slower with these changes, so the bug isn't triggered. The strange thing is that I could copy large files using netcat between the domU and dom0 without any problems.
I'm afraid this is a security issue. I could lower rsize and wsize, but what happens if someone manages to send a large packet through the pipe and crash my connection?

Is there any real solution for this problem? New kernels? New bugfixes? Or any other hints?

I use xen 3.3 with 2.6.21-xen kernel sources (dom0 and domU).
Any help would be really appreciated!

Greetings,
Martin
koan
Apprentice


Joined: 01 May 2006
Posts: 169
Location: Melbourne

Posted: Fri Oct 17, 2008 7:54 am

I am currently running with a non-Realtek 10/100 card, and I haven't had the network fail, even at maximum throughput.

However, I do want this to be a gigabit connection, so I have a D-Link gigabit card waiting to test, and I'll report back if I get good results (or not).

My other plan is to use the SUSE Xen patchset against the Gentoo 2.6.25 kernel to see if that helps. I have the kernel built, but it remains to be seen whether it even boots. Other people have working Gentoo installs with this kernel combination, but I don't know whether it will address the networking problem.

What network card are you using?
BlackEye
l33t


Joined: 04 Dec 2002
Posts: 756
Location: Germany

Posted: Fri Oct 17, 2008 9:59 am

koan wrote:
What network card are you using?

02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)

The Realtek Ethernet controller seems to have some issues with Xen; I read something about this on the net. Unfortunately I can't change the NIC, because this is a root server that I don't have any direct access to.

About the kernel source I use, maybe this link is interesting for you too -> https://forums.gentoo.org/viewtopic-t-709908.html
There you can get a new vanilla kernel with Xen patches. This is the kernel I currently use on my dom0.
If I use NFS with this kernel without setting rsize and wsize, I get horrible transfer rates. If I manually set rsize and wsize to 8192, I get a vastly better result (you can see my post in the other thread about this).
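The rsize/wsize workaround is just a pair of NFS mount options, given in bytes. A minimal Python sketch that builds both the one-off mount command and the equivalent /etc/fstab entry; "fileserver:/export" and "/mnt/data" are hypothetical placeholders, and 8192 is the value from the post above.

```python
def nfs_mount_argv(export, mountpoint, rsize=8192, wsize=8192):
    """Build a mount(8) argv for an NFS export with explicit rsize/wsize (bytes)."""
    options = f"rsize={rsize},wsize={wsize}"
    return ["mount", "-t", "nfs", "-o", options, export, mountpoint]

def fstab_line(export, mountpoint, rsize=8192, wsize=8192):
    """Equivalent /etc/fstab entry: device, mount point, type, options, dump, pass."""
    return f"{export} {mountpoint} nfs rsize={rsize},wsize={wsize} 0 0"

# "fileserver:/export" and "/mnt/data" are hypothetical placeholders.
print(" ".join(nfs_mount_argv("fileserver:/export", "/mnt/data")))
print(fstab_line("fileserver:/export", "/mnt/data"))
```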

However, I don't know whether this is the real solution for this problem.

About the NIC: although I found some reports of issues with the Realtek in conjunction with Xen, I really don't see why it should have anything to do with this, because the whole transfer between dom0 and the domUs goes over the virtual devices...
All times are GMT