Xen Dom0 crashes regularly since Kernel 2.6.18-xen-r12

rfolkerts · n00b Joined: 24 Jan 2008 Posts: 6

Hi,

in December we updated our Gentoo-Xen Dom0-Machine; amongst these Updates was the latest Xen 3.3.0.

After this Update we booted the "old" 2.6.21-Xen Kernel. It did (and does) boot fine, but after running for a few minutes the System loses Network; there is no message in /var/log/messages or dmesg, but neither the Hypervisor nor its DomUs can be reached (ping, ssh). We were also unable to compile the 2.6.21, as there seems to be a Problem with the installed Header-Files.

No Problem: as the 2.6.21 was "deprecated", we chose the 2.6.18 (r12), which compiled and booted 1a.

Unfortunately, since then this machine crashes every few days (see below for stack trace).

We tried to update world to a more recent GCC, following the Gentoo Documentation (from 3.4.6 to i686-pc-linux-gnu-4.1.2) -- but that didn't help.

Next we searched the Net for the "swiotlb_map_sg" Crash-Point and found several references. The hints there were to add a "swiotlb=n" Kernel-Parameter.

From what we understand, this Parameter controls the size of a Table that the Code uses as DMA Buffer.

However, we didn't find a "rule" explaining what value to use under which conditions.

So, we added "swiotlb=512".

(On a Novell-Site there were hints to set it to 2; in some Forums/MailingLists people reported having set it as high as 4096.)

However, the Problem still occurs.

Now, before we blindly set "swiotlb" to some unrealistic Values, does someone have a hint on what might be going on there? The System did run 1a rock-solid with Kernel 2.6.21, so I hope to get it somewhat stable again...

The System is a 16G Dual Xeon-Server with Intel MB, (unfortunately) still running x86 with PAE (we didn't change that since we started with Xen 3.0 several Years ago; it definitely should be updated to x64 -- however, before that step it should just run stable again). Disks are connected via a 3Ware 9550SX Controller. The System hosts 14 DomUs running x86/PAE Linux. It also runs an NFS-Server for sharing Data between the DomUs, which mount it.

We didn't change anything regarding this setup, i.e. this System "as is", just with an older "world", ran 1a with Kernel 2.6.21.

Any hint would be really great!

Cheers,
_ralf_

trikolon · Posted: Sat Jan 31, 2009 9:34 am

did you try this kernel too: http://code.google.com/p/gentoo-xen-kernel/downloads/list

rfolkerts · n00b Joined: 24 Jan 2008 Posts: 6

Hi,

just a short update:

I did have a look at the Google-Gentoo-Xen-Kernel Project but was a bit reluctant to give it a try.

However, I remembered that with the Update to Xen 3.3 I removed the "dom0_mem" Line from Xen's Grub-Config.

So I added that line again, using the "old" Value -- and the machine has not crashed since (while it used to crash at least once a week w/o that Parameter, it has now been running for a few weeks).

The Parameter was (and now again is) set to: dom0_mem=262144

W/o that Parameter the Dom0-Machine had ~1.8G RAM available.

I'm just writing this here in case someone else runs into the same Problem!
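
For anyone wondering how much Memory that Value actually is: below is a small sketch that converts a dom0_mem Value into MiB. It assumes (as the Xen command-line documentation describes it, but please verify for your Version) that a bare Number is counted in KiB and that the Suffixes B, K, M and G are accepted. The function name dom0_mem_bytes is made up for this example; it only handles a single plain size, not the more complex forms.

```python
# Sketch: interpret a simple Xen dom0_mem value as a byte count.
# Assumption: a bare number is in KiB and B/K/M/G suffixes are accepted,
# as described in the Xen hypervisor command-line documentation.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def dom0_mem_bytes(value: str) -> int:
    """Return the size in bytes for a value such as '262144' or '256M'."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    return int(value) * 1024  # no suffix: treated as KiB


if __name__ == "__main__":
    for v in ("262144", "256M"):
        print(v, "->", dom0_mem_bytes(v) // 1024**2, "MiB")
```

Under that assumption, dom0_mem=262144 pins the Dom0 to 256 MiB instead of the ~1.8G it got without the Parameter.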

Cheers,
_ralf_

linuxtuxhellsinki · l33t Joined: 15 Nov 2004 Posts: 700 Location: Hellsinki

I used to have some problems with "dynamic" memory in dom0 and e1000 nic, but they went away with static memory allocation. You can also use dom0_mem=256M with xen-3.* versions and it's easy to increase memory of dom0 with xm if needed.
_________________
1st use 'Search' & lastly add [Solved] to
the subject of your first post in the thread.

rfolkerts · n00b Joined: 24 Jan 2008 Posts: 6

Hi,

thanks for the reply!

Well, I had the "m" suffix in mind but was too lazy to look it up (and as the machine kept crashing ~once a week and I would not have bet that the "solution" would help at all, I just quickly put in the old entry). Nevertheless, thanks for pointing me to that!

Cheers,
_ralf_
(Much more relaxed now that the Hypervisor is working rock-solid again).