rfolkerts n00b
Joined: 24 Jan 2008 Posts: 6
Posted: Fri Jan 30, 2009 4:52 pm Post subject: Xen Dom0 crashes regularly since Kernel 2.6.18-xen-r12
Hi,
in December we updated our Gentoo Xen Dom0 machine; among the updates was the latest Xen, 3.3.0.
After this update we booted the "old" 2.6.21 Xen kernel. It booted (and still boots) fine, but after running for a few minutes the system loses its network connection: there is no message in /var/log/messages or dmesg, yet neither the hypervisor nor its DomUs can be reached (ping, ssh). We were also unable to compile 2.6.21, as there seems to be a problem with the installed header files.
No problem, we thought: since 2.6.21 was "deprecated" anyway, we chose 2.6.18 (r12), which compiled and booted flawlessly.
Unfortunately, the machine has crashed every few days since then (see below for the stack trace).
We tried updating world with a more recent GCC, following the Gentoo documentation (from 3.4.6 to i686-pc-linux-gnu-4.1.2), but that didn't help.
Next we searched the net for the "swiotlb_map_sg" crash point and found several references. The hints there were to add a "swiotlb=n" kernel parameter.
From what we understand, this parameter controls the size of a table that the code uses as a DMA buffer.
However, we didn't find a "rule" explaining which value to use under which conditions.
So we added "swiotlb=512".
(A Novell site suggested setting it to 2; in some forums and mailing lists people reported setting it as high as 4096.)
However, the problem still occurs.
Now, before we blindly set "swiotlb" to some unrealistic value, does anyone have a hint about what might be going on here? The system ran rock-solid with kernel 2.6.21, so I hope to get it reasonably stable again...
The system is a dual-Xeon server with 16G of RAM and an Intel mainboard, (unfortunately) still running x86 with PAE (we haven't changed that since we started with Xen 3.0 several years ago; it definitely should be moved to x64, but before taking that step it should simply run stably again). The disks are connected via a 3Ware 9550SX controller. The system hosts 14 DomUs running x86/PAE Linux. It also runs an NFS server that the DomUs mount to share data among themselves.
We didn't change anything in this setup, i.e. this system, "as is" but with an older "world", ran flawlessly with kernel 2.6.21.
Any hint would be really great!
Cheers,
_ralf_
Code:
Oops: 0000 [#1]
SMP
Modules linked in: uhci_hcd ehci_hcd usbcore e1000
CPU: 0
EIP: 0061:[<c0109f45>] Not tainted VLI
EFLAGS: 00010002 (2.6.18-xen-r12 #5)
EIP is at range_straddles_page_boundary+0x30/0xee
eax: c04e0000 ebx: eb126000 ecx: 000eb126 edx: 000eb126
esi: 00000000 edi: 00002000 ebp: 00000003 esp: ecf25a9c
ds: 007b es: 007b ss: 0069
Process nfsd (pid: 3610, ti=ecf24000 task=ec09f550 task.ti=ecf24000)
Stack: 00000030 e27e9ec0 eb126000 00000000 0ff26000 00000003 c022ee00 00001000
00000000 00002000 00000000 00000002 e27e9ec0 ed744048 00000000 00000000
00000000 ed744048 00000002 da1c71c0 ed2ee880 c02cc553 00000000 00000000
Call Trace:
[<c022ee00>] swiotlb_map_sg+0x13c/0x26c
[<c02cc553>] twa_scsiop_execute_scsi+0x3a5/0x6e1
[<c02c0a8f>] scsi_done+0x0/0x16
[<c02cc8f7>] twa_scsi_queue+0x68/0xe3
[<c02c0ec4>] scsi_dispatch_cmd+0x130/0x210
[<c02c4e42>] scsi_request_fn+0x183/0x346
[<c021d15b>] __generic_unplug_device+0x1f/0x25
[<c021e100>] __make_request+0xee/0x370
[<c014014b>] mempool_alloc+0x1f/0xcb
[<c021c555>] generic_make_request+0xea/0x156
[<c0164d45>] bio_clone+0x28/0x2d
[<c02e99b3>] __map_bio+0x2e/0x73
[<c02ea357>] __split_bio+0x284/0x358
[<c02c0939>] scsi_finish_command+0x3c/0x40
[<c02ea5f1>] dm_request+0xbc/0xf2
[<c021c555>] generic_make_request+0xea/0x156
[<c0298f35>] evtchn_do_upcall+0xc7/0x1e2
[<c015c33c>] kmem_cache_alloc+0xb4/0xba
[<c021e57d>] submit_bio+0x6b/0x109
[<c016400c>] bio_alloc_bioset+0x78/0x134
[<c0160c11>] submit_bh+0xc0/0x10d
[<c0162425>] __block_write_full_page+0x1b0/0x328
[<c019ab2e>] ext3_get_block+0x0/0xcb
[<c0162859>] block_write_full_page+0xf8/0x100
[<c019ab2e>] ext3_get_block+0x0/0xcb
[<c019c37c>] ext3_ordered_writepage+0xe5/0x1ad
[<c019921d>] bget_one+0x0/0x7
[<c0147827>] dec_zone_page_state+0x30/0x5f
[<c017ff7c>] mpage_writepages+0x149/0x3a2
[<c019c297>] ext3_ordered_writepage+0x0/0x1ad
[<c0142b30>] do_writepages+0x35/0x37
[<c013df99>] __filemap_fdatawrite_range+0x66/0x72
[<c013e1cb>] filemap_fdatawrite+0x23/0x27
[<c01dfb1a>] nfsd_sync+0x3e/0x96
[<c01e0284>] nfsd_open+0xe4/0x132
[<c01e043e>] nfsd_commit+0x93/0xa7
[<c01e6ce1>] nfsd3_proc_commit+0xde/0xf7
[<c01dc6b2>] nfsd_dispatch+0x82/0x1b9
[<c0366e0a>] _spin_lock_bh+0x8/0x18
[<c0356e31>] svc_process+0x3de/0x6ba
[<c0366e0a>] _spin_lock_bh+0x8/0x18
[<c0359732>] svc_recv+0x3d7/0x4ad
[<c01dcc42>] nfsd+0x19e/0x32c
[<c01dcaa4>] nfsd+0x0/0x32c
[<c0102ac5>] kernel_thread_helper+0x5/0xb
Code: ec 08 89 c3 25 ff 0f 00 00 8d 3c 08 81 ff 00 10 00 00 77 0a 31 c0 83 c4 08
5b 5e 5f 5d c3 89 d9 0f ac d1 0c 89 ca a1 20 96 47 c0 <0f> a3 08 19 c0 85 c0 75
e0 0f b6 05 22 49 43 c0 88 44 24 07 89
EIP: [<c0109f45>] range_straddles_page_boundary+0x30/0xee SS:ESP 0069:ecf25a9c
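
When searching the net for a crash point like this, it can help to pull the symbol names out of the trace first. Below is a minimal Python sketch that does so, assuming every frame line has the form `[<address>] symbol+0xoffset/0xsize` as in the oops above. The helper name `parse_frames` and the shortened `sample` text are only for illustration and are not part of any kernel tooling.

```python
import re

# Matches frame lines such as "[<c022ee00>] swiotlb_map_sg+0x13c/0x26c".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def parse_frames(oops_text):
    """Return (address, symbol, offset, size) tuples in order of appearance."""
    frames = []
    for match in FRAME_RE.finditer(oops_text):
        addr, symbol, offset, size = match.groups()
        frames.append((int(addr, 16), symbol, int(offset, 16), int(size, 16)))
    return frames


# Shortened excerpt of the trace above, used only as sample input.
sample = """\
EIP is at range_straddles_page_boundary+0x30/0xee
Call Trace:
 [<c022ee00>] swiotlb_map_sg+0x13c/0x26c
 [<c02cc553>] twa_scsiop_execute_scsi+0x3a5/0x6e1
 [<c02c0a8f>] scsi_done+0x0/0x16
 [<c02cc8f7>] twa_scsi_queue+0x68/0xe3
"""

frames = parse_frames(sample)
for addr, symbol, offset, size in frames:
    # Print one frame per line: address, symbol, offset within the function.
    print(f"{addr:08x}  {symbol}  (+{offset:#x} of {size:#x})")
```

The "EIP is at" line carries no bracketed address, so the pattern skips it; the first frame returned is the caller, `swiotlb_map_sg`, followed by the 3ware driver's `twa_scsiop_execute_scsi`.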
trikolon Apprentice
Joined: 04 Dec 2004 Posts: 297 Location: Erlangen
rfolkerts n00b
Joined: 24 Jan 2008 Posts: 6
Posted: Sat Jan 31, 2009 10:39 am
Hi,
wow, no! I was not aware of that project; I had only looked for more up-to-date Xen kernels in Portage (and checked the Kernel-Log on Heise - OpenSource).
I will check it out and give it a try, and I will post my experience.
Thanks!
_ralf_
rfolkerts n00b
Joined: 24 Jan 2008 Posts: 6
Posted: Sun Mar 01, 2009 4:56 pm
Hi,
just a short update:
I did have a look at the Google-Gentoo-Xen-Kernel project but was a bit reluctant to give it a try.
However, I remembered that with the update to Xen 3.3 I had removed the "dom0_mem" parameter from Xen's GRUB config.
So I added it back, using the "old" value, and the machine has not crashed since (without that parameter it used to crash at least once a week; it has now been running for a few weeks).
The parameter was (and now again is) set to: dom0_mem=262144
Without that parameter the Dom0 machine had ~1.8G of RAM available.
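
A bare number like 262144 is easy to misread, so here is a small Python sketch of the arithmetic. It assumes that Xen 3.x reads a dom0_mem value without a suffix as kilobytes and accepts B, K, M and G suffixes; that is my understanding of the Xen documentation, so please check it for your version. The helper name `dom0_mem_to_bytes` is made up for illustration and is not part of any Xen tool.

```python
import re

# Assumption: a bare number means KiB; B, K, M and G suffixes are accepted.
UNIT_BYTES = {"": 1024, "B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def dom0_mem_to_bytes(value):
    """Convert a dom0_mem value such as '262144' or '256M' to bytes."""
    match = re.fullmatch(r"(\d+)([BbKkMmGg]?)", value)
    if match is None:
        raise ValueError(f"unrecognised dom0_mem value: {value!r}")
    number, suffix = match.groups()
    return int(number) * UNIT_BYTES[suffix.upper()]


size = dom0_mem_to_bytes("262144")
# Report the size in MiB to make the bare number readable.
print(size // 1024**2, "MiB")
```

Under that assumption, dom0_mem=262144 pins Dom0 at 256 MiB, well below the ~1.8G it took without the parameter.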
I am just writing this down here in case someone else runs into the same problem!
Cheers,
_ralf_
linuxtuxhellsinki l33t
Joined: 15 Nov 2004 Posts: 700 Location: Hellsinki
Posted: Wed Mar 04, 2009 7:22 pm
I used to have some problems with "dynamic" memory in dom0 and an e1000 NIC, but they went away with static memory allocation. You can also use dom0_mem=256M with xen-3.* versions, and it's easy to increase dom0's memory with xm if needed.
_________________
1st use 'Search' & lastly add [Solved] to the subject of your first post in the thread.
rfolkerts n00b
Joined: 24 Jan 2008 Posts: 6
Posted: Wed Mar 04, 2009 8:02 pm
Hi,
thanks for the reply!
Well, I had the "m" suffix in mind but was too lazy to look it up (the machine kept crashing about once a week, and since I would not have bet that the "solution" would help at all, I just quickly put the old entry back in). Nevertheless, thanks for pointing it out!
Cheers,
_ralf_
(Much more relaxed now that the hypervisor is working rock-solid again.)