bbethke n00b

Joined: 13 Jan 2005 Posts: 4
|
Posted: Sun Jan 07, 2007 4:43 am Post subject: smp related kernel oops in 2.6.18/19 [solved] |
|
|
Hello all,
I am having trouble getting a stable configuration on two different Core 2 Duo based systems, a Dell Optiplex 745 and a Shuttle g2 3200b. Both systems have an Intel Core 2 Duo E6400 processor. I installed the 32-bit flavor of Gentoo with the following make.conf options:
Code: |
CFLAGS="-march=prescott -O2 -pipe -fomit-frame-pointer"
CXXFLAGS="${CFLAGS}"
|
The machines are used as headless servers, and the only software they run aside from ssh is a control application that sends data out over a USB serial port using the cypress_m8 USB-to-serial driver.
The problem is that after about 30 minutes, the computer will crash with a kernel panic:
Code: |
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in: cypress_m8 usbserial
CPU: 0
EIP: 0060:[<c02891e5>] Not tainted VLI
EFLAGS: 00010002 (2.6.18-gentoo-r4 #4)
EIP is at uhci_scan_schedule+0x257/0x721
eax: 00000000 ebx: f7ee15c4 ecx: 00000000 edx: f7eef5b8
esi: f7eef5a0 edi: f7eef5a0 ebp: c22f68c8 esp: c03b3ef4
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c03b2000 task=c0355b40 task.ti=c03b2000)
Stack: 00000000 c03b3fa4 00000009 00000000 f7eef5b8 00320432 00000000 c21bd8dc
f7ee15c0 c21bd8f0 00000016 3a146000 c21bd8dc 00000202 c22f6918 c22f68c8
c22f6800 c028addb c03b3fa4 268c11a9 efca5667 c22f6800 00000000 00000000
Call Trace:
[<c028addb>] uhci_irq+0x118/0x12e
[<c027c1c5>] usb_hcd_irq+0x23/0x50
[<c01304a7>] handle_IRQ_event+0x23/0x49
[<c0130561>] __do_IRQ+0x94/0xec
[<c0104c55>] do_IRQ+0x43/0x52
[<c010329e>] common_interrupt+0x1a/0x20
[<c0101554>] mwait_idle+0x26/0x39
[<c010150b>] cpu_idle+0x5f/0x82
[<c03b76af>] start_kernel+0x32b/0x332
Code: 54 24 2c 8b 4c 24 14 83 c1 14 89 4c 24 18 8b 04 24 83 c0 10 39 c1 0f 85 fe fe ff ff 8b 7f 40 89 7c 24 0c e9 0e 02 00 00 8b 04 24 <8b> 78 0c 8b 50 10 83 ea 14 89 54 24 1c 8b 4a 14 83 e9 14 89
EIP: [<c02891e5>] uhci_scan_schedule+0x257/0x721 SS:ESP 0068:c03b3ef4
<0>Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:547/smp_call_function()
[<c010accb>] smp_call_function+0x54/0x103
[<c0117c94>] printk+0x14/0x18
[<c010ad8d>] smp_send_stop+0x13/0x1c
[<c01172d1>] panic+0x4f/0xe2
[<c0103bd1>] die+0x23c/0x270
[<c010fd33>] do_page_fault+0x3a6/0x46c
[<c010f98d>] do_page_fault+0x0/0x46c
[<c01033bd>] error_code+0x39/0x40
[<c02891e5>] uhci_scan_schedule+0x257/0x721
[<c028addb>] uhci_irq+0x118/0x12e
[<c027c1c5>] usb_hcd_irq+0x23/0x50
[<c01304a7>] handle_IRQ_event+0x23/0x49
[<c0130561>] __do_IRQ+0x94/0xec
[<c0104c55>] do_IRQ+0x43/0x52
[<c010329e>] common_interrupt+0x1a/0x20
[<c0101554>] mwait_idle+0x26/0x39
[<c010150b>] cpu_idle+0x5f/0x82
[<c03b76af>] start_kernel+0x32b/0x332
|
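For anyone comparing traces like the one above across kernel versions, here is a minimal Python sketch that pulls out the faulting symbol, the registers, the call trace, and the instruction bytes marked with <..> in the Code: line. The function name summarize_oops and the shortened OOPS excerpt are illustrative assumptions, not part of any kernel tooling.

```python
import re

# Abbreviated excerpt of the oops quoted above; the Code: line is shortened
# to the bytes around the <8b> marker to keep the example readable.
OOPS = """\
EIP is at uhci_scan_schedule+0x257/0x721
eax: 00000000 ebx: f7ee15c4 ecx: 00000000 edx: f7eef5b8
Call Trace:
[<c028addb>] uhci_irq+0x118/0x12e
[<c027c1c5>] usb_hcd_irq+0x23/0x50
[<c01304a7>] handle_IRQ_event+0x23/0x49
Code: 8b 04 24 <8b> 78 0c 8b 50 10
"""


def summarize_oops(text):
    """Extract the faulting symbol, registers, call trace and faulting bytes."""
    eip = re.search(r"EIP is at (\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+)", text)
    regs = dict(
        re.findall(r"\b(eax|ebx|ecx|edx|esi|edi|ebp|esp): ([0-9a-f]{8})", text)
    )
    # Call trace entries look like "[<address>] symbol+0xoff/0xlen".
    trace = re.findall(r"^\[<[0-9a-f]+>\] (\w+)\+", text, re.M)

    faulting = []
    code = re.search(r"^Code: (.*)$", text, re.M)
    if code:
        tokens = code.group(1).split()
        for i, tok in enumerate(tokens):
            if tok.startswith("<"):
                # The <..> byte is the first byte of the faulting instruction;
                # keep it plus the next two bytes for a quick look.
                faulting = [t.strip("<>") for t in tokens[i:i + 3]]
                break

    return {
        "symbol": eip.group(1) if eip else None,
        "offset": int(eip.group(2), 16) if eip else None,
        "regs": regs,
        "trace": trace,
        "faulting_bytes": faulting,
    }


summary = summarize_oops(OOPS)
print(summary["symbol"], hex(summary["offset"]), summary["faulting_bytes"])
```

The marked bytes 8b 78 0c decode to mov 0xc(%eax),%edi, and eax is 00000000 in the register dump, so the trace looks consistent with a NULL pointer read inside uhci_scan_schedule. That is a reading of the dump, not a confirmed diagnosis.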
I have tried a number of configurations in an attempt to resolve this problem, including:
- Compiling cypress_m8 into the kernel (as opposed to using it as a module)
- Trying different versions of the kernel (2.6.18-gentoo-r4 and vanilla 2.6.19.1)
- Trying different values of "Processor family" in the kernel (Pentium-4/Celeron(P4-based)/Pentium-4 M/Xeon and 386)
- Trying the above on both machines
The problem occurs in every configuration I have tried so far, always with the same type of kernel panic.
The machines are brand new, so I don't suspect a hardware problem.
Does anyone know what might be causing this? I can keep trying different things but since it takes ~30 minutes to see if it worked, it's slow going.
Thanks very much in advance for your help! 
Last edited by bbethke on Tue Jan 09, 2007 1:42 am; edited 1 time in total |
PantsMan n00b


Joined: 09 Jul 2004 Posts: 36
|
Posted: Sun Jan 07, 2007 6:45 am Post subject: |
|
|
I think I read in another thread about other people having problems with USB on newer kernels such as 2.6.18 and 2.6.19, so I think it's definitely worth giving an older one, say 2.6.16, a try.
Also, your oops/panic occurred while processing an interrupt, and other people seem to be having various problems with how interrupts are handled in SMP kernels these days. So if you want to stay with 2.6.18 or 2.6.19, try disabling SMP. You've probably already thought of that and don't want to, since it's not really an ideal solution. But if all these boxes are doing is some USB work, they don't need both cores and should get the job done fine without SMP.
PS: beware of passing the nosmp parameter to an SMP kernel at boot. It's supposed to just disable SMP, but it didn't work for me: it gave me "hda: lost interrupt" messages and the PC failed to boot. I had to compile a new kernel with SMP disabled entirely.
You could also try disabling APIC, with SMP on (and off, if you have to). It will change how your IRQs are allocated, handled etc, and may help.
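When testing the SMP and APIC suggestions above one at a time, it helps to confirm which boot parameters the running kernel was actually started with. A minimal Python sketch, assuming the command line has been read from /proc/cmdline: the function name smp_apic_flags and the example command lines are hypothetical, and nolapic and maxcpus= are related kernel parameters that were not mentioned in the thread.

```python
# Boot parameters related to the SMP/APIC suggestions above. "nosmp" and
# disabling APIC ("noapic") come from the thread; "nolapic" and "maxcpus="
# are related kernel parameters added here as an assumption.
EXACT_FLAGS = ("nosmp", "noapic", "nolapic")
PREFIX_FLAGS = ("maxcpus=",)


def smp_apic_flags(cmdline):
    """Return the SMP/APIC-related parameters found in a kernel command line."""
    found = []
    for token in cmdline.split():
        if token in EXACT_FLAGS or token.startswith(PREFIX_FLAGS):
            found.append(token)
    return found


# Hypothetical command lines; on a live system the string would come from
# reading /proc/cmdline.
print(smp_apic_flags("root=/dev/sda3 noapic maxcpus=1"))
print(smp_apic_flags("root=/dev/sda3 ro"))
```

An empty result means none of these parameters were passed, so any change in behaviour would have to come from the kernel configuration itself.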
I'm really no guru in this area, just giving my 2 cents. |
bbethke n00b

Joined: 13 Jan 2005 Posts: 4
|
Posted: Tue Jan 09, 2007 1:42 am Post subject: |
|
|
I tried a 2.6.15-gentoo-r1 kernel, and both machines appear to be running quite stably now... So it appears that the problem was indeed caused by the newer kernels.
I don't really need anything in the new kernels, although I'll try new kernels as they're released just out of curiosity to see if any of them fix the problem. |