NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54640 Location: 56N 3W
|
Posted: Sat Nov 05, 2022 5:53 pm Post subject: Gentoo on Hawk/Raptor |
|
|
Team,
I have one of these Hawks. It's what a Raspberry Pi wants to be when it grows up :)
All appears well if I run it on a 5.15.x kernel. 5.16.x to 5.19.x all generate RCU grace period timeouts once they have been up for anywhere between 30 min and 26 days.
The Raptor in the title uses the same CPU but has the second memory channel fitted.
I've not seen the RCU grace period timeouts on 6.0.x yet, but it doesn't run very long before the kernel panics and makes a mess on the console. The console is serial over LAN, thanks to the board management computer.
Code: | # [80368.354113] Internal error: Oops: 96000004 [#1] SMP
[80368.366624] Modules linked in: vhost_net vhost vhost_iotlb tap tun i2c_dev crct10dif_ce
[80368.382250] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.7-gentoo #1
[80368.396238] Hardware name: MiTAC HAWK EV-883832-X3-0001/HAWK, BIOS 1.2 06/27/2020
[80368.411319] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[80368.425924] pc : aer_irq+0x178/0x230
[80368.437064] lr : aer_irq+0xbc/0x230
[80368.448016] sp : ffff800008003ed0
[80368.458723] x29: ffff800008003ed0 x28: ffff8000096b8000 x27: 0000009fe650a000
[80368.473320] x26: 0000009fe650a000 x25: ffff8000092499f8 x24: ffff80000980b1ee
[80368.487843] x23: ffff0008073db400 x22: 0000000000000100 x21: 0000000000000130
[80368.502301] x20: ffff0008074a0080 x19: ffff000802223000 x18: 0000000000000000
[80368.516689] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000004000
[80368.531015] x14: 0000000000027100 x13: 0000000001c76924 x12: 003d0900f29fa5bd
[80368.545311] x11: 0000000000000000 x10: 0000000100000008 x9 : 0000000001c76924
[80368.559587] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[80368.573797] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[80368.587917] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[80368.601914] Call trace:
[80368.611069] aer_irq+0x178/0x230
[80368.620864] __handle_irq_event_percpu+0x5c/0x17c
[80368.632011] handle_irq_event+0x4c/0x180
[80368.642321] handle_fasteoi_irq+0xbc/0x270
[80368.652628] generic_handle_domain_irq+0x3c/0x6c
[80368.663309] gic_handle_irq+0x6c/0xfc
[80368.672969] call_on_irq_stack+0x2c/0x38
[80368.682729] do_interrupt_handler+0xa4/0xb0
[80368.692575] el1_interrupt+0x34/0x64
[80368.701625] el1h_64_irq_handler+0x18/0x34
[80368.711012] el1h_64_irq+0x68/0x6c
[80368.719567] cpuidle_enter_state+0x130/0x36c
[80368.728920] cpuidle_enter+0x38/0x60
[80368.737459] cpuidle_idle_call+0x134/0x190
[80368.746506] do_idle+0xac/0x110
[80368.754570] cpu_startup_entry+0x28/0x30
[80368.763398] kernel_init+0x0/0x140
[80368.771660] arch_post_acpi_subsys_init+0x0/0x18
[80368.781137] start_kernel+0x498/0x4f8
[80368.789614] __primary_switched+0xbc/0xc4
[80368.798350] Code: 17ffffbc d2800001 d3410400 b94037e3 (3940b822)
[80368.809150] ---[ end trace 0000000000000000 ]---
[80368.818386] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[80368.829893] SMP: stopping secondary CPUs
[80368.839470] Kernel Offset: 0x120000 from 0xffff800008000000
[80368.849552] PHYS_OFFSET: 0x80000000
[80368.857590] CPU features: 0x0000,00045021,00001086
[80368.866911] Memory Limit: none
[80368.874399] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- |
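A minimal sketch, not part of the original post, of pulling the faulting location out of an oops like the one above so it can be compared across kernel versions. The helper names `parse_location` and `faulting_pc` are hypothetical, and the regular expressions assume the arm64 `pc : symbol+0xOFFSET/0xSIZE` layout shown in this log.

```python
import re

# Hypothetical helpers; the patterns assume the oops layout shown above.
SYMBOL_RE = re.compile(
    r"(?P<name>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def parse_location(text):
    """Return (symbol, offset, size) for the first 'sym+0xOFF/0xSIZE' in text, or None."""
    match = SYMBOL_RE.search(text)
    if match is None:
        return None
    return match["name"], int(match["offset"], 16), int(match["size"], 16)


def faulting_pc(oops_lines):
    """Find the 'pc :' register line in an oops and parse its symbol location."""
    for line in oops_lines:
        # Drop the leading "[seconds.micros]" timestamp before inspecting the line.
        body = TIMESTAMP_RE.sub("", line.strip())
        if body.startswith("pc :"):
            return parse_location(body)
    return None


sample = [
    "[80368.354113] Internal error: Oops: 96000004 [#1] SMP",
    "[80368.425924] pc : aer_irq+0x178/0x230",
    "[80368.437064] lr : aer_irq+0xbc/0x230",
]
print(faulting_pc(sample))
```

The `symbol+0xOFFSET/0xSIZE` string from the `pc` line is also the form that the kernel tree's `scripts/faddr2line` takes alongside a `vmlinux` built with debug info, which may help map the fault in `aer_irq` to a source line.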
I don't think it's hardware. If it were, why does 5.15.x appear to work?
Also, the Mudan, which is like a cut-down Hawk, has problems with 5.16.x kernels. My Mudan was retired in favour of the Hawk about that time.
They have a lot in common though. The Mudan has an X-Gene 1 CPU; the Hawk has an X-Gene 3 CPU, which is like four X-Gene 1s in the same package.
It seems to be independent of load and CPU temperature.
Thoughts, ideas, questions and hints at how and what to bisect would be appreciated. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1339 Location: Richmond Hill, Canada
|
Posted: Sat Nov 05, 2022 7:24 pm Post subject: |
|
|
Neddy,
I couldn't tell from your posted dmesg output whether this is an RCU problem. However, when I searched the kernel documentation tree I found "Using RCU’s CPU Stall Detector", which may offer some help. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54640 Location: 56N 3W
|
Posted: Sat Nov 05, 2022 7:26 pm Post subject: |
|
|
pingtoo,
It's not the RCU stall, at least I don't think it is. Code: | Kernel panic - not syncing: Oops: Fatal exception in interrupt | It's from the 6.0.7 kernel.
Thank you for that link.
RCU stalls look like ... http://0x0.st/oEK4.txt _________________ Regards,
NeddySeagoon
|