Gentoo Forums
Gentoo on Hawk/Raptor
Forum: Gentoo on ARM
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54605
Location: 56N 3W

Posted: Sat Nov 05, 2022 5:53 pm    Post subject: Gentoo on Hawk/Raptor

Team,

I have one of these Hawks. It's what a Raspberry Pi wants to be when it grows up :)
All appears well if I run it on a 5.15.x kernel. Kernels 5.16.x to 5.19.x all generate RCU grace period timeouts after they have been up for anywhere between 30 min and 26 days.
The Raptor in the title uses the same CPU but has the second memory channel fitted.

I've not seen the RCU grace period timeouts on 6.0.x yet, but it doesn't run very long before the kernel panics and makes a mess on the console. The console is serial over LAN, thanks to the board management computer.
Code:
# [80368.354113] Internal error: Oops: 96000004 [#1] SMP
[80368.366624] Modules linked in: vhost_net vhost vhost_iotlb tap tun i2c_dev crct10dif_ce
[80368.382250] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.0.7-gentoo #1
[80368.396238] Hardware name: MiTAC HAWK EV-883832-X3-0001/HAWK, BIOS 1.2 06/27/2020
[80368.411319] pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[80368.425924] pc : aer_irq+0x178/0x230
[80368.437064] lr : aer_irq+0xbc/0x230
[80368.448016] sp : ffff800008003ed0
[80368.458723] x29: ffff800008003ed0 x28: ffff8000096b8000 x27: 0000009fe650a000
[80368.473320] x26: 0000009fe650a000 x25: ffff8000092499f8 x24: ffff80000980b1ee
[80368.487843] x23: ffff0008073db400 x22: 0000000000000100 x21: 0000000000000130
[80368.502301] x20: ffff0008074a0080 x19: ffff000802223000 x18: 0000000000000000
[80368.516689] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000004000
[80368.531015] x14: 0000000000027100 x13: 0000000001c76924 x12: 003d0900f29fa5bd
[80368.545311] x11: 0000000000000000 x10: 0000000100000008 x9 : 0000000001c76924
[80368.559587] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[80368.573797] x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
[80368.587917] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[80368.601914] Call trace:
[80368.611069]  aer_irq+0x178/0x230
[80368.620864]  __handle_irq_event_percpu+0x5c/0x17c
[80368.632011]  handle_irq_event+0x4c/0x180
[80368.642321]  handle_fasteoi_irq+0xbc/0x270
[80368.652628]  generic_handle_domain_irq+0x3c/0x6c
[80368.663309]  gic_handle_irq+0x6c/0xfc
[80368.672969]  call_on_irq_stack+0x2c/0x38
[80368.682729]  do_interrupt_handler+0xa4/0xb0
[80368.692575]  el1_interrupt+0x34/0x64
[80368.701625]  el1h_64_irq_handler+0x18/0x34
[80368.711012]  el1h_64_irq+0x68/0x6c
[80368.719567]  cpuidle_enter_state+0x130/0x36c
[80368.728920]  cpuidle_enter+0x38/0x60
[80368.737459]  cpuidle_idle_call+0x134/0x190
[80368.746506]  do_idle+0xac/0x110
[80368.754570]  cpu_startup_entry+0x28/0x30
[80368.763398]  kernel_init+0x0/0x140
[80368.771660]  arch_post_acpi_subsys_init+0x0/0x18
[80368.781137]  start_kernel+0x498/0x4f8
[80368.789614]  __primary_switched+0xbc/0xc4
[80368.798350] Code: 17ffffbc d2800001 d3410400 b94037e3 (3940b822)
[80368.809150] ---[ end trace 0000000000000000 ]---
[80368.818386] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[80368.829893] SMP: stopping secondary CPUs
[80368.839470] Kernel Offset: 0x120000 from 0xffff800008000000
[80368.849552] PHYS_OFFSET: 0x80000000
[80368.857590] CPU features: 0x0000,00045021,00001086
[80368.866911] Memory Limit: none
[80368.874399] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
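
A minimal sketch of how the `symbol+offset/size` entries in an oops like the one above could be pulled out for further lookup. The function name `parse_oops` and the sample text are illustrative assumptions, not part of any kernel tooling; the sample lines are copied from the log above.

```python
import re

# Matches entries such as "aer_irq+0x178/0x230" (symbol+offset/size).
SYM_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
# Matches the leading "[80368.437064] " timestamp (and a stray "# " prompt).
STAMP_RE = re.compile(r"^\s*#?\s*\[\s*\d+\.\d+\]\s?")


def parse_oops(text):
    """Hypothetical helper: return (pc, trace) from an arm64 oops dump.

    Each entry is a (symbol, offset, size) tuple with integer offsets.
    """
    pc = None
    trace = []
    in_trace = False
    for line in text.splitlines():
        body = STAMP_RE.sub("", line)
        if body.startswith("pc :"):
            m = SYM_RE.search(body)
            if m:
                pc = (m.group(1), int(m.group(2), 16), int(m.group(3), 16))
        elif body.startswith("Call trace:"):
            in_trace = True
        elif in_trace:
            m = SYM_RE.fullmatch(body.strip())
            if m:
                trace.append((m.group(1), int(m.group(2), 16), int(m.group(3), 16)))
            else:
                # First non-matching line (e.g. "Code: ...") ends the trace.
                in_trace = False
    return pc, trace


SAMPLE = """\
[80368.425924] pc : aer_irq+0x178/0x230
[80368.437064] lr : aer_irq+0xbc/0x230
[80368.601914] Call trace:
[80368.611069]  aer_irq+0x178/0x230
[80368.620864]  __handle_irq_event_percpu+0x5c/0x17c
[80368.632011]  handle_irq_event+0x4c/0x180
[80368.798350] Code: 17ffffbc d2800001 d3410400 b94037e3 (3940b822)
"""

pc, trace = parse_oops(SAMPLE)
print("faulting pc:", pc)
for sym, off, size in trace:
    print(f"{sym}+{off:#x}/{size:#x}")
```

Assuming a `vmlinux` with debug info for the same build is available, the faulting entry can then be handed to the kernel tree's `scripts/faddr2line`, for example `scripts/faddr2line vmlinux aer_irq+0x178/0x230`, to find the source line inside `aer_irq`.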


I don't think it's hardware. If it were, why does 5.15.x appear to work?
Also, the Mudan, which is like a cut-down Hawk, has problems with 5.16.x kernels. My Mudan was retired in favour of the Hawk about that time.
They have a lot in common, though. The Mudan has an X-Gene 1 CPU; the Hawk has an X-Gene 3 CPU, which is like four X-Gene 1s in the same package.

It seems to be independent of load and CPU temperature.

Thoughts, ideas, questions and hints at how and what to bisect would be appreciated.
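
A rough sketch of what a bisect between the last good (5.15.x) and first bad (5.16.x) series might cost in wall-clock time. The commit count below is a hypothetical placeholder, not a measured figure; the real number could be obtained with something like `git rev-list --count v5.15..v5.16`. The 26-day soak time is taken from the longest uptime before a timeout mentioned above.

```python
import math


def bisect_steps(n_commits):
    """Approximate number of kernels to build and test for n_commits candidates."""
    return math.ceil(math.log2(max(n_commits, 1)))


def worst_case_days(n_commits, soak_days):
    """Worst case: every step must run the full soak time before being called good."""
    return bisect_steps(n_commits) * soak_days


ASSUMED_COMMITS = 15000  # hypothetical placeholder, not a measured count
SOAK_DAYS = 26           # longest observed uptime before an RCU timeout

steps = bisect_steps(ASSUMED_COMMITS)
days = worst_case_days(ASSUMED_COMMITS, SOAK_DAYS)
print(f"{steps} bisect steps, up to {days} days if every step needs a full soak")
```

Under these assumptions a "bad" verdict can arrive within minutes, but a "good" verdict is only as trustworthy as the soak time behind it, so a quicker way to trigger the failure would shorten the bisect considerably.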
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
pingtoo
Veteran
Joined: 10 Sep 2021
Posts: 1304
Location: Richmond Hill, Canada

Posted: Sat Nov 05, 2022 7:24 pm    Post subject:

Neddy,

I couldn't tell from your posted dmesg output whether this is an RCU problem. However, when I searched the kernel documentation tree I found "Using RCU's CPU Stall Detector", which may offer some help.
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54605
Location: 56N 3W

Posted: Sat Nov 05, 2022 7:26 pm    Post subject:

pingtoo,

It's not the RCU stall, at least I don't think it is.
Code:
Kernel panic - not syncing: Oops: Fatal exception in interrupt
It's from the 6.0.7 kernel.

Thank you for that link.

RCU stalls look like ... http://0x0.st/oEK4.txt
All times are GMT