prettyflyfora- n00b
Joined: 09 Jul 2021 Posts: 23
|
Posted: Sat Jan 01, 2022 4:18 am Post subject: amdgpu.dpm=0 causes freeze on boot, amdgpu.dpm=1 crashes X |
|
|
Stuck between a rock and a hard place. As the title says, if amdgpu.dpm=0 is set in the kernel launch options, the system freezes on boot.
The only reason I have DPM disabled is that it was a fix posted in https://bugzilla.kernel.org/show_bug.cgi?id=201957
Is there any way to fix either of these? It's killing me. Figuratively. _________________ -whiteguy |
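To double-check which amdgpu options a given boot actually received (for example, whether amdgpu.dpm=0 really made it onto the kernel command line), you can read /proc/cmdline. Below is a minimal Python sketch, assuming options are written in the usual module.param=value form; the helper name amdgpu_options is made up for illustration.

```python
from __future__ import annotations

import os
import shlex


def amdgpu_options(cmdline: str) -> dict[str, str]:
    """Return the amdgpu.<param>=<value> pairs found in a kernel command line."""
    opts: dict[str, str] = {}
    for token in shlex.split(cmdline):
        # Module options on the kernel command line take the form module.param=value
        if token.startswith("amdgpu.") and "=" in token:
            key, value = token[len("amdgpu."):].split("=", 1)
            opts[key] = value
    return opts


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    # /proc/cmdline holds the command line the running kernel was booted with
    with open("/proc/cmdline") as f:
        print(amdgpu_options(f.read()))
```

An empty dict means no amdgpu.* options were passed, i.e. the driver is running on its defaults.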
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54809 Location: 56N 3W
|
Posted: Sat Jan 01, 2022 12:07 pm Post subject: |
|
|
prettyflyfora-,
Post your that's a description of your hardware.
Post the output of
Don't start X and use wgetpaste to put your dmesg onto a pastebin.
It's far too big for a post. Post the link.
You can put everything into pastebins if you prefer. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
prettyflyfora- n00b
Joined: 09 Jul 2021 Posts: 23
|
Posted: Tue Jan 11, 2022 10:47 pm Post subject: |
|
|
I'm very sorry for the late reply. Here's everything you asked for:
dmesg: https://pastebin.com/8z20z1fc
lspci: https://pastebin.com/kk8bycCV
emerge: https://pastebin.com/ZwGiqDpK
I can't get the kernel log or dmesg of the freeze, because it freezes before any of my daemons start.
It's definitely the amdgpu.dpm=0 option causing the freeze, and it started with kernel 5.15.11.
I could downgrade, but I worry that I'll be stuck on old kernel versions until support for them is dropped. |
|
|
|
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6214 Location: Dallas area
|
Posted: Tue Jan 11, 2022 11:20 pm Post subject: |
|
|
Wgetpaste your kernel .config file and say which version of X is installed. _________________ UM780, 6.12 zen kernel, gcc 13, openrc, wayland |
|
|
|
prettyflyfora- n00b
Joined: 09 Jul 2021 Posts: 23
|
Posted: Tue Jan 11, 2022 11:45 pm Post subject: |
|
|
.config file: https://pastebin.com/f1tfZFEH
xorg-server: 1.20.14
xorg-drivers: 1.20-r2
xorg-proto: 2021.5
I'm confident it's not Xorg itself that causes the GPU restart, as it only happens with particular games; others don't crash at all.
EDIT: Kernel 5.10.76-r1 boots with amdgpu.dpm=0
EDIT 2: images of the freezes on the other kernels: https://imgur.com/a/QKchMux |
|
|
|
prettyflyfora- n00b
Joined: 09 Jul 2021 Posts: 23
|
Posted: Mon Jan 24, 2022 11:00 pm Post subject: |
|
|
I isolated the cause of the freeze, or at least the error message it spits out.
Code: | Jan 24 16:33:05 [kernel] [ 2.572474] Loading firmware: amdgpu/navi10_pfp.bin
Jan 24 16:33:05 [kernel] [ 2.572475] Loading firmware: amdgpu/navi10_me.bin
Jan 24 16:33:05 [kernel] [ 2.572476] Loading firmware: amdgpu/navi10_ce.bin
Jan 24 16:33:05 [kernel] [ 2.572477] Loading firmware: amdgpu/navi10_rlc.bin
Jan 24 16:33:05 [kernel] [ 2.572477] Loading firmware: amdgpu/navi10_mec.bin
Jan 24 16:33:05 [kernel] [ 2.572478] Loading firmware: amdgpu/navi10_mec2.bin
Jan 24 16:33:05 [kernel] [ 2.572968] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: discard. Quota mode: none.
Jan 24 16:33:05 [kernel] [ 2.573030] Loading firmware: amdgpu/navi10_sdma.bin
Jan 24 16:33:05 [kernel] [ 2.573032] Loading firmware: amdgpu/navi10_sdma1.bin
Jan 24 16:33:05 [kernel] [ 2.573071] Loading firmware: amdgpu/navi10_vcn.bin
Jan 24 16:33:05 [kernel] [ 2.573072] [drm] Found VCN firmware Version ENC: 1.14 DEC: 5 VEP: 0 Revision: 20
Jan 24 16:33:05 [kernel] [ 2.573075] amdgpu 0000:28:00.0: amdgpu: Will use PSP to load VCN firmware
Jan 24 16:33:05 [kernel] [ 2.747244] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
Jan 24 16:33:05 [kernel] [ 2.785931] amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 24 16:33:05 [kernel] [ 2.790137] amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 24 16:33:05 [kernel] [ 2.790138] amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 24 16:33:05 [kernel] [ 2.790140] amdgpu: smu firmware loading failed
Jan 24 16:33:05 [kernel] [ 2.790141] amdgpu 0000:28:00.0: amdgpu: amdgpu_device_ip_init failed
Jan 24 16:33:05 [kernel] [ 2.790143] amdgpu 0000:28:00.0: amdgpu: Fatal error during GPU init
Jan 24 16:33:05 [kernel] [ 2.790144] amdgpu 0000:28:00.0: amdgpu: amdgpu: finishing device.
Jan 24 16:33:05 [kernel] [ 2.793726] [drm] free PSP TMR buffer
Jan 24 16:33:05 [kernel] [ 2.825874] amdgpu: probe of 0000:28:00.0 failed with error -95
Jan 24 16:33:05 [kernel] [ 2.825951] BUG: unable to handle page fault for address: ffffa4af5100d000
Jan 24 16:33:05 [kernel] [ 2.825954] #PF: supervisor write access in kernel mode
Jan 24 16:33:05 [kernel] [ 2.825955] #PF: error_code(0x0002) - not-present page
Jan 24 16:33:05 [kernel] [ 2.825957] PGD 100000067 P4D 100000067 PUD 100104067 PMD 0
Jan 24 16:33:05 [kernel] [ 2.825960] Oops: 0002 [#1] SMP NOPTI
Jan 24 16:33:05 [kernel] [ 2.825962] CPU: 6 PID: 759 Comm: systemd-udevd Not tainted 5.15.16-gentoo #8
Jan 24 16:33:05 [kernel] [ 2.825965] Hardware name: Micro-Star International Co., Ltd MS-7B86/B450 GAMING PLUS MAX (MS-7B86), BIOS H.60 04/18/2020
Jan 24 16:33:05 [kernel] [ 2.825967] RIP: 0010:vcn_v2_0_sw_fini+0x65/0x80 [amdgpu]
Jan 24 16:33:05 [kernel] [ 2.826139] Code: 89 ef e8 fe 1b ff ff 85 c0 75 08 48 89 ef e8 42 1a ff ff 48 8b 54 24 08 65 48 2b 14 25 28 00 00 00 75 18 48 83 c4 10 5b 5d c3 <c7> 03 00 00 00 00 8b 7c 24 04 e8 4c c4 4d e9 eb bc e8 15 cd ab e9
Jan 24 16:33:05 [kernel] [ 2.826142] RSP: 0018:ffffa4af40bc7c30 EFLAGS: 00010202
|
In the middle of these lines you can see the amdgpu module failing to load the SMU firmware and dying ungracefully.
At this point my screen is frozen, and the only way to try booting again is to hold down my power button.
I know amdgpu.dpm=0 is the cause and that removing that launch option would fix this, but if I remove it my screen freezes anyway, whether in Xorg, in a game, or anywhere else.
It's a ticking time bomb of unsolvable catch-22s. |
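When a saved log like the one above is long, a small filter can pull out just the failure lines in order. This is a rough Python sketch; the strings in FAILURE_MARKERS are an assumption taken from this one log, not a complete list of amdgpu errors, and find_failures is a hypothetical helper. For reference, the kernel reports errors as negated errno values, so "error -95" is errno 95, which on Linux is EOPNOTSUPP ("Operation not supported").

```python
import re

# Substrings that marked the failed amdgpu init in the log above.
# ASSUMPTION: drawn from this one log only; not a complete list of amdgpu errors.
FAILURE_MARKERS = (
    "firmware loading failed",
    "amdgpu_device_ip_init failed",
    "Fatal error during GPU init",
    "failed with error",
    "BUG: unable to handle page fault",
)

# Matches the kernel's "[    2.790140]" seconds-since-boot stamp and the text after it
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]\s*(.*)")


def find_failures(lines):
    """Yield (seconds, message) for each log line that contains a failure marker."""
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue  # not a timestamped kernel line
        seconds, message = float(match.group(1)), match.group(2)
        if any(marker in message for marker in FAILURE_MARKERS):
            yield seconds, message
```

Usage, assuming the log was saved to a file: `for t, msg in find_failures(open("boot.log")): print(t, msg)`. Non-timestamped lines (such as the "Code:" dump) are skipped.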
|
|
|
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6214 Location: Dallas area
|
Posted: Mon Jan 24, 2022 11:40 pm Post subject: |
|
|
Code: | dpm (int)
Override for dynamic power management setting (0 = disable, 1 = enable) The default is -1 (auto).
fw_load_type (int)
Set different firmware loading type for debugging (0 = direct, 1 = SMU, 2 = PSP). The default is -1 (auto).
aspm (int)
To disable ASPM (1 = enable, 0 = disable). The default is -1 (auto, enabled). |
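Before cycling through these, it can help to confirm what the loaded driver actually picked up. Module parameters that are declared readable show up as files under /sys/module/<module>/parameters/; this sketch assumes amdgpu exposes dpm, aspm and fw_load_type there, which depends on how the driver declares them. read_module_params is a hypothetical helper.

```python
from pathlib import Path


def read_module_params(names, base="/sys/module/amdgpu/parameters"):
    """Return {name: value} for each parameter; None if its file is missing."""
    values = {}
    for name in names:
        path = Path(base) / name
        # Each readable module parameter is a small text file holding its current value
        values[name] = path.read_text().strip() if path.is_file() else None
    return values


if __name__ == "__main__":
    print(read_module_params(["dpm", "aspm", "fw_load_type"]))
```

A None value means the file was not there, either because amdgpu is not loaded or because that parameter is not exposed.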
I would play with these options, e.g. dpm=0 AND aspm=0, just to see if that changes anything.
I'd also play with the fw_load_type setting to see what happens; it seems to have problems with option 2 (PSP).
https://www.kernel.org/doc/html/v4.20/gpu/amdgpu.html: the amdgpu module options, with some others you might try if the above doesn't fix things. |
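The search space here is small enough to enumerate. Counting -1 (auto) as a value, dpm and aspm each have 3 settings and fw_load_type has 4, so there are 3 × 3 × 4 = 36 combinations. A short Python sketch that lists each one as a kernel command-line fragment; the value lists come from the parameter descriptions quoted above, and candidate_cmdlines is a hypothetical name.

```python
from itertools import product

# Values from the parameter descriptions quoted above; -1 means "auto" (the default)
DPM = (-1, 0, 1)
ASPM = (-1, 0, 1)
FW_LOAD_TYPE = (-1, 0, 1, 2)  # 0 = direct, 1 = SMU, 2 = PSP


def candidate_cmdlines():
    """Yield every dpm/aspm/fw_load_type combination as a kernel command-line fragment."""
    for dpm, aspm, fw in product(DPM, ASPM, FW_LOAD_TYPE):
        yield f"amdgpu.dpm={dpm} amdgpu.aspm={aspm} amdgpu.fw_load_type={fw}"


if __name__ == "__main__":
    for n, fragment in enumerate(candidate_cmdlines(), start=1):
        print(f"{n:2d}: {fragment}")
```

Numbering the fragments makes it easy to note which boot corresponds to which combination.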
|
|
|
prettyflyfora- n00b
Joined: 09 Jul 2021 Posts: 23
|
Posted: Tue Jan 25, 2022 12:09 am Post subject: |
|
|
I cycled through the combinations of those 3 and had no luck.
Something did work earlier today, however.
I somehow got everything to click and it booted fine, then something changed and it stopped working again.
I don't know what changed, because even when I boot into the backup of the working config I had, it doesn't work anymore.
Something changed and I have no idea what.
What could possibly have changed? I didn't even install any packages.
I'm going to re-emerge linux-firmware after manually deleting the old files.
EDIT: Didn't work.
It's the sound drivers all over again. |
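One way to answer "what changed?" is to fingerprint the firmware files on a working boot and compare them against a later snapshot. A Python sketch, assuming uncompressed navi10_*.bin files under /lib/firmware/amdgpu (the amdgpu/navi10_*.bin names match the log above; if your firmware files are compressed, adjust the pattern). firmware_digests and diff_digests are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def firmware_digests(directory="/lib/firmware/amdgpu", pattern="navi10_*.bin"):
    """Map each matching firmware file name to its SHA-256 hex digest."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(Path(directory).glob(pattern))
    }


def diff_digests(old, new):
    """Return the names whose digest differs or that exist in only one snapshot."""
    return sorted(name for name in old.keys() | new.keys() if old.get(name) != new.get(name))


if __name__ == "__main__":
    # Print one "digest  name" line per file; save this on a good boot to compare later
    for name, digest in firmware_digests().items():
        print(f"{digest}  {name}")
```

An empty result from diff_digests would suggest the firmware files themselves did not change between the two snapshots.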
|
|
|
|