Gentoo Forums
amdgpu.dpm=0 causes freeze on boot, amdgpu.dpm=1 crashes X
prettyflyfora-
n00b


Joined: 09 Jul 2021
Posts: 23

Posted: Sat Jan 01, 2022 4:18 am    Post subject: amdgpu.dpm=0 causes freeze on boot, amdgpu.dpm=1 crashes X

I'm stuck between a rock and a hard place. As the title says, if amdgpu.dpm=0 is set in the kernel launch options, the system freezes on boot. With amdgpu.dpm=1, X crashes instead.

The only reason I have dpm disabled is that it was posted as a fix in https://bugzilla.kernel.org/show_bug.cgi?id=201957

Is there any way to fix either of these? It's killing me. Figuratively.
_________________
-whiteguy
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54809
Location: 56N 3W

Posted: Sat Jan 01, 2022 12:07 pm

prettyflyfora-,

Post your
Code:
lspci -nnk
That's a description of your hardware.
Post the output of
Code:
emerge --info


Don't start X, and use wgetpaste to put your dmesg onto a pastebin.
It's far too big for a post. Post the link.

You can put everything into pastebins if you prefer.
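
A minimal sketch of gathering the three requested outputs into files before pasting them. The function name `collect_diagnostics` and the output file names are made up for illustration; only `lspci -nnk`, `emerge --info` and `dmesg` come from the request above.

```shell
#!/bin/sh
# Hypothetical helper: save the requested diagnostics into a directory.
# Errors from each command go into the same file, so a missing tool or a
# permission problem shows up in the output instead of aborting the run.
collect_diagnostics() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    lspci -nnk    > "$outdir/lspci.txt"       2>&1 || true  # hardware and bound kernel drivers
    emerge --info > "$outdir/emerge-info.txt" 2>&1 || true  # Portage and toolchain settings
    dmesg         > "$outdir/dmesg.txt"       2>&1 || true  # kernel log, taken before starting X
}
```

Run `collect_diagnostics /tmp/diag` from a console, then upload each file, for example with `wgetpaste /tmp/diag/dmesg.txt`, and post the links. On some systems `dmesg` may need root.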
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
prettyflyfora-
n00b


Joined: 09 Jul 2021
Posts: 23

Posted: Tue Jan 11, 2022 10:47 pm

I'm very sorry for the late reply. Here's everything you asked for:

dmesg: https://pastebin.com/8z20z1fc
lspci: https://pastebin.com/kk8bycCV
emerge: https://pastebin.com/ZwGiqDpK

I can't get the kernel log or dmesg output from the freeze, because it happens before any of my daemons start.

It is definitely the amdgpu.dpm=0 option causing the freeze, and it started in kernel 5.15.11.
I could downgrade, but I worry that I'll be stuck on old kernel versions until support is dropped.
_________________
-whiteguy
Anon-E-moose
Watchman


Joined: 23 May 2008
Posts: 6214
Location: Dallas area

Posted: Tue Jan 11, 2022 11:20 pm

wgetpaste your kernel .config file, and tell us which version of X is installed.
_________________
UM780, 6.12 zen kernel, gcc 13, openrc, wayland
prettyflyfora-
n00b


Joined: 09 Jul 2021
Posts: 23

Posted: Tue Jan 11, 2022 11:45 pm

.config file: https://pastebin.com/f1tfZFEH
xorg-server: 1.20.14
xorg-drivers: 1.20-r2
xorg-proto: 2021.5

I'm confident it's not xorg itself that causes the GPU restart, as it only happens in particular games; others don't crash at all.


EDIT: Kernel 5.10.76-r1 boots with amdgpu.dpm=0.
EDIT 2: Images of where the freezes occur on the other kernels: https://imgur.com/a/QKchMux
_________________
-whiteguy
prettyflyfora-
n00b


Joined: 09 Jul 2021
Posts: 23

Posted: Mon Jan 24, 2022 11:00 pm

I isolated the cause of the freeze, or at least the error message it spits out.
Code:
Jan 24 16:33:05 [kernel] [    2.572474] Loading firmware: amdgpu/navi10_pfp.bin
Jan 24 16:33:05 [kernel] [    2.572475] Loading firmware: amdgpu/navi10_me.bin
Jan 24 16:33:05 [kernel] [    2.572476] Loading firmware: amdgpu/navi10_ce.bin
Jan 24 16:33:05 [kernel] [    2.572477] Loading firmware: amdgpu/navi10_rlc.bin
Jan 24 16:33:05 [kernel] [    2.572477] Loading firmware: amdgpu/navi10_mec.bin
Jan 24 16:33:05 [kernel] [    2.572478] Loading firmware: amdgpu/navi10_mec2.bin
Jan 24 16:33:05 [kernel] [    2.572968] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: discard. Quota mode: none.
Jan 24 16:33:05 [kernel] [    2.573030] Loading firmware: amdgpu/navi10_sdma.bin
Jan 24 16:33:05 [kernel] [    2.573032] Loading firmware: amdgpu/navi10_sdma1.bin
Jan 24 16:33:05 [kernel] [    2.573071] Loading firmware: amdgpu/navi10_vcn.bin
Jan 24 16:33:05 [kernel] [    2.573072] [drm] Found VCN firmware Version ENC: 1.14 DEC: 5 VEP: 0 Revision: 20
Jan 24 16:33:05 [kernel] [    2.573075] amdgpu 0000:28:00.0: amdgpu: Will use PSP to load VCN firmware
Jan 24 16:33:05 [kernel] [    2.747244] [drm] reserve 0x900000 from 0x817e400000 for PSP TMR
Jan 24 16:33:05 [kernel] [    2.785931] amdgpu 0000:28:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 24 16:33:05 [kernel] [    2.790137] amdgpu 0000:28:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 24 16:33:05 [kernel] [    2.790138] amdgpu 0000:28:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 24 16:33:05 [kernel] [    2.790140] amdgpu: smu firmware loading failed
Jan 24 16:33:05 [kernel] [    2.790141] amdgpu 0000:28:00.0: amdgpu: amdgpu_device_ip_init failed
Jan 24 16:33:05 [kernel] [    2.790143] amdgpu 0000:28:00.0: amdgpu: Fatal error during GPU init
Jan 24 16:33:05 [kernel] [    2.790144] amdgpu 0000:28:00.0: amdgpu: amdgpu: finishing device.
Jan 24 16:33:05 [kernel] [    2.793726] [drm] free PSP TMR buffer
Jan 24 16:33:05 [kernel] [    2.825874] amdgpu: probe of 0000:28:00.0 failed with error -95
Jan 24 16:33:05 [kernel] [    2.825951] BUG: unable to handle page fault for address: ffffa4af5100d000
Jan 24 16:33:05 [kernel] [    2.825954] #PF: supervisor write access in kernel mode
Jan 24 16:33:05 [kernel] [    2.825955] #PF: error_code(0x0002) - not-present page
Jan 24 16:33:05 [kernel] [    2.825957] PGD 100000067 P4D 100000067 PUD 100104067 PMD 0
Jan 24 16:33:05 [kernel] [    2.825960] Oops: 0002 [#1] SMP NOPTI
Jan 24 16:33:05 [kernel] [    2.825962] CPU: 6 PID: 759 Comm: systemd-udevd Not tainted 5.15.16-gentoo #8
Jan 24 16:33:05 [kernel] [    2.825965] Hardware name: Micro-Star International Co., Ltd MS-7B86/B450 GAMING PLUS MAX (MS-7B86), BIOS H.60 04/18/2020
Jan 24 16:33:05 [kernel] [    2.825967] RIP: 0010:vcn_v2_0_sw_fini+0x65/0x80 [amdgpu]
Jan 24 16:33:05 [kernel] [    2.826139] Code: 89 ef e8 fe 1b ff ff 85 c0 75 08 48 89 ef e8 42 1a ff ff 48 8b 54 24 08 65 48 2b 14 25 28 00 00 00 75 18 48 83 c4 10 5b 5d c3 <c7> 03 00 00 00 00 8b 7c 24 04 e8 4c c4 4d e9 eb bc e8 15 cd ab e9
Jan 24 16:33:05 [kernel] [    2.826142] RSP: 0018:ffffa4af40bc7c30 EFLAGS: 00010202

In the middle of these 30 lines, you can see the AMDGPU module failing to load smu firmware and dying ungracefully.
At this point my screen is frozen, and the only way to try booting again is to hold down my power button.
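
To pull just the failure lines out of a longer log, a filter along these lines may help. It is only a sketch: the function name `amdgpu_failures` is made up, and the patterns are simply the failure markers visible in the excerpt above, so other failures could be worded differently.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print only the lines
# that mark a failure. Patterns are taken from the excerpt in this post.
amdgpu_failures() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E 'failed|Fatal error|BUG:|Oops:' || true
}
```

It reads standard input, so it can be fed from `dmesg | amdgpu_failures` on a boot that survives, or from the saved syslog file after a reboot.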

I know that amdgpu.dpm=0 is the cause, and removing that launch option would fix the boot freeze. But if I remove it, my screen freezes anyway, in xorg or in whatever game I'm running.

It's a ticking time bomb of unsolvable catch-22s.
_________________
-whiteguy
Anon-E-moose
Watchman


Joined: 23 May 2008
Posts: 6214
Location: Dallas area

Posted: Mon Jan 24, 2022 11:40 pm

Code:
dpm (int)

Override for dynamic power management setting (0 = disable, 1 = enable) The default is -1 (auto).

fw_load_type (int)

Set different firmware loading type for debugging (0 = direct, 1 = SMU, 2 = PSP). The default is -1 (auto).

aspm (int)

To disable ASPM (1 = enable, 0 = disable). The default is -1 (auto, enabled).


I would play with these options, dpm=0 AND aspm=0, just to see if it affects anything.

I'd also play with the fw_load_type setting to see what happens. It seems to have problems with option 2 (PSP).


https://www.kernel.org/doc/html/v4.20/gpu/amdgpu.html -- amdgpu options, some others you might try if the above doesn't fix things.
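
To try the three options systematically, something like the following can list every combination to boot with. It is a rough sketch: the function name `amdgpu_param_combos` is made up, and the value ranges are just the explicit ones from the descriptions quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print every combination of the three parameters quoted
# above, one kernel command line fragment per line (2 * 2 * 3 = 12 lines).
# The auto value (-1) is left out because it is already the default.
amdgpu_param_combos() {
    for dpm in 0 1; do
        for aspm in 0 1; do
            for fw in 0 1 2; do
                echo "amdgpu.dpm=$dpm amdgpu.aspm=$aspm amdgpu.fw_load_type=$fw"
            done
        done
    done
}
```

Append one line at a time to the kernel command line and note which ones boot. When the module does load, `cat /sys/module/amdgpu/parameters/dpm` should show the value that actually took effect.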
_________________
UM780, 6.12 zen kernel, gcc 13, openrc, wayland
prettyflyfora-
n00b


Joined: 09 Jul 2021
Posts: 23

Posted: Tue Jan 25, 2022 12:09 am

I cycled through the combinations of those three and had no luck.
Something, however, worked earlier today.

I somehow got everything to click and it booted fine, then something changed and it stopped working again.

I don't know what changed, because even when I boot into the backup of the working config I had, it doesn't work anymore.
Something changed and I have no idea what.

What could possibly have changed? I didn't even install packages or anything.
I'm going to re-emerge linux-firmware after manually deleting the old files.


EDIT: Didn't work
It's the sound drivers all over again.
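
One way to rule out missing files after re-emerging linux-firmware is to check that every firmware file named in the kernel log is present on disk. This is only a sketch: the function name `missing_firmware` is made up, it assumes uncompressed files under the given firmware root (`/lib/firmware` by default), and it relies on the "Loading firmware: <path>" lines seen in the log earlier in the thread. A clean result does not by itself explain the "smu firmware loading failed" message.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and list the firmware files
# it requested that do not exist under the firmware root (first argument,
# default /lib/firmware). Prints nothing when every file is present.
missing_firmware() {
    fwroot=${1:-/lib/firmware}
    # Extract the path after "Loading firmware: ", drop duplicates,
    # then test each path relative to the firmware root.
    sed -n 's/.*Loading firmware: \([^ ]*\).*/\1/p' | sort -u |
    while read -r fw; do
        [ -f "$fwroot/$fw" ] || echo "$fw"
    done
}
```

Feed it the same log the excerpt came from, for example `missing_firmware < saved-kernel.log`, where the file name is a placeholder.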
_________________
-whiteguy
All times are GMT