DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Tue Feb 25, 2025 8:31 am Post subject: amdgpu Ryzen 9950X iGPU crashes with REG_WAIT timeout |
|
|
I'm running the current kernel, 6.13.4,
on an ASRock X870E Taichi Lite.
I'm using the iGPU.
The board has one HDMI and two USB-C outputs.
I use 3 different displays.
When they are all connected (HDMI, USB-C => DVI, USB-C => DP), the display manager often crashes (slim/sddm reloads).
When I disconnect the DP monitor, crashes are less frequent.
Code: | [ 2.049207] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
[ 2.049218] ------------[ cut here ]------------
[ 2.049219] WARNING: CPU: 22 PID: 227 at drivers/gpu/drm/amd/amdgpu/../display/dc/hubbub/dcn31/dcn31_hubbub.c:151 dcn31_program_compbuf_size+0x205/0x210
[ 2.049223] Modules linked in:
[ 2.049224] CPU: 22 UID: 0 PID: 227 Comm: kworker/22:0H Not tainted 6.13.4-gentoo #1
[ 2.049226] Hardware name: ASRock X870E Taichi Lite/X870E Taichi Lite, BIOS 3.18.AS02 02/05/2025
[ 2.049227] Workqueue: events_highpri dm_irq_work_func
[ 2.049229] RIP: 0010:dcn31_program_compbuf_size+0x205/0x210
[ 2.049231] Code: 00 85 c0 74 25 83 7c 24 04 00 75 1e 65 48 8b 04 25 28 00 00 00 48 3b 44 24 08 75 12 48 83 c4 10 5b 5d c3 0f 0b e9 77 ff ff ff <0f> 0b eb de e8 02 39 5f 00 cc cc 0f 1f 44 00 00 41 56 53 48 89 fb
[ 2.049232] RSP: 0018:ffffa8f3c09d7698 EFLAGS: 00010202
[ 2.049233] RAX: 0000000080040a0d RBX: ffff9d3c88663c00 RCX: 0000000000000001
[ 2.049234] RDX: 0000000000000000 RSI: ffff9d3c81cdff20 RDI: ffff9d3c89700000
[ 2.049234] RBP: 000000000000000d R08: ffffa8f3c09d769c R09: 000000000000000d
[ 2.049235] R10: 0000003000000030 R11: ffffa8f3c09d7698 R12: ffff9d3ca0a002a8
[ 2.049235] R13: ffff9d3ca0a050c8 R14: ffff9d3ca0400000 R15: ffff9d3c88663c00
[ 2.049236] FS: 0000000000000000(0000) GS:ffff9d52ff980000(0000) knlGS:0000000000000000
[ 2.049237] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.049237] CR2: 0000000000000000 CR3: 000000003b41a000 CR4: 0000000000750ef0
[ 2.049238] PKRU: 55555554
[ 2.049238] Call Trace:
[ 2.049240] <TASK>
[ 2.049242] ? __warn+0xda/0x1d0
[ 2.049244] ? dcn31_program_compbuf_size+0x205/0x210
[ 2.049245] ? report_bug+0x141/0x1e0
[ 2.049246] ? handle_bug+0x5e/0x90
[ 2.049248] ? exc_invalid_op+0x16/0x40
[ 2.049249] ? asm_exc_invalid_op+0x16/0x20
[ 2.049250] ? dcn31_program_compbuf_size+0x205/0x210
[ 2.049251] ? dcn31_program_compbuf_size+0x1dc/0x210
[ 2.049252] dcn20_optimize_bandwidth+0xff/0x1f0
[ 2.049254] dc_commit_state_no_check+0x1691/0x1a40
[ 2.049256] dc_commit_streams+0x465/0x610
[ 2.049257] amdgpu_dm_atomic_commit_tail+0x6c1/0x3d10
[ 2.049259] ? dm_read_reg_func+0x59/0xc0
[ 2.049260] ? optc1_get_crtc_scanoutpos+0xca/0x100
[ 2.049262] ? dc_stream_get_scanoutpos+0xf6/0x110
[ 2.049263] ? ktime_get+0x4d/0xd0
[ 2.049264] ? amdgpu_display_get_crtc_scanoutpos+0x88/0x160
[ 2.049266] ? amdgpu_display_crtc_idx_to_irq_type+0x20/0x20
[ 2.049267] ? amdgpu_crtc_get_scanout_position+0x29/0x40
[ 2.049268] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0xe3/0x470
[ 2.049270] ? wait_for_common+0x198/0x1d0
[ 2.049271] ? drm_crtc_commit_wait+0x32/0x90
[ 2.049272] commit_tail+0xbe/0x2c0
[ 2.049274] drm_atomic_helper_commit+0x24f/0x260
[ 2.049275] drm_atomic_commit+0xb8/0xe0
[ 2.049276] ? __drm_printfn_seq_file+0x20/0x20
[ 2.049277] drm_client_modeset_commit_atomic+0x178/0x200
[ 2.049279] drm_client_modeset_commit_locked+0x45/0x160
[ 2.049280] drm_client_modeset_commit+0x23/0x50
[ 2.049281] drm_fb_helper_hotplug_event+0x13b/0x2b0
[ 2.049283] drm_client_dev_hotplug+0x8b/0x110
[ 2.049284] handle_hpd_irq_helper+0x157/0x190
[ 2.049285] process_scheduled_works+0x1f8/0x440
[ 2.049287] worker_thread+0x24a/0x2f0
[ 2.049288] ? pr_cont_work+0x1c0/0x1c0
[ 2.049289] kthread+0x147/0x160
[ 2.049290] ? kthread_blkcg+0x30/0x30
[ 2.049291] ret_from_fork+0x30/0x40
[ 2.049292] ? kthread_blkcg+0x30/0x30
[ 2.049293] ret_from_fork_asm+0x11/0x20
[ 2.049294] </TASK>
[ 2.049295] ---[ end trace 0000000000000000 ]---
...
[ 5511.776162] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:142
[ 6158.266516] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
[ 6158.991735] usb 3-9: new high-speed USB device number 7 using xhci_hcd
[ 6159.199580] usb 3-9: New USB device found, idVendor=05e3, idProduct=0610, bcdDevice=32.98
[ 6159.199584] usb 3-9: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[ 6159.199585] usb 3-9: Product: USB2.0 Hub
[ 6159.210275] hub 3-9:1.0: USB hub found
[ 6159.213575] hub 3-9:1.0: 4 ports detected
[ 6541.325827] xhci_hcd 0000:13:00.0: WARN: buffer overrun event for slot 3 ep 6 on endpoint
[ 6541.515213] xhci_hcd 0000:13:00.0: WARN: buffer overrun event for slot 3 ep 6 on endpoint
[ 7277.741895] usb 3-9: USB disconnect, device number 7
[ 7872.061253] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:142
[75846.506207] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
[75847.252847] usb 3-9: new high-speed USB device number 9 using xhci_hcd
[75847.462715] usb 3-9: New USB device found, idVendor=05e3, idProduct=0610, bcdDevice=32.98
[75847.462724] usb 3-9: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[75847.462726] usb 3-9: Product: USB2.0 Hub
[75847.473873] hub 3-9:1.0: USB hub found
[75847.477610] hub 3-9:1.0: 4 ports detected
[78649.144667] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141
[80103.480423] amdgpu 0000:79:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:141 |
Hmm. I see a USB hub showing up in the log around the display reloads. I'll try disconnecting this display; maybe this display is the problem... |
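To see how often the timeouts occur and at which source line, the log can be summarised with a small filter. This is only a sketch: the function name `summarize_reg_wait` is made up for illustration, and it assumes the message format is exactly the one shown in the log above.

```shell
#!/bin/bash
# Summarise amdgpu REG_WAIT timeouts from kernel log text on stdin.
# Prints "<function> line:<N> <count>" for each distinct timeout site.
# Assumes the message layout seen in this thread's dmesg output.
summarize_reg_wait() {
    grep 'REG_WAIT timeout' \
        | sed -E 's/.* - ([A-Za-z0-9_]+) line:([0-9]+).*/\1 line:\2/' \
        | sort | uniq -c | awk '{print $2, $3, $1}'
}

# Typical use (not run here):
#   dmesg | summarize_reg_wait
```

Comparing the counts with and without the DP monitor attached would give a rough measure of whether that display really makes a difference.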
|
|
 |
MickeyM n00b

Joined: 27 Feb 2025 Posts: 1
|
Posted: Thu Feb 27, 2025 2:14 pm Post subject: |
|
|
For a bleeding-edge kernel version, you should also accept ~amd64 for linux-firmware. Did you do that?
My system uses a Zen4 Ryzen and runs 2 DP monitors on the iGPU via DP daisy chain. Since kernel release 6.10 it has been running perfectly stable. Before that, it was horrible... amdgpu crashes and spontaneous system reboots.
Future kernel and firmware versions may help in your case.
What I can also advise: keep power cords away from your DP cables, or display cables in general. They emit EM interference that caused my monitors to regularly crash and lose connection. Some monitors are likely more sensitive than others, but it's always a good idea to keep data cables separate from power cables. For me the problem came from cable management that put them in direct proximity to each other. Maybe this helps you, too. |
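Accepting ~amd64 for the firmware package comes down to one keyword line for sys-kernel/linux-firmware. A minimal sketch, assuming Portage's per-package keyword files are used; the helper name `accept_testing_firmware` and the file path in the usage comment are illustrative assumptions, so adapt them to your own /etc/portage layout.

```shell
#!/bin/bash
# Append the ~amd64 keyword line for linux-firmware unless it is already there.
# $1: keywords file to write (hypothetical path, see usage below).
accept_testing_firmware() {
    local file="$1"
    local entry="sys-kernel/linux-firmware ~amd64"
    mkdir -p "$(dirname "$file")"
    # -x matches the whole line, -F treats the entry as a fixed string.
    grep -qxF "$entry" "$file" 2>/dev/null || echo "$entry" >> "$file"
}

# Typical use as root (not run here), assuming package.accept_keywords is a directory:
#   accept_testing_firmware /etc/portage/package.accept_keywords/linux-firmware
```

After that, a normal update of the package should pull in the testing version.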
|
|
 |
logrusx Advocate


Joined: 22 Feb 2018 Posts: 2947
|
Posted: Thu Feb 27, 2025 4:00 pm Post subject: |
|
|
AMDGPU is very volatile. I haven't had reliable wake-up from sleep since 6.1.57 or something like that. I was stuck with 6.1.91, which at least offered almost reliable wake-up from sleep when plugged in. Only recently someone pointed out on the forums that they had this issue solved in 6.13. They solved one, but maybe created others...
Best Regards,
Georgi |
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Thu Mar 06, 2025 1:55 pm Post subject: |
|
|
MickeyM wrote: | For a bleeding-edge kernel version, you should also accept ~amd64 for linux-firmware. Did you do that? |
Thanks for the reply. I build the latest versions of the firmware files that are actually used into the kernel, and update them with a script:
Code: | ...
[ 0.500883] Loading firmware: amdgpu/psp_13_0_5_toc.bin
[ 0.500890] Loading firmware: amdgpu/psp_13_0_5_ta.bin
[ 0.500980] Loading firmware: amdgpu/dcn_3_1_5_dmcub.bin
[ 0.501073] Loading firmware: amdgpu/gc_10_3_6_pfp.bin
[ 0.501174] Loading firmware: amdgpu/gc_10_3_6_me.bin
[ 0.501274] Loading firmware: amdgpu/gc_10_3_6_ce.bin
[ 0.501383] Loading firmware: amdgpu/gc_10_3_6_rlc.bin
[ 0.501452] Loading firmware: amdgpu/gc_10_3_6_mec.bin
[ 0.501556] Loading firmware: amdgpu/gc_10_3_6_mec2.bin
[ 0.501659] Loading firmware: amdgpu/sdma_5_2_6.bin
[ 0.501675] Loading firmware: amdgpu/vcn_3_1_2.bin
[ 0.502360] [drm] Loading DMUB firmware via PSP: version=0x05002000
[ 0.502600] [drm] Found VCN firmware Version ENC: 1.31 DEC: 3 VEP: 0 Revision: 3
... |
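The list of loaded files can be turned into a value for the kernel's CONFIG_EXTRA_FIRMWARE option mechanically. The following is a sketch, not necessarily how the list in this thread was produced: the function name `extra_firmware_from_log` is invented, and it assumes the log contains "Loading firmware: <path>" lines in the form shown above.

```shell
#!/bin/bash
# Read kernel log text on stdin and print a CONFIG_EXTRA_FIRMWARE line built
# from every "Loading firmware: <path>" entry, de-duplicated in first-seen order.
extra_firmware_from_log() {
    sed -n 's/.*Loading firmware: \([^ ]*\).*/\1/p' \
        | awk '!seen[$0]++' \
        | paste -sd ' ' - \
        | sed 's/.*/CONFIG_EXTRA_FIRMWARE="&"/'
}

# Typical use (not run here):
#   dmesg | extra_firmware_from_log
```

Lines such as the DMUB or VCN version messages are ignored because they do not contain the "Loading firmware: " prefix.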
After resuming from suspend I see that firmware for the network devices is loaded again, but not the amdgpu firmware.
Is it possible that amdgpu needs to be built as a module to pick up updated firmware?
I don't use an initrd, just an EFI stub kernel.
After the latest BIOS upgrade it is more stable, even at the highest supported RAM frequency (6400).
Some time after resuming from suspend I again get:
Code: | segfault at 50 ip 00007f1a723329fb sp 00007ffce75d48f8 error 4 in libX11.so.6.4.0 |
|
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Tue Apr 15, 2025 2:37 pm Post subject: |
|
|
gentoo-sources-6.14.2 seems much better; no DE crash since the upgrade. |
|
|
 |
dede11 n00b

Joined: 11 Mar 2025 Posts: 5
|
Posted: Wed Apr 16, 2025 6:31 am Post subject: |
|
|
Can you please post your .config file?
I have a Ryzen 9 7950X3D and I can't get the iGPU to work. |
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Wed Apr 16, 2025 6:48 am Post subject: |
|
|
I don't use an initrd, just an EFI stub kernel loaded directly by the UEFI BIOS, with the firmware files built into the kernel. |
|
|
 |
logrusx Advocate


Joined: 22 Feb 2018 Posts: 2947
|
Posted: Wed Apr 16, 2025 7:22 am Post subject: |
|
|
I intended to answer you back then but perhaps I was reading it on my phone while riding the bus or something and forgot until I got back home.
DeIM wrote: |
After resuming from suspend I see that firmware for the network devices is loaded again, but not the amdgpu firmware.
Is it possible that amdgpu needs to be built as a module to pick up updated firmware? |
No. You have firmware built into the kernel as well as the driver. There's nothing to be loaded, all is fine.
Although I can't prove it, I'm pretty certain you were affected by one of the glitches in the development of the AMDGPU driver. It has been subject to frequent change for quite some time, and it has affected different users at different points in time. Please try 6.12 and tell me if the crash is there.
Best Regards,
Georgi
Last edited by logrusx on Wed Apr 16, 2025 3:13 pm; edited 1 time in total |
|
|
 |
pietinger Moderator

Joined: 17 Oct 2006 Posts: 5629 Location: Bavaria
|
Posted: Wed Apr 16, 2025 1:13 pm Post subject: |
|
|
DeIM wrote: | I don't use an initrd, just an EFI stub kernel loaded directly by the UEFI BIOS, with the firmware files built into the kernel. |
I do the same ... and in this case it is important to also include the microcode for the CPU in your kernel ... you have included only the firmware for the AMDGPU:
Code: | CONFIG_EXTRA_FIRMWARE="amdgpu/psp_13_0_5_toc.bin amdgpu/psp_13_0_5_ta.bin amdgpu/dcn_3_1_5_dmcub.bin amdgpu/gc_10_3_6_pfp.bin amdgpu/gc_10_3_6_me.bin amdgpu/gc_10_3_6_ce.bin amdgpu/gc_10_3_6_rlc.bin amdgpu/gc_10_3_6_mec.bin amdgpu/gc_10_3_6_mec2.bin amdgpu/sdma_5_2_6.bin amdgpu/vcn_3_1_2.bin" |
See more: https://wiki.gentoo.org/wiki/AMD_microcode
(Maybe read my article for an Intel CPU before, because it gives you a hint: https://forums.gentoo.org/viewtopic-t-1065464.html )
BTW: You don't use an HID device over I2C? ... because you are missing:
Code: | # CONFIG_SPI_AMD is not set
# CONFIG_PINCTRL_AMD is not set
# CONFIG_I2C_HID_ACPI is not set |
Maybe boot the newest Ubuntu live CD and check whether one of these modules was loaded (if yes, you should enable it too):
Code: | # CONFIG_AMD_PTDMA is not set
# CONFIG_AMD_QDMA is not set |
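Checking a .config for the options mentioned above can be scripted. A rough sketch; the function name `check_kernel_options` is made up, and it only looks for `=y` or `=m` lines, so anything else (including "is not set" comments) counts as missing.

```shell
#!/bin/bash
# Report whether each named option is enabled (=y or =m) in a kernel .config.
# $1: path to the config file; remaining arguments: option names to check.
check_kernel_options() {
    local config="$1"; shift
    local opt
    for opt in "$@"; do
        if grep -qE "^${opt}=(y|m)$" "$config"; then
            echo "$opt enabled"
        else
            echo "$opt missing"
        fi
    done
}

# Typical use (not run here), assuming the kernel sources are in /usr/src/linux:
#   check_kernel_options /usr/src/linux/.config CONFIG_SPI_AMD CONFIG_PINCTRL_AMD CONFIG_I2C_HID_ACPI
```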
_________________ https://wiki.gentoo.org/wiki/User:Pietinger |
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Tue Apr 22, 2025 8:19 am Post subject: |
|
|
It seems my setup doesn't use any of the mentioned modules. |
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Tue Apr 22, 2025 8:23 am Post subject: |
|
|
dede11 wrote: | Can you please post your .config file?
I have a Ryzen 9 7950X3D and I can't get the iGPU to work. |
Have you made any progress with your setup?
It seems the iGPU is sensitive to RAM settings.
Sometimes it won't boot or has other problems; in that case it is advisable to shut the PC down and disconnect power (or use the PSU switch, if yours has one) until the PSU has powered down completely, since it stays live for a few seconds after poweroff. |
|
|
 |
DeIM Guru


Joined: 11 Apr 2006 Posts: 453
|
Posted: Tue Apr 22, 2025 10:02 am Post subject: |
|
|
If anybody is interested: I don't use the linux-firmware ebuild; I use this script to upgrade the firmware:
Code: | #!/bin/bash
# Refresh the firmware files that are built into the kernel from upstream linux-firmware.
FILES="amd/amd_sev_fam1ah_model0xh.sbin amdgpu/psp_13_0_5_toc.bin amdgpu/psp_13_0_5_ta.bin amdgpu/dcn_3_1_5_dmcub.bin amdgpu/gc_10_3_6_pfp.bin amdgpu/gc_10_3_6_me.bin amdgpu/gc_10_3_6_ce.bin amdgpu/gc_10_3_6_rlc.bin amdgpu/gc_10_3_6_mec.bin amdgpu/gc_10_3_6_mec2.bin amdgpu/sdma_5_2_6.bin amdgpu/vcn_3_1_2.bin mediatek/mt7925/BT_RAM_CODE_MT7925_1_1_hdr.bin mediatek/mt7925/WIFI_MT7925_PATCH_MCU_1_1_hdr.bin mediatek/mt7925/WIFI_RAM_CODE_MT7925_1_1.bin rtl_nic/rtl8126a-3.fw"
PREFIX="https://web.git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/plain/"
# Stop if the firmware directory is missing, so nothing is written elsewhere.
cd /lib/firmware || exit 1
for FILE in $FILES
do
    if [ ! -f "$FILE" ]; then
        echo "$FILE not found, creating, ignore DIFF of this file :-)"
        mkdir -p "$(dirname "${FILE}")"
        touch "$FILE"
    fi
    # Checksum before and after the download to report files that changed.
    x=$(xxh128sum "$FILE")
    wget -q -O "$FILE" "${PREFIX}${FILE}"
    y=$(xxh128sum "$FILE")
    if [ "$x" != "$y" ]; then
        echo "DIFF:"
        echo "$x"
        echo "$y"
    fi
done |
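One thing to be aware of with the script above: `wget -O` truncates the target file as soon as it starts, so a failed download can leave an empty or partial firmware file in /lib/firmware. A possible variation, offered only as a sketch, is to download to a temporary file and replace the target only when the result is non-empty and different. The helper name `install_if_changed` is an invention for this example, and it compares with `cmp` instead of `xxh128sum`.

```shell
#!/bin/bash
# Install a freshly downloaded file over the target only when it is non-empty
# and differs from what is already there. Prints "updated" or "unchanged".
# $1: downloaded temporary file, $2: target path.
install_if_changed() {
    local new="$1" target="$2"
    if [ ! -s "$new" ]; then
        # An empty download is treated as a failure; the old file is kept.
        echo "empty download, keeping $target" >&2
        rm -f "$new"
        return 1
    fi
    if [ -f "$target" ] && cmp -s "$new" "$target"; then
        rm -f "$new"
        echo "unchanged"
    else
        mkdir -p "$(dirname "$target")"
        mv "$new" "$target"
        echo "updated"
    fi
}

# Hypothetical use inside the download loop (not run here):
#   tmp=$(mktemp) && wget -q -O "$tmp" "${PREFIX}${FILE}" && install_if_changed "$tmp" "$FILE"
```

Because the move only happens after a successful, non-empty download, an interrupted run leaves the previous firmware files untouched.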
|
|
|
 |
|