Gentoo Forums
amdgpu random errors
Dwosky
Tux's lil' helper


Joined: 07 Nov 2018
Posts: 138

Posted: Fri Nov 01, 2024 12:01 pm    Post subject: amdgpu random errors

I've been having random errors with my Radeon RX 7800 XT: the system usually hangs about 10-20 seconds after I log in. The monitor goes into energy-saving mode, as if no video signal were coming in, and the PC gets stuck. I can't switch to another console or restart it from the keyboard, so I usually have to push the reset button on the case. It doesn't happen every time. Sometimes I can work without problems, and when it does happen, the next boot usually works. So, at least from what I've seen, it's a random hang during the first minute after login, regardless of whether I open other applications.

After the restart I checked the kernel logs and found two different sets of messages referring to the GPU:
Code:
Nov  1 11:52:17 ProjectX kernel: [  122.229049] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xfffffbecc0 flags=0x0030]
Nov  1 11:52:28 ProjectX kernel: [  132.649294] amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out

Full log of first error: https://paste.gentoo.zip/oNI4U3lj
Code:
Oct 30 21:34:26 ProjectX kernel: [  121.316618] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=17091, emitted seq=17093
Oct 30 21:34:26 ProjectX kernel: [  121.316700] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_wayland pid 3086 thread kwin_wayla:cs0 pid 3135
Oct 30 21:34:26 ProjectX kernel: [  121.316762] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 30 21:34:30 ProjectX kernel: [  125.316753] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio
Oct 30 21:34:34 ProjectX kernel: [  128.595628] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Oct 30 21:34:34 ProjectX kernel: [  128.595631] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Oct 30 21:34:37 ProjectX kernel: [  131.874243] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Oct 30 21:34:37 ProjectX kernel: [  131.874246] amdgpu 0000:03:00.0: amdgpu: [SetDfCstate] failed!
Oct 30 21:34:37 ProjectX kernel: [  131.874247] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate

Full log of second error: https://paste.gentoo.zip/d1n97c5s

In both cases, the next line in the log was the first line after the reboot, and the preceding lines were logged several seconds earlier, so I don't think they have anything to do with the issue itself.

I'm running KDE on Wayland. I've seen some older posts about a kernel issue with Wayland, but they were more than a year old, so I don't know whether that issue is still present in the current kernel. Any idea what to check or review to make the system more stable? I'm not 100% sure whether it's kernel related or config related.
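For anyone comparing against their own logs, here is a rough sketch of how lines like the ones above could be pulled out of a saved kernel log. The function name `gpu_errors` and the keyword list are my own assumptions, not anything the driver defines, and the log path depends on your syslog setup.

```shell
# Print only the amdgpu/DRM lines that look like errors from a saved kernel log.
# Usage: gpu_errors /var/log/messages   (the path is an assumption; adjust to your logger)
gpu_errors() {
    # First keep lines mentioning the GPU driver, then keep those matching
    # a hand-picked (hypothetical, not exhaustive) list of error keywords.
    grep -E 'amdgpu|drm' "$1" |
        grep -E -i 'error|fail|timeout|timed out|io_page_fault|gpu reset'
}
```

Run it as `gpu_errors /path/to/your/kernel.log`. It only filters text, so it can miss messages that use other wording.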

kernel config: https://paste.gentoo.zip/iH7Q2Rhz

Code:
# emerge --info
Portage 3.0.66.1 (python 3.12.7-final-0, default/linux/amd64/23.0/split-usr/desktop/plasma, gcc-13, glibc-2.39-r6, 6.6.58-gentoo-r1 x86_64)
=================================================================
System uname: Linux-6.6.58-gentoo-r1-x86_64-AMD_Ryzen_9_7900_12-Core_Processor-with-glibc2.39
KiB Mem:    64944488 total,  47138032 free
KiB Swap:    4194300 total,   4194300 free
Timestamp of repository gentoo: Thu, 31 Oct 2024 11:30:00 +0000
Head commit of repository gentoo: f3f6340be078db42c62ef2768143abae2b23f924
Timestamp of repository brother-overlay: Tue, 08 Oct 2024 15:51:03 +0000
Head commit of repository brother-overlay: 928bbe8f324720cbb3dd74c3db524c0e674f1349

Timestamp of repository dwosky: Wed, 30 Oct 2024 11:52:59 +0000
Head commit of repository dwosky: f4eeef4cdc61a2e0b5daa51b415ae22d8584b23d

Timestamp of repository guru: Wed, 30 Oct 2024 17:18:37 +0000
Head commit of repository guru: 16f2fe1f041317460a0faffef798c1f73574fda4

Timestamp of repository librewolf: Mon, 21 Oct 2024 13:48:20 +0000
Head commit of repository librewolf: 9ad61e0a4b8e6aa22a8913ffed7cb5981ece00f6

Timestamp of repository steam-overlay: Tue, 08 Oct 2024 15:50:59 +0000
Head commit of repository steam-overlay: c802c22bb423cb84d975b3fc9cfe6bc9410d22cd

sh bash 5.2_p37
ld GNU ld (Gentoo 2.42 p6) 2.42.0
app-misc/pax-utils:        1.3.7::gentoo
app-shells/bash:           5.2_p37::gentoo
dev-build/autoconf:        2.13-r8::gentoo, 2.72-r1::gentoo
dev-build/automake:        1.16.5-r2::gentoo
dev-build/cmake:           3.30.5::gentoo
dev-build/libtool:         2.4.7-r4::gentoo
dev-build/make:            4.4.1-r1::gentoo
dev-build/meson:           1.5.2::gentoo
dev-java/java-config:      2.3.4::gentoo
dev-lang/perl:             5.40.0::gentoo
dev-lang/python:           3.12.7_p1::gentoo, 3.13.0::gentoo
dev-lang/rust:             1.81.0::gentoo
sys-apps/baselayout:       2.15::gentoo
sys-apps/openrc:           0.54.2::gentoo
sys-apps/sandbox:          2.39::gentoo
sys-devel/binutils:        2.42-r2::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/clang:           18.1.8::gentoo
sys-devel/gcc:             13.3.1_p20240614::gentoo
sys-devel/gcc-config:      2.11::gentoo
sys-devel/lld:             18.1.8::gentoo
sys-devel/llvm:            18.1.8-r1::gentoo
sys-kernel/linux-headers:  6.6-r1::gentoo (virtual/os-headers)
sys-libs/glibc:            2.39-r6::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    volatile: False
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-max-age: 24
    sync-rsync-extra-opts:
    sync-rsync-verify-jobs: 1

brother-overlay
    location: /var/db/repos/brother-overlay
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/brother-overlay.git
    masters: gentoo
    volatile: False

dwosky
    location: /var/db/repos/dwosky
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/dwosky.git
    masters: gentoo
    volatile: False

guru
    location: /var/db/repos/guru
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/guru.git
    masters: gentoo
    volatile: False

librewolf
    location: /var/db/repos/librewolf
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/librewolf.git
    masters: gentoo
    volatile: False

steam-overlay
    location: /var/db/repos/steam-overlay
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/steam-overlay.git
    masters: gentoo
    volatile: False

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE brother-eula ValveSteamLicense"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="http://gentoo.mirror.root.lu/     http://tux.rainside.sk/gentoo/     ftp://tux.rainside.sk/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs"
LEX="flex"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="X a52 aac acl acpi activities alsa amd64 branding bzip2 cairo cdda cdr cet crypt cups dbus declarative dri dts dvd dvdr elogind encode exif flac gdbm gif gpm gtk gui iconv icu jpeg kde kf6compat kwallet lcms libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses networkmanager nls ogg opengl openmp pam pango pcre pdf pipewire plasma png policykit ppds pulseaudio qml qt5 qt6 readline screencast sdl seccomp semantic-desktop sound spell split-usr ssl startup-notification svg test-rust threads tiff truetype udev udisks unicode upower usb vorbis vulkan wayland widgets wxwidgets x264 xattr xcb xft xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gcc_12" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512vbmi f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GRUB_PLATFORMS="efi-64" GUILE_SINGLE_TARGET="3-0" GUILE_TARGETS="3-0" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en es" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-2" POSTGRES_TARGETS="postgres16" PYTHON_SINGLE_TARGET="python3_12" PYTHON_TARGETS="python3_12" RUBY_TARGETS="ruby32" VIDEO_CARDS="amdgpu radeonsi" 
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, MAKEOPTS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PYTHONPATH, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS


Last edited by Dwosky on Fri Nov 01, 2024 2:53 pm; edited 2 times in total
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5071
Location: Bavaria

Posted: Fri Nov 01, 2024 1:47 pm    Post subject: Re: amdgpu random errors

Dwosky wrote:
[...] Any idea on what to check or review to see if I can make the system more stable? I'm not 100% sure if it's something kernel related or config related.

I would take a look at your kernel .config. It would be best to send me the complete dmesg as well. (Please use wgetpaste for both).
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
Dwosky
Tux's lil' helper


Joined: 07 Nov 2018
Posts: 138

Posted: Fri Nov 01, 2024 2:53 pm

I've added the full kernel log of both cases and the kernel configuration in the first post.
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5071
Location: Bavaria

Posted: Fri Nov 01, 2024 4:19 pm

Maybe this is the reason:
Code:
# CONFIG_IRQ_REMAP is not set

If this does not help, then I would like to ask: what happens if you disable the integrated GPU of your AMD CPU in the BIOS?

BTW: I would change these too (but it should not be the reason):
Code:
# CONFIG_TRANSPARENT_HUGEPAGE is not set
# CONFIG_LRU_GEN is not set
# CONFIG_SPI is not set
# CONFIG_PINCTRL_AMD is not set
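
To see how the options named in this post are currently set, something like the following could be used. It is only a sketch: `kconfig_state` is a made-up helper name, and `/usr/src/linux/.config` is assumed to be the config of the kernel you actually boot.

```shell
# Report how each given option is set in a kernel .config file.
# Usage: kconfig_state /usr/src/linux/.config IRQ_REMAP LRU_GEN
kconfig_state() {
    config_file=$1
    shift
    for opt in "$@"; do
        # A set option appears as CONFIG_FOO=y (or =m, or a value);
        # an unset one appears as "# CONFIG_FOO is not set" or not at all.
        line=$(grep -E "^CONFIG_${opt}=" "$config_file" || true)
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        else
            printf 'CONFIG_%s is not set\n' "$opt"
        fi
    done
}
```

For example, `kconfig_state /usr/src/linux/.config IRQ_REMAP TRANSPARENT_HUGEPAGE LRU_GEN SPI PINCTRL_AMD` prints one line per option. After changing anything, the kernel has to be rebuilt and booted before it takes effect.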

Maybe change this to "powersave" if you don't use a user-space application that does the job ("powersave" is said to be best with AMD P-State):
Code:
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
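
The compiled-in default is only the starting point, so it can be worth checking which driver and governor are actually active on the running system. A minimal sketch, assuming the standard cpufreq sysfs layout and using a helper name (`cpufreq_status`) of my own:

```shell
# Print the active cpufreq driver and governor for cpu0.
# The optional argument overrides the sysfs directory (default shown below).
cpufreq_status() {
    dir=${1:-/sys/devices/system/cpu/cpu0/cpufreq}
    printf 'driver:   %s\n' "$(cat "$dir/scaling_driver")"
    printf 'governor: %s\n' "$(cat "$dir/scaling_governor")"
}
```

Calling `cpufreq_status` with no argument reads the values for the first CPU. Other cores can be checked by passing their `cpufreq` directory.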


Internal note:
Code:
Oct 30 21:32:31 ProjectX kernel: [    0.215038] smpboot: CPU0: AMD Ryzen 9 7900 12-Core Processor (family: 0x19, model: 0x61, stepping: 0x2)
Oct 30 21:32:31 ProjectX kernel: [    0.486700] ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\GSA1.SMBI) (20230628/utaddress-204)
Oct 30 21:32:31 ProjectX kernel: [    0.486708] ACPI: OSL: Resource conflict; ACPI support missing from driver?
Oct 30 21:32:31 ProjectX kernel: [    0.486719] fail to initialize ptp_kvm

Oct 30 21:32:31 ProjectX kernel: [    4.874726] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)

Oct 30 21:32:31 ProjectX kernel: [    5.590225] kfd kfd: amdgpu: added device 1002:747e

Oct 30 21:32:31 ProjectX kernel: [    5.815003] amdgpu 0000:11:00.0: enabling device (0006 -> 0007)
Oct 30 21:32:31 ProjectX kernel: [    5.816401] amdgpu: ATOM BIOS: 102-RAPHAEL-008

_________________
https://wiki.gentoo.org/wiki/User:Pietinger
keekkenen
n00b


Joined: 05 Oct 2024
Posts: 3

Posted: Fri Nov 01, 2024 11:04 pm

I disabled the AMD GPU in the BIOS, and my settings for the kernel options above are:
Code:

CONFIG_IRQ_REMAP=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_LRU_GEN=y
CONFIG_SPI=y
CONFIG_PINCTRL_AMD=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set


I haven't had any trouble. Today I tried wine-proton, Steam and csgo2; everything works well and looks the same as under Windows.
_________________
7950x3d / x670e MSI Tomagawk / Sapphire RX 7800XT 16Gb / G.Skill 64Gb 5600 / A-Data Legend 960 2Tb (x2), A-Data SX8200PNP 256Gb
Dwosky
Tux's lil' helper


Joined: 07 Nov 2018
Posts: 138

Posted: Tue Nov 05, 2024 3:46 pm

I've updated the suggested kernel parameters and for the time being the system seems more stable. Let's hope it stays that way, thanks. :D
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5071
Location: Bavaria

Posted: Tue Nov 05, 2024 6:34 pm

Dwosky wrote:
I've updated the suggested kernel parameters and for the time being the system seems more stable. [...]

Happy to hear that. :D

Dwosky wrote:
[...] Let's hope it stays that way, thanks. :D

Yes ... and ... you are very welcome! :D
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
All times are GMT