Dwosky Tux's lil' helper
Joined: 07 Nov 2018 Posts: 139
Posted: Fri Nov 01, 2024 12:01 pm Post subject: amdgpu random errors
I've been getting random errors with my Radeon RX 7800 XT: the system usually hangs about 10-20 seconds after I log in. The monitor switches to power-saving mode, as if no video signal were coming in, and the PC is stuck: I can't switch to another console or reboot it with the keyboard, so I usually have to press the reset button on the case itself. The problem doesn't happen every time; sometimes I can work without issues, and when it does happen, the system usually comes up fine after the next reset. So, at least from what I've seen, it's a random hang during the first minute of using the PC, regardless of whether I open other applications or not.
After restarting, I checked the kernel logs and found two different errors referring to the GPU:
Code: | Nov 1 11:52:17 ProjectX kernel: [ 122.229049] amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0010 address=0xfffffbecc0 flags=0x0030]
Nov 1 11:52:28 ProjectX kernel: [ 132.649294] amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out |
Full log of the first error: https://paste.gentoo.zip/oNI4U3lj
Code: | Oct 30 21:34:26 ProjectX kernel: [ 121.316618] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=17091, emitted seq=17093
Oct 30 21:34:26 ProjectX kernel: [ 121.316700] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_wayland pid 3086 thread kwin_wayla:cs0 pid 3135
Oct 30 21:34:26 ProjectX kernel: [ 121.316762] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 30 21:34:30 ProjectX kernel: [ 125.316753] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio
Oct 30 21:34:34 ProjectX kernel: [ 128.595628] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Oct 30 21:34:34 ProjectX kernel: [ 128.595631] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Oct 30 21:34:37 ProjectX kernel: [ 131.874243] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Oct 30 21:34:37 ProjectX kernel: [ 131.874246] amdgpu 0000:03:00.0: amdgpu: [SetDfCstate] failed!
Oct 30 21:34:37 ProjectX kernel: [ 131.874247] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate |
Full log of the second error: https://paste.gentoo.zip/d1n97c5s
In both cases, the next line in the log was already the first line after the reboot, and the preceding lines were several seconds earlier, so I don't think they have anything to do with the issue itself.
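Since the `ring gfx_0.0.0 timeout` entries follow a fixed pattern, it can help to pull the numbers out mechanically when comparing hangs. Below is a minimal Python sketch, assuming the syslog line format quoted above; `parse_ring_timeout` is a hypothetical helper, not part of any existing tool. Reading `emitted - signaled` as "jobs submitted to the ring that never completed" is an interpretation of the driver message, not something stated in this thread.

```python
import re

# Matches the amdgpu ring-timeout message as it appears in the syslog excerpt, e.g.
# "kernel: [ 121.316618] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, ..."
TIMEOUT_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]"        # kernel uptime stamp, e.g. "[ 121.316618]"
    r".*\bring (?P<ring>\S+) timeout, "    # ring name, e.g. "gfx_0.0.0"
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (uptime_seconds, ring_name, outstanding_jobs), or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    # Assumption: emitted minus signaled = fences submitted but not yet signaled.
    outstanding = int(m["emitted"]) - int(m["signaled"])
    return float(m["uptime"]), m["ring"], outstanding


if __name__ == "__main__":
    sample = (
        "Oct 30 21:34:26 ProjectX kernel: [ 121.316618] [drm:amdgpu_job_timedout [amdgpu]] "
        "*ERROR* ring gfx_0.0.0 timeout, signaled seq=17091, emitted seq=17093"
    )
    print(parse_ring_timeout(sample))  # (121.316618, 'gfx_0.0.0', 2)
```

For the Oct 30 excerpt this reports two outstanding jobs on `gfx_0.0.0`; the log line right after it names `kwin_wayland` as the process involved.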
I'm running KDE on Wayland. I've seen some posts about a kernel issue with Wayland, but they were fairly old (more than a year), so I don't know whether that issue is still present in the current kernel. Any ideas on what to check or review to make the system more stable? I'm not 100% sure whether it's something kernel related or config related.
kernel config: https://paste.gentoo.zip/iH7Q2Rhz
Code: | # emerge --info
Portage 3.0.66.1 (python 3.12.7-final-0, default/linux/amd64/23.0/split-usr/desktop/plasma, gcc-13, glibc-2.39-r6, 6.6.58-gentoo-r1 x86_64)
=================================================================
System uname: Linux-6.6.58-gentoo-r1-x86_64-AMD_Ryzen_9_7900_12-Core_Processor-with-glibc2.39
KiB Mem: 64944488 total, 47138032 free
KiB Swap: 4194300 total, 4194300 free
Timestamp of repository gentoo: Thu, 31 Oct 2024 11:30:00 +0000
Head commit of repository gentoo: f3f6340be078db42c62ef2768143abae2b23f924
Timestamp of repository brother-overlay: Tue, 08 Oct 2024 15:51:03 +0000
Head commit of repository brother-overlay: 928bbe8f324720cbb3dd74c3db524c0e674f1349
Timestamp of repository dwosky: Wed, 30 Oct 2024 11:52:59 +0000
Head commit of repository dwosky: f4eeef4cdc61a2e0b5daa51b415ae22d8584b23d
Timestamp of repository guru: Wed, 30 Oct 2024 17:18:37 +0000
Head commit of repository guru: 16f2fe1f041317460a0faffef798c1f73574fda4
Timestamp of repository librewolf: Mon, 21 Oct 2024 13:48:20 +0000
Head commit of repository librewolf: 9ad61e0a4b8e6aa22a8913ffed7cb5981ece00f6
Timestamp of repository steam-overlay: Tue, 08 Oct 2024 15:50:59 +0000
Head commit of repository steam-overlay: c802c22bb423cb84d975b3fc9cfe6bc9410d22cd
sh bash 5.2_p37
ld GNU ld (Gentoo 2.42 p6) 2.42.0
app-misc/pax-utils: 1.3.7::gentoo
app-shells/bash: 5.2_p37::gentoo
dev-build/autoconf: 2.13-r8::gentoo, 2.72-r1::gentoo
dev-build/automake: 1.16.5-r2::gentoo
dev-build/cmake: 3.30.5::gentoo
dev-build/libtool: 2.4.7-r4::gentoo
dev-build/make: 4.4.1-r1::gentoo
dev-build/meson: 1.5.2::gentoo
dev-java/java-config: 2.3.4::gentoo
dev-lang/perl: 5.40.0::gentoo
dev-lang/python: 3.12.7_p1::gentoo, 3.13.0::gentoo
dev-lang/rust: 1.81.0::gentoo
sys-apps/baselayout: 2.15::gentoo
sys-apps/openrc: 0.54.2::gentoo
sys-apps/sandbox: 2.39::gentoo
sys-devel/binutils: 2.42-r2::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/clang: 18.1.8::gentoo
sys-devel/gcc: 13.3.1_p20240614::gentoo
sys-devel/gcc-config: 2.11::gentoo
sys-devel/lld: 18.1.8::gentoo
sys-devel/llvm: 18.1.8-r1::gentoo
sys-kernel/linux-headers: 6.6-r1::gentoo (virtual/os-headers)
sys-libs/glibc: 2.39-r6::gentoo
Repositories:
gentoo
location: /var/db/repos/gentoo
sync-type: rsync
sync-uri: rsync://rsync.gentoo.org/gentoo-portage
priority: -1000
volatile: False
sync-rsync-verify-metamanifest: yes
sync-rsync-verify-max-age: 24
sync-rsync-extra-opts:
sync-rsync-verify-jobs: 1
brother-overlay
location: /var/db/repos/brother-overlay
sync-type: git
sync-uri: https://github.com/gentoo-mirror/brother-overlay.git
masters: gentoo
volatile: False
dwosky
location: /var/db/repos/dwosky
sync-type: git
sync-uri: https://github.com/gentoo-mirror/dwosky.git
masters: gentoo
volatile: False
guru
location: /var/db/repos/guru
sync-type: git
sync-uri: https://github.com/gentoo-mirror/guru.git
masters: gentoo
volatile: False
librewolf
location: /var/db/repos/librewolf
sync-type: git
sync-uri: https://github.com/gentoo-mirror/librewolf.git
masters: gentoo
volatile: False
steam-overlay
location: /var/db/repos/steam-overlay
sync-type: git
sync-uri: https://github.com/gentoo-mirror/steam-overlay.git
masters: gentoo
volatile: False
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE brother-eula ValveSteamLicense"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt /var/bind"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="http://gentoo.mirror.root.lu/ http://tux.rainside.sk/gentoo/ ftp://tux.rainside.sk/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,-z,pack-relative-relocs"
LEX="flex"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="X a52 aac acl acpi activities alsa amd64 branding bzip2 cairo cdda cdr cet crypt cups dbus declarative dri dts dvd dvdr elogind encode exif flac gdbm gif gpm gtk gui iconv icu jpeg kde kf6compat kwallet lcms libnotify libtirpc mad mng mp3 mp4 mpeg multilib ncurses networkmanager nls ogg opengl openmp pam pango pcre pdf pipewire plasma png policykit ppds pulseaudio qml qt5 qt6 readline screencast sdl seccomp semantic-desktop sound spell split-usr ssl startup-notification svg test-rust threads tiff truetype udev udisks unicode upower usb vorbis vulkan wayland widgets wxwidgets x264 xattr xcb xft xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gcc_12" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="aes avx avx2 avx512f avx512dq avx512cd avx512bw avx512vl avx512vbmi f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GRUB_PLATFORMS="efi-64" GUILE_SINGLE_TARGET="3-0" GUILE_TARGETS="3-0" INPUT_DEVICES="libinput" KERNEL="linux" L10N="en es" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-2" POSTGRES_TARGETS="postgres16" PYTHON_SINGLE_TARGET="python3_12" PYTHON_TARGETS="python3_12" RUBY_TARGETS="ruby32" VIDEO_CARDS="amdgpu radeonsi" 
XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset: ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, MAKEOPTS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PYTHONPATH, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS |
Last edited by Dwosky on Fri Nov 01, 2024 2:53 pm; edited 2 times in total |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5174 Location: Bavaria
Posted: Fri Nov 01, 2024 1:47 pm Post subject: Re: amdgpu random errors
Dwosky wrote: | [...] Any idea on what to check or review to see if I can make the system more stable? I'm not 100% sure if it's something kernel related or config related. |
I would take a look at your kernel .config. It would be best to send me the complete dmesg as well. (Please use wgetpaste for both). _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
Dwosky Tux's lil' helper
Joined: 07 Nov 2018 Posts: 139
Posted: Fri Nov 01, 2024 2:53 pm Post subject:
I've added the full kernel logs for both cases and the kernel configuration to the first post. |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5174 Location: Bavaria
Posted: Fri Nov 01, 2024 4:19 pm Post subject:
Maybe this is the reason:
Code: | # CONFIG_IRQ_REMAP is not set |
If this does not help, then I would like to ask: what happens if you disable your AMD CPU's integrated GPU in the BIOS?
BTW: I would change these too (although they are probably not the cause):
Code: | # CONFIG_TRANSPARENT_HUGEPAGE is not set
# CONFIG_LRU_GEN is not set
# CONFIG_SPI is not set
# CONFIG_PINCTRL_AMD is not set |
Maybe change this to "powersave" if you don't use a user-space application that does the job ("powersave" is said to work best with AMD P-State):
Code: | CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y |
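To check a kernel .config against these suggestions without reading it by hand, a small Python sketch like the following could be used. The expected values in `SUGGESTED` are an assumption taken from the settings keekkenen reports further down (`=y` for the first five options, the ondemand default governor unset); adjust them as needed. `parse_kconfig` and `check_suggested` are hypothetical helpers.

```python
import re

# Expected values, assumed from the settings quoted in this thread.
# "n" stands for "# CONFIG_FOO is not set" (or an option that is absent).
SUGGESTED = {
    "CONFIG_IRQ_REMAP": "y",
    "CONFIG_TRANSPARENT_HUGEPAGE": "y",
    "CONFIG_LRU_GEN": "y",
    "CONFIG_SPI": "y",
    "CONFIG_PINCTRL_AMD": "y",
    "CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND": "n",
}

UNSET_RE = re.compile(r"#\s*(CONFIG_\w+) is not set$")


def parse_kconfig(text):
    """Map each CONFIG_* name to its value ('y', 'm', a string, or 'n' when unset)."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        unset = UNSET_RE.match(line)
        if unset:
            options[unset.group(1)] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def check_suggested(text, suggested=SUGGESTED):
    """Return {option: current_value} for every option that differs from `suggested`."""
    options = parse_kconfig(text)
    mismatches = {}
    for name, wanted in suggested.items():
        current = options.get(name, "n")  # absent options are treated as unset
        if current != wanted:
            mismatches[name] = current
    return mismatches


if __name__ == "__main__":
    # The lines quoted from the original poster's config above.
    current = """\
# CONFIG_IRQ_REMAP is not set
# CONFIG_TRANSPARENT_HUGEPAGE is not set
# CONFIG_LRU_GEN is not set
# CONFIG_SPI is not set
# CONFIG_PINCTRL_AMD is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
"""
    for name, value in check_suggested(current).items():
        print(f"{name}: currently {value}, suggested {SUGGESTED[name]}")
```

On a real system you would pass the contents of the kernel's `.config` (for example `/usr/src/linux/.config` on Gentoo, assuming the usual symlink) instead of the inline sample.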
Internal note:
Code: | Oct 30 21:32:31 ProjectX kernel: [ 0.215038] smpboot: CPU0: AMD Ryzen 9 7900 12-Core Processor (family: 0x19, model: 0x61, stepping: 0x2)
Oct 30 21:32:31 ProjectX kernel: [ 0.486700] ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\GSA1.SMBI) (20230628/utaddress-204)
Oct 30 21:32:31 ProjectX kernel: [ 0.486708] ACPI: OSL: Resource conflict; ACPI support missing from driver?
Oct 30 21:32:31 ProjectX kernel: [ 0.486719] fail to initialize ptp_kvm
Oct 30 21:32:31 ProjectX kernel: [ 4.874726] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
Oct 30 21:32:31 ProjectX kernel: [ 5.590225] kfd kfd: amdgpu: added device 1002:747e
Oct 30 21:32:31 ProjectX kernel: [ 5.815003] amdgpu 0000:11:00.0: enabling device (0006 -> 0007)
Oct 30 21:32:31 ProjectX kernel: [ 5.816401] amdgpu: ATOM BIOS: 102-RAPHAEL-008 |
keekkenen n00b
Joined: 05 Oct 2024 Posts: 11
Posted: Fri Nov 01, 2024 11:04 pm Post subject:
I disabled the integrated AMD GPU in the BIOS, and my settings for the kernel options mentioned above are:
Code: |
CONFIG_IRQ_REMAP=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_LRU_GEN=y
CONFIG_SPI=y
CONFIG_PINCTRL_AMD=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
|
I haven't had any trouble. Today I tried wine-proton, Steam and csgo2; everything works well and looks just like it does under Windows. _________________ 7950x3d / x670e MSI Tomahawk / Sapphire RX 7800XT 16Gb / G.Skill 64Gb 5600 / A-Data Legend 960 2Tb (x2), A-Data SX8200PNP 256Gb |
Dwosky Tux's lil' helper
Joined: 07 Nov 2018 Posts: 139
Posted: Tue Nov 05, 2024 3:46 pm Post subject:
I've applied the suggested kernel config options, and for the time being the system seems to be more stable. Let's hope it stays that way, thanks. |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5174 Location: Bavaria
Posted: Tue Nov 05, 2024 6:34 pm Post subject:
Dwosky wrote: | I've updated the suggested kernel parameters and for the time being it seems the system is more stable. [...] |
Happy to hear that.
Dwosky wrote: | [...] Let's hope it keeps that way, thanks. |
Yes ... and ... you are very welcome!
Dwosky Tux's lil' helper
Joined: 07 Nov 2018 Posts: 139
Posted: Wed Nov 13, 2024 5:47 pm Post subject:
It seems I'm still facing issues, as it happened again today with the following error in the kernel log:
Code: | Nov 13 18:39:47 ProjectX kernel: [ 85.054367] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=15098, emitted seq=15100
Nov 13 18:39:47 ProjectX kernel: [ 85.054448] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kwin_wayland pid 3016 thread kwin_wayla:cs0 pid 3065
Nov 13 18:39:47 ProjectX kernel: [ 85.054510] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Nov 13 18:39:51 ProjectX kernel: [ 89.055478] amdgpu 0000:03:00.0: amdgpu: failed to suspend display audio
Nov 13 18:39:54 ProjectX kernel: [ 92.327087] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Nov 13 18:39:54 ProjectX kernel: [ 92.327090] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Nov 13 18:39:58 ProjectX kernel: [ 95.599291] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000029 SMN_C2PMSG_82:0x00000000
Nov 13 18:39:58 ProjectX kernel: [ 95.599294] amdgpu 0000:03:00.0: amdgpu: [SetDfCstate] failed!
Nov 13 18:39:58 ProjectX kernel: [ 95.599295] amdgpu 0000:03:00.0: amdgpu: Failed to disallow df cstate
Nov 13 18:40:06 ProjectX kernel: [ 104.527248] [drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x0) |
I'm not sure if I'm still missing something at the kernel level: https://paste.gentoo.zip/txTsPEdN |
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5174 Location: Bavaria
Posted: Wed Nov 13, 2024 6:14 pm Post subject:
I know that AMD is constantly working on their drivers ... what happens if you use kernel version 6.11?
(Don't be surprised that the AMD IOMMU version 2 driver no longer exists; it was removed in 6.7.)
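When switching kernel series it is easy to lose track of which version is actually booted and which options will disappear. The Python sketch below parses a release string such as the `6.6.58-gentoo-r1` from the emerge --info output and lists the two points raised here: whether the kernel is older than 6.11, and whether the AMD IOMMU v2 option will vanish because that driver was removed in 6.7. The helper names are hypothetical, the option name `CONFIG_AMD_IOMMU_V2` is assumed to be the one in question, and on the live system the release string would come from `platform.release()`.

```python
import re


def kernel_version(release):
    """Turn a release string like '6.6.58-gentoo-r1' into (6, 6, 58)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def upgrade_notes(release, target=(6, 11)):
    """List reminders for moving from `release` to the `target` (major, minor) series."""
    current = kernel_version(release)[:2]
    notes = []
    if current < target:
        notes.append(f"running {current[0]}.{current[1]}, older than {target[0]}.{target[1]}")
        if current < (6, 7) <= target:
            # The AMD IOMMU v2 driver was removed in 6.7, so the option vanishes on upgrade.
            notes.append("CONFIG_AMD_IOMMU_V2 no longer exists from 6.7 on")
    return notes


if __name__ == "__main__":
    release = "6.6.58-gentoo-r1"  # on the live system: platform.release()
    for note in upgrade_notes(release):
        print(note)
```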