Gentoo Forums
amdgpu crash while suspending to ram/disk or resuming

Gentoo Forums Forum Index » Kernel & Hardware

orsetto
n00b


Joined: 26 Sep 2023
Posts: 6
Location: Italy

Posted: Sun Sep 29, 2024 5:31 pm    Post subject: amdgpu crash while suspending to ram/disk or resuming

Hi. I've been fighting with my system for a while now.

I have an HP Pavilion 15 with a Ryzen 2500U and Vega 8 integrated graphics, and I'm running Plasma 6 on X11.

Basically my system hangs during suspend to disk/ram or resume. The exact behavior depends on different factors:

  • if running plasma, the system hangs during suspend;
  • if running just the X server, it hangs while resuming;
  • if running with just a TTY, everything works as expected.*

I should note that the same problems happen with both X11 and Wayland, and also that these results aren't consistent: I've been able to suspend correctly even from Plasma a couple of times, but I've been unable to reproduce that.

When suspend does succeed but the system then hangs in a failed resume state, I can still ssh into it. That's how I gathered the dmesg output, which pointed me toward amdgpu crashes.

Searching the internet for the errors in dmesg turned up a bunch of suggested fixes, like putting iommu=pt in the kernel command line, none of which solved my problem. I've also tried disabling AMD_IOMMU in the kernel config, without any luck.
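Before ruling out a parameter like iommu=pt, it is worth confirming that it actually reached the running kernel. A minimal POSIX sh sketch follows; the function name `cmdline_has` and its optional second argument (a file to read instead of /proc/cmdline, there only for testing) are hypothetical:

```shell
#!/bin/sh
# Succeeds if a parameter is on the kernel command line.
#   cmdline_has NAME        -> matches NAME or NAME=<anything>
#   cmdline_has NAME=VALUE  -> matches exactly
# The optional second argument replaces /proc/cmdline (hypothetical, for testing).
cmdline_has() (
    set -f                          # subshell body: globbing is disabled only here
    want=$1
    file=${2:-/proc/cmdline}
    for arg in $(cat "$file"); do
        case "$want" in
            *=*) [ "$arg" = "$want" ] && exit 0 ;;
            *)   case "$arg" in "$want"|"$want"=*) exit 0 ;; esac ;;
        esac
    done
    exit 1
)

# Example: cmdline_has iommu=pt && echo "iommu=pt is active"
```

The body runs in a subshell so that `set -f` (no filename globbing while word-splitting the command line) does not leak into the caller.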

While debugging this I tried every combination of the following kernel and firmware versions, without ever being able to suspend and resume correctly: kernels 6.1.111, 6.5.52, 6.11.0 and git (latest), and linux-firmware 20240709-r2, 20240909-r1, latest git and old git. I tried "old git" after looking at this thread: https://gitlab.freedesktop.org/drm/amd/-/issues/3539. Just for clarity, I'm daily driving sys-kernel/gentoo-sources-6.6.52 with sys-kernel/linux-firmware-20240909-r1.

One of the dmesg outputs I obtained: https://pastebin.com/GRftGshx
Kernel config: https://pastebin.com/EUpkvhS2

Code:
 $ emerge --info
Portage 3.0.65 (python 3.12.6-final-0, default/linux/amd64/23.0/split-usr/desktop/plasma, gcc-13, glibc-2.39-r6, 6.6.52-gentoo-orsetto x86_64)
=================================================================
System uname: Linux-6.6.52-gentoo-orsetto-x86_64-AMD_Ryzen_5_2500U_with_Radeon_Vega_Mobile_Gfx-with-glibc2.39
KiB Mem:     7003564 total,   1855768 free
KiB Swap:   16777212 total,  16777212 free
Timestamp of repository gentoo: Fri, 27 Sep 2024 22:30:00 +0000
Head commit of repository gentoo: 56840d6a1dc4bf2c906112739bdc0ccd53db44e8
Timestamp of repository brother-overlay: Fri, 27 Sep 2024 05:36:50 +0000
Head commit of repository brother-overlay: d369c4654473d84e1f292e03bdbe192f0c180240

Timestamp of repository eclipse: Sat, 20 Jan 2024 10:19:57 +0000
Head commit of repository eclipse: 292c22692558bbd73e4bc6bcb5afab1019f68c1f

Timestamp of repository guru: Fri, 27 Sep 2024 17:36:13 +0000
Head commit of repository guru: 4ca4cb23f92de6eda055a758722c5bca5a520e62

sh bash 5.2_p26-r6
ld GNU ld (Gentoo 2.42 p3) 2.42.0
app-misc/pax-utils:        1.3.7::gentoo
app-shells/bash:           5.2_p26-r6::gentoo
dev-build/autoconf:        2.13-r8::gentoo, 2.71-r7::gentoo
dev-build/automake:        1.16.5-r2::gentoo
dev-build/cmake:           3.30.2::gentoo
dev-build/libtool:         2.4.7-r4::gentoo
dev-build/make:            4.4.1-r1::gentoo
dev-build/meson:           1.5.1::gentoo
dev-java/java-config:      2.3.4::gentoo
dev-lang/perl:             5.40.0::gentoo
dev-lang/python:           3.11.10_p1::gentoo, 3.12.6_p2::gentoo
dev-lang/rust-bin:         1.80.1::gentoo
sys-apps/baselayout:       2.15::gentoo
sys-apps/openrc:           0.54.2::gentoo
sys-apps/sandbox:          2.39::gentoo
sys-devel/binutils:        2.42-r1::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/clang:           18.1.8::gentoo
sys-devel/gcc:             13.3.1_p20240614::gentoo
sys-devel/gcc-config:      2.11::gentoo
sys-devel/lld:             18.1.8::gentoo
sys-devel/llvm:            18.1.8-r1::gentoo
sys-kernel/linux-headers:  6.6-r1::gentoo (virtual/os-headers)
sys-libs/glibc:            2.39-r6::gentoo
Repositories:

gentoo
    location: /var/db/repos/gentoo
    sync-type: rsync
    sync-uri: rsync://rsync.gentoo.org/gentoo-portage
    priority: -1000
    volatile: False
    sync-rsync-verify-metamanifest: yes
    sync-rsync-verify-jobs: 1
    sync-rsync-extra-opts:
    sync-rsync-verify-max-age: 24

brother-overlay
    location: /var/db/repos/brother-overlay
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/brother-overlay.git
    masters: gentoo
    volatile: False

eclipse
    location: /var/db/repos/eclipse
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/eclipse.git
    masters: gentoo
    volatile: False

guru
    location: /var/db/repos/guru
    sync-type: git
    sync-uri: https://github.com/gentoo-mirror/guru.git
    masters: gentoo
    volatile: False

crossdev
    location: /var/db/repos/portage-crossdev
    masters: gentoo
    priority: 10
    volatile: False

ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -fcf-protection -fstack-clash-protection -fstack-protector-strong"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -fcf-protection -fstack-clash-protection -fstack-protector-strong"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="https://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/     https://ftp.agdsn.de/gentoo     https://ftp.spline.inf.fu-berlin.de/mirrors/gentoo/     https://ftp.gwdg.de/pub/linux/gentoo/     https://ftp.uni-hannover.de/gentoo/     https://mirror.netcologne.de/gentoo/     https://mirror.netzwerge.de/gentoo/     https://packages.hs-regensburg.de/gentoo-distfiles/     https://gentoo.mirror.garr.it/     https://mirror.init7.net/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-z,now -Wl,-z,relro"
LEX="flex"
MAKEOPTS="-j9 -l9"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
RUSTFLAGS="-C target-cpu=native -C link-arg=-Wl,-z,pack-relative-relocs -C opt-level=3"
SHELL="/bin/bash"
USE="X a52 aac acl acpi activities alsa amd64 bluetooth branding bzip2 cairo cdda cdr cet crypt cups dbus declarative dri dts dvd dvdr elogind encode exif flac gdbm gif gpm grub gtk gui hardened iconv icu ipv6 jpeg kde kf6compat kwallet lcms libnotify libtirpc mad mng modules-sign mp3 mp4 mpeg multilib ncurses networkmanager nls ogg opengl openmp pam pango pcre pdf pipewire plasma png policykit ppds pulseaudio qml qt5 qt6 readline screencast sdl seccomp semantic-desktop sound spell split-usr ssl startup-notification svg test-rust tiff truetype udev udisks unicode upower usb verify-sig vorbis vulkan wayland widgets wxwidgets x264 xattr xcb xft xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gcc_12" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 aes avx avx2 f16c fma3 pclmul popcnt rdrand sha sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 ntrip navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GUILE_SINGLE_TARGET="3-0" GUILE_TARGETS="3-0" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-2" POSTGRES_TARGETS="postgres16" PYTHON_SINGLE_TARGET="python3_12" PYTHON_TARGETS="python3_12" RUBY_TARGETS="ruby31 ruby32" VIDEO_CARDS="amdgpu radeonsi radeon" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset:  ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PYTHONPATH, RANLIB, READELF, SIZE, STRINGS, STRIP, YACC, YFLAGS


make.conf:
Code:
COMMON_FLAGS="-march=native -O2 -pipe"

# As of https://wiki.gentoo.org/wiki/GCC_optimization#Hardening_optimizations
#   PIE is enabled by default when it is safe to do so in 17.0 profiles and newer.
#   PIC may also be enabled by default on executables for architectures that require it (like AMD64).
#   There is no need to set PIE or PIC manually in CFLAGS.
C_COMMON_FLAGS="${COMMON_FLAGS} -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -fcf-protection -fstack-clash-protection -fstack-protector-strong"
CFLAGS="${C_COMMON_FLAGS}"
CXXFLAGS="${C_COMMON_FLAGS}"

F_COMMON_FLAGS="${COMMON_FLAGS}"
FCFLAGS="${F_COMMON_FLAGS}"
FFLAGS="${F_COMMON_FLAGS}"

LDFLAGS="-Wl,-z,now -Wl,-z,relro"

RUSTFLAGS="-C target-cpu=native -C link-arg=-Wl,-z,pack-relative-relocs -C opt-level=3"

# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C.utf8

GENTOO_MIRRORS="https://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/ \
    https://ftp.agdsn.de/gentoo \
    https://ftp.spline.inf.fu-berlin.de/mirrors/gentoo/ \
    https://ftp.gwdg.de/pub/linux/gentoo/ \
    https://ftp.uni-hannover.de/gentoo/ \
    https://mirror.netcologne.de/gentoo/ \
    https://mirror.netzwerge.de/gentoo/ \
    https://packages.hs-regensburg.de/gentoo-distfiles/ \
    https://gentoo.mirror.garr.it/ \
    https://mirror.init7.net/gentoo/"

VIDEO_CARDS="amdgpu radeonsi radeon"
INPUT_DEVICES="libinput"

MAKEOPTS="-j9 -l9"
USE="hardened bluetooth grub verify-sig modules-sign -systemd"

FEATURES="splitdebug"


I should also note that the root and swap partitions live on LVM inside a LUKS-encrypted partition, which I unlock at boot.

Does anyone have any idea what else I can try to solve this issue? Being unable to suspend is kind of a PITA.

* "As expected" is still not the same as "working". Ever since I installed Gentoo, the system crashes if I try to hibernate more than twice in a row. However, this happens even from a TTY, so I'm pretty sure it is a different problem from the one I have now.
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5136
Location: Bavaria

Posted: Sun Sep 29, 2024 10:30 pm

First of all: I am not an AMD expert ... and even though I have some suggestions for your kernel configuration, they may not help. If anything, this could possibly be the deciding factor:
Code:
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_AMD_PSTATE is not set

(You need the opposite configuration)

I see you are using a self-created initramfs ... would be interesting to see what it does.
Code:
CONFIG_INITRAMFS_SOURCE="/usr/src/initramfs"

Otherwise, some comments; I would change the following:
Code:
1.
CONFIG_NR_CPUS=8192
2.
CONFIG_INTEL_MEI=m
3.
CONFIG_NVME_CORE=y
CONFIG_BLK_DEV_SD=m
CONFIG_SATA_AHCI=y
4.
# CONFIG_PINCTRL_AMD is not set
CONFIG_PINCTRL_INTEL=m
5.
CONFIG_MFD_INTEL_LPSS=m
CONFIG_MFD_INTEL_LPSS_ACPI=m
CONFIG_MFD_INTEL_LPSS_PCI=m
6.
# CONFIG_SECURITY_LANDLOCK is not set
CONFIG_IMA=y
CONFIG_IMA_APPRAISE=y

1. Change it to 16 because: [ 0.029375] smpboot: Allowing 16 CPUs, 8 hotplug CPUs
2. MEI exists only on Intel systems
3. NVME and SATA are built statically into the kernel (good), but SD is not (bad). Build SD statically too, not as a module.
4. You need the opposite configuration
5. Really useless on an AMD system
6. Do you really use SELinux AND ... IMA ?? ... on the other hand, the useful "Landlock" is not activated. (I use IMA and it is tricky to run it in Appraise mode once it is activated)
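The points above can be checked mechanically against a .config before rebuilding. This is a read-only sketch, not a method from this thread: the function name `audit_amd_config` is hypothetical, the option list simply mirrors the review above, and "should be =y" assumes built-in drivers are wanted.

```shell
#!/bin/sh
# Read-only audit of a kernel .config against the review above.
# audit_amd_config is a hypothetical helper; it only prints findings.
audit_amd_config() {
    cfg=${1:-.config}
    # Intel-only options (P-State point, points 2, 4, 5)
    for opt in X86_INTEL_PSTATE INTEL_MEI PINCTRL_INTEL MFD_INTEL_LPSS; do
        if grep -Eq "^CONFIG_${opt}=(y|m)\$" "$cfg"; then
            echo "intel-only: CONFIG_${opt} is enabled"
        fi
    done
    # Options wanted built in (P-State point, points 3, 4)
    for opt in X86_AMD_PSTATE PINCTRL_AMD BLK_DEV_SD; do
        if ! grep -q "^CONFIG_${opt}=y\$" "$cfg"; then
            echo "missing: CONFIG_${opt} should be =y"
        fi
    done
    # Point 1: smpboot reports 16 possible CPUs on this machine
    nr=$(sed -n 's/^CONFIG_NR_CPUS=//p' "$cfg")
    if [ -n "$nr" ] && [ "$nr" -gt 16 ]; then
        echo "oversized: CONFIG_NR_CPUS=$nr (16 is enough here)"
    fi
    # Anything still built as a module
    echo "modules: $(grep -c '=m$' "$cfg")"
}

# Example: audit_amd_config /usr/src/linux/.config
```

The changes themselves can then be made with the kernel's own helper, e.g. `scripts/config --file .config --disable X86_INTEL_PSTATE --enable X86_AMD_PSTATE`, followed by `make olddefconfig`.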
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
orsetto
n00b


Joined: 26 Sep 2023
Posts: 6
Location: Italy

Posted: Mon Sep 30, 2024 11:00 am

pietinger wrote:
If anything, this could possibly be the deciding factor:
Code:
CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_AMD_PSTATE is not set

(You need the opposite configuration)

Yep, this makes sense. I've changed it and am now compiling. Out of curiosity, though: I ran `make defconfig`, and the default for linux-6.6.52 is what I have in my config. I think this is because my CPU is first-generation Zen, while the docs say that
Quote:
Currently, ``amd-pstate`` supports basic
frequency control function according to kernel governors on some of the
Zen2 and Zen3 processors
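Whether `amd-pstate` actually binds after the rebuild can be checked at runtime from sysfs. A sketch, assuming the standard cpufreq sysfs layout; the function name and the optional sysfs-root argument (for testing) are hypothetical. Given the Zen2/Zen3 note quoted above, a first-generation Zen CPU such as the 2500U will most likely still report `acpi-cpufreq`.

```shell
#!/bin/sh
# Print the cpufreq driver bound to CPU 0 (e.g. acpi-cpufreq, amd-pstate,
# amd-pstate-epp, intel_pstate), or "none" if no cpufreq driver is present.
# The optional argument is an alternate sysfs root (hypothetical, for testing).
cpufreq_driver() {
    f="${1:-/sys}/devices/system/cpu/cpu0/cpufreq/scaling_driver"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "none"
    fi
}

# Example: cpufreq_driver
```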


pietinger wrote:
I see you are using a self-created initramfs ... would be interesting to see what it does.

Nothing special, I mostly copied what the wiki said:
Code:
#!/bin/busybox sh

function info_msg() {
   printf -- "\e[01;33m -> \e[01;37m$*\e[00m\n"
}

function bad_msg() {
   printf -- "\e[01;31m -> $*\e[00m\n"
}

function good_msg() {
   printf -- "\e[01;32m -> \e[01;37m$*\e[00m\n"
}

function rescue_shell() {
   bad_msg "There's been an oopsie: $*\nDropping you to a shell"
   setsid sh -c 'exec sh </dev/tty1 >/dev/tty1 2>&1'
}

VERSION="0.7"

# Defaults
CRYPT_ROOT_OPTIONS="--perf-no_read_workqueue --perf-no_write_workqueue --allow-discards"
KEYMAP="/lib/keymaps/it.bmap"

# Mount the /proc and /sys filesystems.
mount -t proc none /proc || rescue_shell "Mount /proc failed."
mount -t sysfs none /sys || rescue_shell "Mount /sys failed."
mount -t devtmpfs none /dev || rescue_shell "Mount /dev failed."

# Suppress kernel logging to stdout
echo 0 > /proc/sys/kernel/printk

# Handle kernel command-line parameters
CMDLINE=$(cat /proc/cmdline 2>/dev/null)
for x in ${CMDLINE}
do
   case "${x}" in
      root=*)
         FAKE_ROOT=${x#*=}
      ;;
      shell)
         SHELL_REQUESTED="yes"
      ;;
      crypt_root=*)
         CRYPT_ROOT=${x#*=}
      ;;
      crypt_root_options+=*)
         CRYPT_ROOT_OPTIONS=$(echo ${CRYPT_ROOT_OPTIONS} ${x#*=} | sed -e 's/,/ /g')
      ;;
      crypt_root_options=*)
         CRYPT_ROOT_OPTIONS=$(echo ${x#*=} | sed -e 's/,-/ -/g')
      ;;
      real_resume=*|resume=*)
         REAL_RESUME=${x#*=}
      ;;
      keymap=*)
         keymap_file="/lib/keymaps/${x#*=}.bmap"
         if [ ! -f "${keymap_file}" ]; then
            bad_msg "Keymap file not found: ${keymap_file}. Falling back to default."
            continue;
         fi
         
         KEYMAP="${keymap_file}"
      ;;
   esac
done

printf "\e[01;34m.:| orsetto's initramfs \e[01;32mv${VERSION}\e[01;34m. ~}>\e[00m\n"

# To add other keymaps, find it in /usr/share/keymaps and do
# gzip --decompress --stdout ${keymap}.map.gz \
#     | loadkeys -b > /usr/src/initramfs/lib/keymaps/"${keymap}".bmap
info_msg "Loading keymap '${KEYMAP}'."
loadkmap < "${KEYMAP}" || rescue_shell "Loading the keymap has failed."

[ "${SHELL_REQUESTED}" = "yes" ] \
   && setsid sh -c 'exec sh </dev/tty1 >/dev/tty1 2>&1' \
   || info_msg "If you need a shell, put 'shell' in the kernel cmdline."

# unlock the rootfs
LUKS_PARTITION="$(findfs ${CRYPT_ROOT})"
info_msg "Using cryptsetup options: ${CRYPT_ROOT_OPTIONS} to unlock ${LUKS_PARTITION}."
cryptsetup ${CRYPT_ROOT_OPTIONS} luksOpen ${LUKS_PARTITION} root || rescue_shell "Decrypting the device has failed."

good_msg "Successfully unlocked luks device ${LUKS_PARTITION}."

# Activate the volume group
lvm vgscan --mknodes >/dev/null || rescue_shell "first vgscan failed."
lvm lvchange --sysinit -a y crypt_lvm || rescue_shell "Activating lvm failed."
lvm vgscan --mknodes >/dev/null || rescue_shell "second vgscan failed."

# Attempt to resume from hibernation
if [ -f /sys/power/resume ]
then
   device=$(findfs ${REAL_RESUME})
   if [ ! -b "${device}" ]; then
      # No resume device: skip the resume attempt and boot normally
      bad_msg "Can't find resume device. Continuing normal boot, but something is probably fucked."
   else
      # "major:minor" of the resume device, taken from columns 5-6 of ls -lL
      min_maj=$(ls -lL "${device}" | sed 's/\  */ /g' | cut -d \  -f 5-6 | sed 's/,\ */:/')

      info_msg "Attempting to resume from hibernation, using device '${device}'"
      echo "${min_maj}" > /sys/power/resume || rescue_shell "Resuming failed. Come on, fix the hibernation issue you've been having for months."
   fi
fi

ROOT_DEVICE=$(findfs ${FAKE_ROOT})
[ ! -b "${ROOT_DEVICE}" ] && rescue_shell "Can't find root device for ${FAKE_ROOT}. Something's wrong."

# Mount the root filesystem.
info_msg "Mounting the root filesystem"
mount -t ext4 -o ro ${ROOT_DEVICE} /mnt/root || rescue_shell "Mounting root failed. What did you do??"

# Re-enable kernel logging to stdout
echo 1 > /proc/sys/kernel/printk

# Clean up.
umount /proc
umount /sys
umount /dev

# Boot the real thing.
good_msg "Booting up the system."
exec switch_root /mnt/root /sbin/init

bad_msg "Not anymore, something horrible happened. Powering off the system, what else can we do. Reboot with 'shell' in the boot parameters and pray."
poweroff -f
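The `ls -lL | sed | cut` pipeline in the resume block depends on the column layout of `ls`. A possibly sturdier alternative, sketched under the assumption that sysfs is still mounted at that point (the script mounts it at the top and unmounts it only before `switch_root`), reads the "major:minor" pair the kernel publishes in `/sys/class/block/<name>/dev`. The function name `dev_majmin` and the second argument (an alternate sysfs root, for testing) are hypothetical.

```shell
#!/bin/sh
# Print "major:minor" for a block device node, the format /sys/power/resume accepts.
# Symlinks such as /dev/mapper/<vg>-<lv> are resolved to the kernel name (dm-N),
# whose sysfs directory holds a "dev" file containing "major:minor".
# The optional second argument is an alternate sysfs root (hypothetical, for testing).
dev_majmin() {
    name=$(basename "$(readlink -f "$1")")
    cat "${2:-/sys}/class/block/${name}/dev"
}

# Example, inside the resume block:
#   echo "$(dev_majmin "${device}")" > /sys/power/resume
```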


As for your other comments: I mostly left the defaults, except that I aimed for a module-free system after boot. I don't have a good answer for why some things are out of place, though.

Regarding SELinux and IMA, I don't remember ever touching either of them. I never took the time to actually understand how SELinux works, and to be completely honest, I had never heard of IMA. I reverted to the defaults (IMA disabled, Landlock and SELinux enabled).

I'll let you know how it goes ASAP.
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5136
Location: Bavaria

Posted: Mon Sep 30, 2024 12:37 pm

orsetto wrote:
[...] Out of curiosity tho, I did `make defconfig`, and the default for linux-6.6.52 is what I have in my config. [...]

Maybe you are interested in: Do I need a "make defconfig" before I start ?
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5136
Location: Bavaria

Posted: Mon Sep 30, 2024 1:02 pm

P.S.: To be completely honest, I know that AMD changes a lot in every new major kernel version for their P-State and GPU modules. You may really have to wait for newer kernel versions... :(
logrusx
Advocate


Joined: 22 Feb 2018
Posts: 2452

Posted: Mon Sep 30, 2024 1:49 pm

@pietinger,

This is not a configuration issue. Everybody is affected in one way or another, depending on the setup they have. One bug I've found is https://gitlab.freedesktop.org/drm/amd/-/issues/3208

For example, I cannot reliably resume from S3 sleep on battery. It works when plugged in. All kernels after 6.1.91 consistently fail on battery, and more often than not when plugged in too. 6.1.91 at least works fine when plugged in. I tried 6.11 and it behaves even more strangely: it sometimes manages to show a picture, but it's frozen. Once resume gets stuck, the only way out is magic SysRq and somehow blindly logging in on an FB console and issuing a reboot command. Even then it doesn't reboot immediately; I guess systemd is still waiting for a service to finish resuming or something.
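Whether the magic SysRq escape hatch works at all depends on CONFIG_MAGIC_SYSRQ and on the `kernel.sysrq` bitmask. A small sketch that checks whether a given value permits the usual Alt+SysRq R-E-I-S-U-B sequence; the bit values come from the kernel's sysrq documentation, while the function name is hypothetical.

```shell
#!/bin/sh
# Succeeds if a kernel.sysrq value allows Alt+SysRq R-E-I-S-U-B.
# Bits (kernel sysrq documentation): 4 keyboard control (R = unraw),
# 16 sync (S), 32 remount read-only (U), 64 signal processes (E, I),
# 128 reboot/poweroff (B). The value 1 enables every function.
sysrq_allows_reisub() {
    val=$1
    [ "$val" -eq 1 ] && return 0
    need=$((4 + 16 + 32 + 64 + 128))    # 244
    [ $((val & need)) -eq "$need" ]
}

# Example: sysrq_allows_reisub "$(cat /proc/sys/kernel/sysrq)" || echo "REISUB is blocked"
```

If it fails, `kernel.sysrq = 1` in a file under /etc/sysctl.d/ enables all SysRq functions at boot.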

In short, they've "fixed" amdgpu and as a result suspend/resume is broken. I tried to pinpoint the commit that broke it, but it turned out to be way further back than I thought, so I gave up.

Best Regards,
Georgi
orsetto
n00b


Joined: 26 Sep 2023
Posts: 6
Location: Italy

Posted: Mon Sep 30, 2024 1:51 pm

No luck, I'm still not able to suspend.

pietinger wrote:
Maybe you are interested in: Do I need a "make defconfig" before I start ?

Well, this is definitely good to know. Also, nice kernel configuration tutorials on your profile; I'm pretty sure I'll read through those.

pietinger wrote:
P.S.: To be completely honest, I know that AMD changes a lot in every new major kernel version for their P-State and GPU modules. You may really have to wait for newer kernel versions... :(

Honestly, I think the problem is in the config(s). I didn't have this problem before, and it's still there with old kernels. Also, none of the amdgpu errors I now see in dmesg were there before.


Now that I think about it, there's something I haven't tried yet: going through /var/log/syslog* to find the last successful hibernation attempt, and then through /var/log/emerge to see what I've updated since.
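The second half of that plan can be scripted. A sketch, assuming Portage's default merge log (`/var/log/emerge.log`) and its usual `<epoch>:  ::: completed emerge (N of M) <atom> to /` line format; the cutoff time still has to be read off the syslog by hand, since the exact PM messages and timestamp format depend on the kernel and the syslog daemon. The function name `merged_since` is hypothetical.

```shell
#!/bin/sh
# Print every package Portage finished merging after CUTOFF (seconds since epoch).
# Assumes emerge.log lines such as:
#   1727386534:  ::: completed emerge (1 of 3) x11-libs/libdrm-2.4.122-r1 to /
merged_since() {
    cutoff=$1
    log=${2:-/var/log/emerge.log}
    awk -v cutoff="$cutoff" '
        $2 == ":::" && $3 == "completed" && $4 == "emerge" {
            t = $1
            sub(/:$/, "", t)            # "1727386534:" -> "1727386534"
            if (t + 0 > cutoff + 0) print $8
        }' "$log"
}

# Example (GNU date converts the syslog time): merged_since "$(date -d '2024-09-14 22:10' +%s)"
```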
logrusx
Advocate


Joined: 22 Feb 2018
Posts: 2452

Posted: Mon Sep 30, 2024 1:56 pm

orsetto wrote:

Now that I think about it, there's something I haven't tried yet: going through /var/log/syslog* to find the last successful hibernation attempt, and then through /var/log/emerge to see what I've updated since.


The kernel. See my reply above. I'll be very thankful if you can find a working version.

Best Regards,
Georgi
orsetto
n00b


Joined: 26 Sep 2023
Posts: 6
Location: Italy

Posted: Mon Sep 30, 2024 2:22 pm

logrusx wrote:

The kernel. See my reply above. I'll be very thankful if you can find a working version.

Best Regards,
Georgi

Sorry, I didn't see your reply. Unfortunately, though, I've been on 6.6 since February, but (looking at the logs) I've only had this problem for the last couple of weeks. However, as you said, it depends on one's setup.

I'm going through the logs now to see if there's any hint of what's wrong. If I can't find anything, I'll try linux-6.1.91.
Ralphred
l33t


Joined: 31 Dec 2013
Posts: 657

Posted: Mon Sep 30, 2024 3:15 pm

orsetto wrote:
I've only had this problem for the last couple of weeks.

Install-kernel hasn't dropped some ucode image in front of your initrd, has it? That is some "new behaviour" I've had to mute in the last couple of weeks.
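One way to check for this: an early-microcode archive is an uncompressed newc cpio placed in front of the (usually compressed) main initramfs, so the first bytes of the image give a hint. A heuristic sketch; it only applies to a separate initrd file (not to an initramfs embedded via CONFIG_INITRAMFS_SOURCE), and a fully uncompressed initramfs also starts with the cpio magic. The function name and the example path are hypothetical.

```shell
#!/bin/sh
# Classify an initrd image by its first bytes.
# "uncompressed-cpio" (newc magic "070701") is how an image with a prepended
# early-microcode archive starts, but a fully uncompressed initramfs starts
# the same way, so treat it as a hint only.
initrd_head() {
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case "$magic" in
        303730373031) echo "uncompressed-cpio" ;;   # ASCII "070701"
        1f8b*)        echo "gzip" ;;
        28b52ffd*)    echo "zstd" ;;
        fd377a585a00) echo "xz" ;;
        *)            echo "unknown" ;;
    esac
}

# Example (hypothetical path): initrd_head /boot/initramfs-6.6.52-gentoo.img
```

To confirm, `cpio -it < image` lists only the first archive; a prepended microcode archive shows up as paths under kernel/x86/microcode/.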
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5136
Location: Bavaria

Posted: Mon Sep 30, 2024 5:53 pm

logrusx wrote:
This is not a configuration issue. [...]

Yes, I also suspected that it was more of a problem with the AMDGPU module itself (but I wasn't sure, that's why I wrote that you might have to wait for new versions). Thank you for your confirmation.
logrusx
Advocate


Joined: 22 Feb 2018
Posts: 2452

Posted: Mon Sep 30, 2024 6:17 pm

pietinger wrote:
logrusx wrote:
This is not a configuration issue. [...]

Yes, I also suspected that it was more of a problem with the AMDGPU module itself (but I wasn't sure, that's why I wrote that you might have to wait for new versions). Thank you for your confirmation.


I'm afraid new versions won't solve the issue. They've said they won't in the bug I linked.

Best Regards,
Georgi
orsetto
n00b


Joined: 26 Sep 2023
Posts: 6
Location: Italy

Posted: Tue Oct 01, 2024 3:20 pm

So, I checked all the packages I updated. The only interesting one I hadn't already checked was x11-libs/libdrm, which was updated from 2.4.122 to 2.4.122-r1. Unfortunately, that revision only adds the "doc" USE flag; here's the commit: https://gitweb.gentoo.org/repo/gentoo.git/commit/x11-libs/libdrm?id=3a9bc2c57733e0fd19f4414deb2122605819c311.

I also tried linux-6.1.91, and even though the problem persists, I get a different output from dmesg, which only shows this error:
Code:
[   79.375083] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:222 vmid:1 pasid:0, for process  pid 0 thread  pid 0)
[   79.375097] amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800000001000 from IH client 0x1b (UTCL2)
[   79.375113] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001009BC
[   79.375117] amdgpu 0000:04:00.0: amdgpu:     Faulty UTCL2 client ID: CPF (0x4)
[   79.375122] amdgpu 0000:04:00.0: amdgpu:     MORE_FAULTS: 0x0
[   79.375126] amdgpu 0000:04:00.0: amdgpu:     WALKER_ERROR: 0x6
[   79.375130] amdgpu 0000:04:00.0: amdgpu:     PERMISSION_FAULTS: 0xb
[   79.375134] amdgpu 0000:04:00.0: amdgpu:     MAPPING_ERROR: 0x1
[   79.375138] amdgpu 0000:04:00.0: amdgpu:     RW: 0x0
[   79.375226] [drm] kiq ring mec 2 pipe 1 q 0
[   79.701310] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring gfx test failed (-110)
[   79.701323] [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <gfx_v9_0> failed -110
[   79.701333] amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
[   79.701338] amdgpu 0000:04:00.0: PM: dpm_run_callback(): pci_pm_restore+0x0/0x120 returns -110
[   79.701356] amdgpu 0000:04:00.0: PM: failed to restore async: error -110

Here's the full output: https://pastebin.com/NqkpRDCE
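To compare runs across kernels, the failing IP block and error code can be pulled out of a saved dmesg. A sketch; the function name is hypothetical and the errno table covers only a few common values (on Linux, 110 is ETIMEDOUT, which fits the timed-out ring test above).

```shell
#!/bin/sh
# List "<ip-block> <errno> (<name>)" for each "resume of IP block" failure in a
# saved dmesg log. Only a few errno values are named.
resume_failures() {
    sed -n 's/.*resume of IP block <\([^>]*\)> failed \(-[0-9]*\).*/\1 \2/p' "$1" |
    while read -r block err; do
        case "$err" in
            -5)   name=EIO ;;
            -12)  name=ENOMEM ;;
            -22)  name=EINVAL ;;
            -110) name=ETIMEDOUT ;;
            *)    name=unknown ;;
        esac
        echo "$block $err ($name)"
    done
}

# Example: dmesg > resume.log; resume_failures resume.log
```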

I also don't think this is the same bug logrusx linked, because the errors are different. I also tried the patch suggested there, but it made no difference. I think I'll post my dmesg output there anyway, because I might very well be wrong.

Ralphred wrote:
Install-kernel hasn't dropped some ucode image in front of your initrd, has it? That is some "new behaviour" I've had to mute in the last couple of weeks.

I'm not using install-kernel, so that can't be the problem for me :/

logrusx wrote:
I'm afraid new versions won't solve the issue. They've said they won't in the bug I linked.

I might have missed it, but where are they saying this?
logrusx
Advocate


Joined: 22 Feb 2018
Posts: 2452

Posted: Wed Oct 02, 2024 2:29 pm

orsetto wrote:

I also tried linux-6.1.91, and even though the problem persists, I get a different output from dmesg


The message is different with 6.1.92. It's also different in 6.6 and 6.10.

orsetto wrote:
logrusx wrote:
I'm afraid new versions won't solve the issue. They've said they won't in the bug I linked.

I might have missed it, but where are they saying this?

This is my take on that bug: things like "lets help you debug your s2idle", "we cannot support that" and so on. All the other bugs are just like this one or similar. Judging by the commit that introduced it, the patch you've tried is only relevant to discrete GPUs, but who knows; the AMDGPU driver is a big mess.

Best Regards,
Georgi
All times are GMT