orsetto n00b
Joined: 26 Sep 2023 Posts: 6 Location: Italy
|
Posted: Sun Sep 29, 2024 5:31 pm Post subject: amdgpu crash while suspending to ram/disk or resuming |
|
|
Hi. I've been fighting with my system for a while now.
I have an HP Pavilion 15 with a Ryzen 2500U and Vega 8 integrated graphics, and I'm running Plasma 6 on X11.
Basically my system hangs during suspend to disk/ram or resume. The exact behavior depends on different factors:
- if running plasma, the system hangs during suspend;
- if running just the X server, it hangs while resuming;
- if running with just a TTY, everything works as expected.*
I should note that the same problem happens with both X11 and Wayland, and also that these results aren't the rule: I've been able to suspend correctly even from Plasma a couple of times, but I've been unable to reproduce this.
When I do manage to suspend correctly and the system then hangs in a failed resume state, I'm still able to ssh into it. This way I was able to gather the output of dmesg, which pointed me toward amdgpu crashes.
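For anyone repeating this over ssh, here is a minimal sketch of how the amdgpu-related failures could be pulled out of a captured log. The function name `filter_amdgpu_errors` and the keyword list are assumptions made for illustration; the list is a guess and certainly not exhaustive.

```shell
# Hypothetical helper: keep only amdgpu/drm lines that look like failures.
# Reads a dmesg dump on stdin and writes the matching lines to stdout.
filter_amdgpu_errors() {
    # First grep: lines coming from amdgpu or tagged [drm...
    # Second grep: of those, only lines mentioning a failure keyword.
    grep -E 'amdgpu|\[drm' | grep -E -i 'error|fault|failed|timeout'
}
```

Usage would be something like `ssh user@laptop dmesg | filter_amdgpu_errors`, where `user@laptop` is a placeholder for the hung machine.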
Searching the internet for the errors in dmesg gave me a bunch of solutions, such as putting iommu=pt on the kernel command line, but none of them helped in my case. I've also tried disabling AMD_IOMMU in the kernel config, without any luck.
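When testing parameters such as iommu=pt it is worth confirming that the running kernel actually received them. A minimal sketch, assuming the usual /proc/cmdline interface; the function name `has_cmdline_param` is made up for this example.

```shell
# Hypothetical helper: succeed if a parameter is on the kernel command line.
# $1 = exact parameter (e.g. iommu=pt)
# $2 = file to read (optional, defaults to /proc/cmdline)
has_cmdline_param() {
    # Split the command line into one word per line, then look for an
    # exact, literal, whole-line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```

For example, `has_cmdline_param iommu=pt && echo "iommu=pt is active"`.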
While debugging this I tried every combination of the following kernels and firmware, without ever being able to suspend and resume correctly: kernels 6.1.111, 6.6.52, 6.11.0 and git (latest); linux-firmware 20240709-r2, 20240909-r1, latest git and old git. I tried "old git" after looking at this thread: https://gitlab.freedesktop.org/drm/amd/-/issues/3539. Just for clarity, I'm daily driving sys-kernel/gentoo-sources-6.6.52 with sys-kernel/linux-firmware-20240909-r1.
One of the dmesg outputs that I obtained: https://pastebin.com/GRftGshx
Kernel config: https://pastebin.com/EUpkvhS2
Code: | $ emerge --info
Portage 3.0.65 (python 3.12.6-final-0, default/linux/amd64/23.0/split-usr/desktop/plasma, gcc-13, glibc-2.39-r6, 6.6.52-gentoo-orsetto x86_64)
=================================================================
System uname: Linux-6.6.52-gentoo-orsetto-x86_64-AMD_Ryzen_5_2500U_with_Radeon_Vega_Mobile_Gfx-with-glibc2.39
KiB Mem: 7003564 total, 1855768 free
KiB Swap: 16777212 total, 16777212 free
Timestamp of repository gentoo: Fri, 27 Sep 2024 22:30:00 +0000
Head commit of repository gentoo: 56840d6a1dc4bf2c906112739bdc0ccd53db44e8
Timestamp of repository brother-overlay: Fri, 27 Sep 2024 05:36:50 +0000
Head commit of repository brother-overlay: d369c4654473d84e1f292e03bdbe192f0c180240
Timestamp of repository eclipse: Sat, 20 Jan 2024 10:19:57 +0000
Head commit of repository eclipse: 292c22692558bbd73e4bc6bcb5afab1019f68c1f
Timestamp of repository guru: Fri, 27 Sep 2024 17:36:13 +0000
Head commit of repository guru: 4ca4cb23f92de6eda055a758722c5bca5a520e62
sh bash 5.2_p26-r6
ld GNU ld (Gentoo 2.42 p3) 2.42.0
app-misc/pax-utils: 1.3.7::gentoo
app-shells/bash: 5.2_p26-r6::gentoo
dev-build/autoconf: 2.13-r8::gentoo, 2.71-r7::gentoo
dev-build/automake: 1.16.5-r2::gentoo
dev-build/cmake: 3.30.2::gentoo
dev-build/libtool: 2.4.7-r4::gentoo
dev-build/make: 4.4.1-r1::gentoo
dev-build/meson: 1.5.1::gentoo
dev-java/java-config: 2.3.4::gentoo
dev-lang/perl: 5.40.0::gentoo
dev-lang/python: 3.11.10_p1::gentoo, 3.12.6_p2::gentoo
dev-lang/rust-bin: 1.80.1::gentoo
sys-apps/baselayout: 2.15::gentoo
sys-apps/openrc: 0.54.2::gentoo
sys-apps/sandbox: 2.39::gentoo
sys-devel/binutils: 2.42-r1::gentoo
sys-devel/binutils-config: 5.5.2::gentoo
sys-devel/clang: 18.1.8::gentoo
sys-devel/gcc: 13.3.1_p20240614::gentoo
sys-devel/gcc-config: 2.11::gentoo
sys-devel/lld: 18.1.8::gentoo
sys-devel/llvm: 18.1.8-r1::gentoo
sys-kernel/linux-headers: 6.6-r1::gentoo (virtual/os-headers)
sys-libs/glibc: 2.39-r6::gentoo
Repositories:
gentoo
location: /var/db/repos/gentoo
sync-type: rsync
sync-uri: rsync://rsync.gentoo.org/gentoo-portage
priority: -1000
volatile: False
sync-rsync-verify-metamanifest: yes
sync-rsync-verify-jobs: 1
sync-rsync-extra-opts:
sync-rsync-verify-max-age: 24
brother-overlay
location: /var/db/repos/brother-overlay
sync-type: git
sync-uri: https://github.com/gentoo-mirror/brother-overlay.git
masters: gentoo
volatile: False
eclipse
location: /var/db/repos/eclipse
sync-type: git
sync-uri: https://github.com/gentoo-mirror/eclipse.git
masters: gentoo
volatile: False
guru
location: /var/db/repos/guru
sync-type: git
sync-uri: https://github.com/gentoo-mirror/guru.git
masters: gentoo
volatile: False
crossdev
location: /var/db/repos/portage-crossdev
masters: gentoo
priority: 10
volatile: False
ACCEPT_KEYWORDS="amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-march=native -O2 -pipe -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -fcf-protection -fstack-clash-protection -fstack-protector-strong"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/lib64/libreoffice/program/sofficerc /usr/share/config /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/dconf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo /etc/texmf/language.dat.d /etc/texmf/language.def.d /etc/texmf/updmap.d /etc/texmf/web2c"
CXXFLAGS="-march=native -O2 -pipe -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -fcf-protection -fstack-clash-protection -fstack-protector-strong"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-march=native -O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync merge-wait multilib-strict network-sandbox news parallel-fetch pid-sandbox pkgdir-index-trusted preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms splitdebug strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-march=native -O2 -pipe"
GENTOO_MIRRORS="https://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/ https://ftp.agdsn.de/gentoo https://ftp.spline.inf.fu-berlin.de/mirrors/gentoo/ https://ftp.gwdg.de/pub/linux/gentoo/ https://ftp.uni-hannover.de/gentoo/ https://mirror.netcologne.de/gentoo/ https://mirror.netzwerge.de/gentoo/ https://packages.hs-regensburg.de/gentoo-distfiles/ https://gentoo.mirror.garr.it/ https://mirror.init7.net/gentoo/"
LANG="en_US.utf8"
LDFLAGS="-Wl,-z,now -Wl,-z,relro"
LEX="flex"
MAKEOPTS="-j9 -l9"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
RUSTFLAGS="-C target-cpu=native -C link-arg=-Wl,-z,pack-relative-relocs -C opt-level=3"
SHELL="/bin/bash"
USE="X a52 aac acl acpi activities alsa amd64 bluetooth branding bzip2 cairo cdda cdr cet crypt cups dbus declarative dri dts dvd dvdr elogind encode exif flac gdbm gif gpm grub gtk gui hardened iconv icu ipv6 jpeg kde kf6compat kwallet lcms libnotify libtirpc mad mng modules-sign mp3 mp4 mpeg multilib ncurses networkmanager nls ogg opengl openmp pam pango pcre pdf pipewire plasma png policykit ppds pulseaudio qml qt5 qt6 readline screencast sdl seccomp semantic-desktop sound spell split-usr ssl startup-notification svg test-rust tiff truetype udev udisks unicode upower usb verify-sig vorbis vulkan wayland widgets wxwidgets x264 xattr xcb xft xml xv xvid zlib" ABI_X86="64" ADA_TARGET="gcc_12" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_anon authn_dbm authn_file authz_dbm authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir env expires ext_filter file_cache filter headers include info log_config logio mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2 aes avx avx2 f16c fma3 pclmul popcnt rdrand sha sse3 sse4_1 sse4_2 sse4a ssse3" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 ntrip navcom oceanserver oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 tsip tripmate tnt ublox" GUILE_SINGLE_TARGET="3-0" GUILE_TARGETS="3-0" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz glk hd44780 lb216 lcdm001 mtxorb text" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-2" POSTGRES_TARGETS="postgres16" PYTHON_SINGLE_TARGET="python3_12" PYTHON_TARGETS="python3_12" RUBY_TARGETS="ruby31 ruby32" VIDEO_CARDS="amdgpu radeonsi radeon" XTABLES_ADDONS="quota2 
psd pknock lscan length2 ipv4options ipp2p iface geoip fuzzy condition tarpit sysrq proto logmark ipmark dhcpmac delude chaos account"
Unset: ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, PYTHONPATH, RANLIB, READELF, SIZE, STRINGS, STRIP, YACC, YFLAGS
|
make.conf: Code: | COMMON_FLAGS="-march=native -O2 -pipe"
# As of https://wiki.gentoo.org/wiki/GCC_optimization#Hardening_optimizations
# PIE is enabled by default when it is safe to do so on in 17.0 profiles and newer[3].
# PIC may also be enabled by default on executables for architectures that require it (like AMD64).
# There is no need to set PIE or PIC manually in CFLAGS.
C_COMMON_FLAGS="${COMMON_FLAGS} -D_FORTIFY_SOURCE=3 -D_GLIBCXX_ASSERTIONS -fcf-protection -fstack-clash-protection -fstack-protector-strong"
CFLAGS="${C_COMMON_FLAGS}"
CXXFLAGS="${C_COMMON_FLAGS}"
F_COMMON_FLAGS="${COMMON_FLAGS}"
FCFLAGS="${F_COMMON_FLAGS}"
FFLAGS="${F_COMMON_FLAGS}"
LDFLAGS="-Wl,-z,now -Wl,-z,relro"
RUSTFLAGS="-C target-cpu=native -C link-arg=-Wl,-z,pack-relative-relocs -C opt-level=3"
# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C.utf8
GENTOO_MIRRORS="https://mirror.eu.oneandone.net/linux/distributions/gentoo/gentoo/ \
https://ftp.agdsn.de/gentoo \
https://ftp.spline.inf.fu-berlin.de/mirrors/gentoo/ \
https://ftp.gwdg.de/pub/linux/gentoo/ \
https://ftp.uni-hannover.de/gentoo/ \
https://mirror.netcologne.de/gentoo/ \
https://mirror.netzwerge.de/gentoo/ \
https://packages.hs-regensburg.de/gentoo-distfiles/ \
https://gentoo.mirror.garr.it/ \
https://mirror.init7.net/gentoo/"
VIDEO_CARDS="amdgpu radeonsi radeon"
INPUT_DEVICE="libinput"
MAKEOPTS="-j9 -l9"
USE="hardened bluetooth grub verify-sig modules-sign -systemd"
FEATURES="splitdebug"
|
I should also note that root and swap partitions reside in a LUKS encrypted LVM partition, that I unlock at boot.
Does anyone have any idea what else I can try to solve this issue? Being unable to suspend is kind of a PITA.
* "As expected" is still not the same as "working". Ever since I installed Gentoo, the system crashes if I try to hibernate more than two times in a row. However, this happens even from a TTY, so I'm pretty sure it is a different problem from the one I have now. |
|
|
|
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5116 Location: Bavaria
|
Posted: Sun Sep 29, 2024 10:30 pm Post subject: |
|
|
First of all: I am not an AMD expert ... and even though I have some suggestions for your kernel configuration, they may not help. If anything, this could possibly be the deciding factor:
Code: | CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_AMD_PSTATE is not set |
(You need the opposite configuration)
I see you are using a self-created initramfs ... would be interesting to see what it does.
Code: | CONFIG_INITRAMFS_SOURCE="/usr/src/initramfs" |
Otherwise some comments / I would change this:
Code: | 1.
CONFIG_NR_CPUS=8192
2.
CONFIG_INTEL_MEI=m
3.
CONFIG_NVME_CORE=y
CONFIG_BLK_DEV_SD=m
CONFIG_SATA_AHCI=y
4.
# CONFIG_PINCTRL_AMD is not set
CONFIG_PINCTRL_INTEL=m
5.
CONFIG_MFD_INTEL_LPSS=m
CONFIG_MFD_INTEL_LPSS_ACPI=m
CONFIG_MFD_INTEL_LPSS_PCI=m
6.
# CONFIG_SECURITY_LANDLOCK is not set
CONFIG_IMA=y
CONFIG_IMA_APPRAISE=y |
1. Change it to 16 because: [ 0.029375] smpboot: Allowing 16 CPUs, 8 hotplug CPUs
2. MEI exists only on Intel CPUs
3. NVMe and SATA are built statically into the kernel (good), but SD is not (bad). Build SD statically as well, not as a module.
4. You need the opposite configuration
5. Really useless
6. Do you really use SELinux AND ... IMA ?? ... on the other hand, the useful "Landlock" is not activated. (I use IMA and it is tricky to run it in Appraise mode once it is activated) _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
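A minimal sketch for checking options like the ones above without opening menuconfig. The function name `kconfig_state` is hypothetical, and `/usr/src/linux/.config` in the usage note is only the conventional Gentoo location, so adjust it to wherever the config actually lives.

```shell
# Hypothetical helper: print the state of one option in a kernel .config file.
# $1 = path to the config file
# $2 = option name without the CONFIG_ prefix
# Prints y, m or the literal value; prints n when the option is
# "is not set" or missing from the file altogether.
kconfig_state() {
    if grep -q "^CONFIG_$2=" "$1"; then
        grep "^CONFIG_$2=" "$1" | cut -d= -f2
    else
        echo "n"
    fi
}
```

Usage, for instance: `for o in X86_AMD_PSTATE PINCTRL_AMD BLK_DEV_SD; do echo "$o=$(kconfig_state /usr/src/linux/.config "$o")"; done`.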
|
|
|
orsetto n00b
Joined: 26 Sep 2023 Posts: 6 Location: Italy
|
Posted: Mon Sep 30, 2024 11:00 am Post subject: |
|
|
pietinger wrote: | If anything, this could possibly be the deciding factor:
Code: | CONFIG_X86_INTEL_PSTATE=y
# CONFIG_X86_AMD_PSTATE is not set |
(You need the opposite configuration) |
Yep, this makes sense. I've changed it and am now compiling. Out of curiosity tho, I did `make defconfig`, and the default for linux-6.6.52 is what I have in my config. I think this is because my CPU is Zen architecture, while the docs report that
Quote: | Currently, ``amd-pstate`` supports basic
frequency control function according to kernel governors on some of the
Zen2 and Zen3 processors |
pietinger wrote: | I see you are using a self-created initramfs ... would be interesting to see what it does. |
Nothing special, I mostly copied what the wiki said:
Code: | #!/bin/busybox sh
function info_msg() {
printf -- "\e[01;33m -> \e[01;37m$*\e[00m\n"
}
function bad_msg() {
printf -- "\e[01;31m -> $*\e[00m\n"
}
function good_msg() {
printf -- "\e[01;32m -> \e[01;37m$*\e[00m\n"
}
function rescue_shell() {
bad_msg "There's been an oopsie: $*\nDropping you to a shell"
setsid sh -c 'exec sh </dev/tty1 >/dev/tty1 2>&1'
}
VERSION="0.7"
# Defaults
CRYPT_ROOT_OPTIONS="--perf-no_read_workqueue --perf-no_write_workqueue --allow-discards"
KEYMAP="/lib/keymaps/it.bmap"
# Mount the /proc and /sys filesystems.
mount -t proc none /proc || rescue_shell "Mount /proc failed."
mount -t sysfs none /sys || rescue_shell "Mount /sys failed."
mount -t devtmpfs none /dev || rescue_shell "Mount /dev failed."
# Suppress kernel logging to stdout
echo 0 > /proc/sys/kernel/printk
# Handle kernel command-line parameters
CMDLINE=$(cat /proc/cmdline 2>/dev/null)
for x in ${CMDLINE}
do
case "${x}" in
root=*)
FAKE_ROOT=${x#*=}
;;
shell)
SHELL_REQUESTED="yes"
;;
crypt_root=*)
CRYPT_ROOT=${x#*=}
;;
crypt_root_options+=*)
CRYPT_ROOT_OPTIONS=$(echo ${CRYPT_ROOT_OPTIONS} ${x#*=} | sed -e 's/,/ /g')
;;
crypt_root_options=*)
CRYPT_ROOT_OPTIONS=$(echo ${x#*=} | sed -e 's/,-/ -/g')
;;
real_resume=*|resume=*)
REAL_RESUME=${x#*=}
;;
keymap=*)
keymap_file="/lib/keymaps/${x#*=}.bmap"
if [ ! -f "${keymap_file}" ]; then
bad_msg "Keymap file not found: ${keymap_file}. Falling back to default."
continue;
fi
KEYMAP="${keymap_file}"
;;
esac
done
printf "\e[01;34m.:| orsetto's initramfs \e[01;32mv${VERSION}\e[01;34m. ~}>\e[00m\n"
# To add other keymaps, find it in /usr/share/keymaps and do
# gzip --decompress --stdout ${keymap}.map.gz \
# | loadkeys -b > /usr/src/initramfs/lib/keymaps/"${keymap}".bmap
info_msg "Loading keymap '${KEYMAP}'."
loadkmap < "${KEYMAP}" || rescue_shell "Loading the keymap has failed."
[ "${SHELL_REQUESTED}" = "yes" ] \
&& setsid sh -c 'exec sh </dev/tty1 >/dev/tty1 2>&1' \
|| info_msg "If you need a shell, put 'shell' in the kernel cmdline."
# unlock the rootfs
LUKS_PARTITION="$(findfs ${CRYPT_ROOT})"
info_msg "Using cryptsetup options: ${CRYPT_ROOT_OPTIONS} to unlock ${LUKS_PARTITION}."
cryptsetup ${CRYPT_ROOT_OPTIONS} luksOpen ${LUKS_PARTITION} root || rescue_shell "Decrypting the device has failed."
good_msg "Successfully unlocked luks device ${LUKS_PARTITION}."
# Activate the volume group
lvm vgscan --mknodes >/dev/null || rescue_shell "first vgscan failed."
lvm lvchange --sysinit -a y crypt_lvm || rescue_shell "Activating lvm failed."
lvm vgscan --mknodes >/dev/null || rescue_shell "second vgscan failed."
# Attempt to resume from hibernation
if [ -f /sys/power/resume ]
then
device=$(findfs ${REAL_RESUME})
if [ ! -b "${device}" ]; then
# No loop here, so 'continue' would not skip anything: just warn and fall through to the normal boot.
bad_msg "Can't find resume device. Continuing normal boot, but something is probably fucked."
else
# Extract the device's major:minor numbers from the 'ls -lL' output.
min_maj=$(ls -lL "${device}" | sed 's/\ */ /g' | cut -d \ -f 5-6 | sed 's/,\ */:/')
info_msg "Attempting to resume from hibernation, using device '${device}'"
echo "${min_maj}" > /sys/power/resume || rescue_shell "Resuming failed. Come on, fix the hibernation issue you've been having for months."
fi
fi
ROOT_DEVICE=$(findfs ${FAKE_ROOT})
[ ! -b "${ROOT_DEVICE}" ] && rescue_shell "Can't find root device for ${FAKE_ROOT}. Something's wrong."
# Mount the root filesystem.
info_msg "Mounting the root filesystem"
mount -t ext4 -o ro ${ROOT_DEVICE} /mnt/root || rescue_shell "Mounting root failed. What did you do??"
# Re-enable kernel logging to stdout
echo 1 > /proc/sys/kernel/printk
# Clean up.
umount /proc
umount /sys
umount /dev
# Boot the real thing.
good_msg "Booting up the system."
exec switch_root /mnt/root /sbin/init
bad_msg "Not anymore, something horrible happened. Powering off the system, what else can we do. Reboot with 'shell' in the boot parameters and pray."
poweroff -f
|
For your other comments, I mostly left the defaults, except that I aimed at having a module free system after boot. I don't have a good answer to why some things are out of place tho.
Regarding SELinux and IMA, I don't remember ever touching any of those. I never took the time to actually understand how SELinux works and to be completely honest I never heard about IMA. I reverted to the defaults. (IMA disabled, landlock and SELinux enabled)
I'll let you know how it's going ASAP |
|
|
|
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5116 Location: Bavaria
|
|
|
|
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5116 Location: Bavaria
|
Posted: Mon Sep 30, 2024 1:02 pm Post subject: |
|
|
P.S.: To be completely honest, I know that AMD changes a lot in every new major kernel version for their P-State and GPU modules. You may really have to wait for newer kernel versions... _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2434
|
Posted: Mon Sep 30, 2024 1:49 pm Post subject: |
|
|
@pietinger,
This is not a configuration issue. Everybody is affected in one way or another, depending on the setup they have. One bug I've found is https://gitlab.freedesktop.org/drm/amd/-/issues/3208
For example, I cannot resume from S3 sleep reliably on battery. It works when plugged in. All kernels beyond 6.1.91 consistently fail on battery and more often than not when plugged in. 6.1.91 at least works fine when plugged in. I tried 6.11 and it acts even funnier. It sometimes gets as far as showing a picture, but it's frozen. Once resume gets stuck, the only way out is magic SysRq and somehow blindly logging in on an FB console and issuing a reboot command. And it doesn't reboot immediately. I guess systemd is still waiting for a service to finish resuming or something.
In short, they've "fixed" amdgpu and as a result suspend/resume is broken. I tried to pinpoint the commit which broke it, but it turned out to be way further back than I thought, so I gave up.
Best Regards,
Georgi |
|
|
|
orsetto n00b
Joined: 26 Sep 2023 Posts: 6 Location: Italy
|
Posted: Mon Sep 30, 2024 1:51 pm Post subject: |
|
|
No luck, I'm still not able to suspend.
Well this is definitely good to know. Also nice tutorials you have on your profile regarding kernel configuration, I'm pretty sure I'll read through those.
pietinger wrote: | P.S.: To be completely honest, I know that AMD changes a lot in every new major kernel version for their P-State and GPU modules. You may really have to wait for newer kernel versions... |
Honestly I think the problem is in the config(s). I didn't have this before and it's still there with old kernels. Also all the amdgpu errors I have in the output of dmesg were not there.
Now that I think about it, there is something I haven't tried yet: going through /var/log/syslog* to find the last successful hibernation attempt, and then going through /var/log/emerge to see what I've updated since. |
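A rough sketch of that second step, assuming Portage's usual log at /var/log/emerge.log and its usual line format (`<epoch>:  ::: completed emerge (N of M) category/package-version to /`). The function name `merged_since` is made up for this example.

```shell
# Hypothetical helper: list packages whose merge completed at or after a
# given Unix timestamp.
# $1 = epoch seconds
# $2 = log file (optional, assumed default /var/log/emerge.log)
merged_since() {
    awk -v since="$1" '
        / ::: completed emerge / {
            ts = $1                 # first field looks like "1727600100:"
            sub(/:$/, "", ts)       # drop the trailing colon
            if (ts + 0 >= since + 0) print $8   # 8th field is the package atom
        }' "${2:-/var/log/emerge.log}"
}
```

With GNU date the timestamp of the last good hibernation could be passed as `merged_since "$(date -d '2024-09-15' +%s)"`; the date here is only a placeholder.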
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2434
|
Posted: Mon Sep 30, 2024 1:56 pm Post subject: |
|
|
orsetto wrote: |
Now that i think about it, there is something I did not think about: going through /var/log/syslog* checking for the last successful hibernation attempt and then going through /var/log/emerge to see what I've updated since. |
The kernel. See my reply above. I'll be very thankful if you can find a working version.
Best Regards,
Georgi |
|
|
|
orsetto n00b
Joined: 26 Sep 2023 Posts: 6 Location: Italy
|
Posted: Mon Sep 30, 2024 2:22 pm Post subject: |
|
|
logrusx wrote: |
The kernel. See my reply above. I'll be very thankful if you can find a working version.
Best Regards,
Georgi |
Sorry, I didn't see your reply. Unfortunately tho, I've been on 6.6 since February, but (looking at the logs) I've only had this problem for the last couple of weeks. However, as you said, it depends on one's setup.
I'm going through the logs now to see if there's some hint at what's wrong. If I can't find anything, I'll try with linux-6.1.91 |
|
|
|
Ralphred l33t
Joined: 31 Dec 2013 Posts: 653
|
Posted: Mon Sep 30, 2024 3:15 pm Post subject: |
|
|
orsetto wrote: | I've only had this problem for the last couple of weeks. |
Install-kernel hasn't dropped some ucode image in front of your initrd has it? That is some "new behaviour" I've had to mute in the last couple of weeks. |
|
|
|
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5116 Location: Bavaria
|
Posted: Mon Sep 30, 2024 5:53 pm Post subject: |
|
|
logrusx wrote: | This is not a configuration issue. [...] |
Yes, I also suspected that it was more of a problem with the AMDGPU module itself (but I wasn't sure, that's why I wrote that you might have to wait for new versions). Thank you for your confirmation. _________________ https://wiki.gentoo.org/wiki/User:Pietinger |
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2434
|
Posted: Mon Sep 30, 2024 6:17 pm Post subject: |
|
|
pietinger wrote: | logrusx wrote: | This is not a configuration issue. [...] |
Yes, I also suspected that it was more of a problem with the AMDGPU module itself (but I wasn't sure, that's why I wrote that you might have to wait for new versions). Thank you for your confirmation. |
I'm afraid new versions won't solve the issue. They've said they won't in the bug I linked.
Best Regards,
Georgi |
|
|
|
orsetto n00b
Joined: 26 Sep 2023 Posts: 6 Location: Italy
|
Posted: Tue Oct 01, 2024 3:20 pm Post subject: |
|
|
So, I checked all the packages that I updated. The only interesting thing that I hadn't already checked was x11-libs/libdrm, which was updated from 2.4.122 to 2.4.122-r1. Unfortunately this only adds the "doc" USE flag. Here's the commit: https://gitweb.gentoo.org/repo/gentoo.git/commit/x11-libs/libdrm?id=3a9bc2c57733e0fd19f4414deb2122605819c311.
I also tried linux-6.1.91, and even though the problem persists, I get a different output from dmesg, which only shows this error: Code: | [ 79.375083] amdgpu 0000:04:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:222 vmid:1 pasid:0, for process pid 0 thread pid 0)
[ 79.375097] amdgpu 0000:04:00.0: amdgpu: in page starting at address 0x0000800000001000 from IH client 0x1b (UTCL2)
[ 79.375113] amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001009BC
[ 79.375117] amdgpu 0000:04:00.0: amdgpu: Faulty UTCL2 client ID: CPF (0x4)
[ 79.375122] amdgpu 0000:04:00.0: amdgpu: MORE_FAULTS: 0x0
[ 79.375126] amdgpu 0000:04:00.0: amdgpu: WALKER_ERROR: 0x6
[ 79.375130] amdgpu 0000:04:00.0: amdgpu: PERMISSION_FAULTS: 0xb
[ 79.375134] amdgpu 0000:04:00.0: amdgpu: MAPPING_ERROR: 0x1
[ 79.375138] amdgpu 0000:04:00.0: amdgpu: RW: 0x0
[ 79.375226] [drm] kiq ring mec 2 pipe 1 q 0
[ 79.701310] amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper] *ERROR* ring gfx test failed (-110)
[ 79.701323] [drm:amdgpu_device_ip_resume_phase2] *ERROR* resume of IP block <gfx_v9_0> failed -110
[ 79.701333] amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
[ 79.701338] amdgpu 0000:04:00.0: PM: dpm_run_callback(): pci_pm_restore+0x0/0x120 returns -110
[ 79.701356] amdgpu 0000:04:00.0: PM: failed to restore async: error -110 |
Here's the full output: https://pastebin.com/NqkpRDCE
I also don't think this is the same bug that logrusx linked, because the errors are different. I also tried the patch suggested there, but it made no difference. I think I'll post my dmesg output there anyway, because I might very well be wrong.
Ralphred wrote: | Install-kernel hasn't dropped some ucode image in front of your initrd has it? That is some "new behaviour" I've had to mute in the last couple of weeks. |
I'm not using install-kernel, so that can't be the problem for me :/
logrusx wrote: | I'm afraid new versions won't solve the issue. They've said they won't in the bug I linked. |
I might have missed it, but where are they saying this? |
|
|
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2434
|
Posted: Wed Oct 02, 2024 2:29 pm Post subject: |
|
|
orsetto wrote: |
I also tried linux-6.1.91, and even though the problem persists, I get a different output from dmesg |
The message is different with 6.1.92. It's also different in 6.6 and 6.10.
logrusx wrote: | I'm afraid new versions won't solve the issue. They've said they won't in the bug I linked. |
orsetto wrote: | I might have missed it, but where are they saying this? |
This is my take on that bug. Things like "let's help you debug your s2idle", "we cannot support that" and so on. All other bugs are just like this one or similar. Judging by the commit that introduced it, the patch you've tried is only relevant to discrete GPUs, but who knows, it's a big mess in the AMDGPU driver.
Best Regards,
Georgi |
|
|
|
|
|
|
|