sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
|
Posted: Fri Aug 30, 2024 7:17 pm Post subject: with nvidia drivers elogind suspend crashes/freezes X11 |
|
|
I just got a thinkpad P14S with intel iGPU and the following nvidia dGPU: NVIDIA RTX™ 500 Ada Generation Laptop GPU 4GB GDDR6
When the NVIDIA card is driven by the 'nvidia' driver and I try to suspend the system using elogind, X11 crashes. I close the lid, the power LED stays solid for 30 seconds, and then it starts to blink, indicating that the system has suspended. When I open the lid I am at the tty: X11 has crashed. Sometimes X11 freezes completely instead, but I'm not sure what triggers that: I keep trying new things, and the outcome is always one of these two.
Suspend works fine from a tty.
Suspend also works fine from inside X11 if I do it by `echo "mem" > /sys/power/state`.
Any guidance is very much appreciated.
Last edited by sunox on Sat Aug 31, 2024 1:23 pm; edited 7 times in total |
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
|
Posted: Fri Aug 30, 2024 8:31 pm Post subject: |
|
|
Can skip: just leaving in case it helps anyone in the future
Hm, `xrandr --listproviders` only shows the intel gpu.
I also notice I'm getting `Failed to initialize the NVIDIA kernel module` errors when starting X. I must be configuring something wrong... It says to consult the kernel log, but dmesg doesn't show any NVIDIA-related errors.
I know that recent versions of xorg are supposed to autoconfigure PRIME, but I also tried generating a config with `doas nvidia-xconfig --prime`, and I tried copy-pasting the top config at https://wiki.gentoo.org/wiki/NVIDIA/Optimus/xorg.conf. In both cases X11 crashes immediately after `startx`.
EDIT: I believe I had to add my user to the `video` group to get the nvidia card to show up as a provider.
Last edited by sunox on Sat Aug 31, 2024 12:31 am; edited 2 times in total |
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
|
Posted: Fri Aug 30, 2024 11:36 pm Post subject: |
|
|
Edit: I found this in my syslog. When the system freezes on suspend, the following messages are printed repeatedly:
https://bpa.st/WC3A
These are the last 20 lines of the Xorg log after a crash:
https://bpa.st/HSOQ
I have tried using the suspend scripts that ship with the nvidia driver, along with NVreg_PreserveVideoMemoryAllocations=1 (which works on my desktop), but to no avail. |
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
|
Posted: Sat Aug 31, 2024 2:39 am Post subject: |
|
|
I didn't mention that I have only been trying to suspend through `elogind` (`loginctl suspend`).
If I suspend using `echo "mem" > /sys/power/state`, it works as it should, including resume. So the issue seems to lie between the nvidia drivers and elogind. |
sublogic Apprentice
Joined: 21 Mar 2022 Posts: 295 Location: Pennsylvania, USA
|
Posted: Sat Aug 31, 2024 9:41 pm Post subject: |
|
|
sunox wrote: | Suspend works fine from a tty. |
From a tty while X is running? I'll assume yes.
Here goes nothing: you can make elogind itself switch to a tty. Read the "Hook directories" paragraphs in the loginctl man page. You can place a shell script in /etc/elogind/system-sleep/ that runs chvt 15 when $1 is "pre". After resuming, you'll still be on VT15, which probably shows a blank screen. Maybe Ctrl-Alt-F7 gets you back to a working X11? I'm not sure it will work, because
Quote: | Suspend also works fine from inside X11 if I do it by `echo "mem" > /sys/power/state`. |
so, as you said, there is a weird interaction between the driver and elogind.
If it does work and you want to switch back automatically on resume, your script's "pre" section should save the output of fgconsole to a file before switching VT. The "post" section can then read the VT number from the file and chvt back to the correct screen.
I do that on a laptop. For some reason I wrote "(chvt 15)" to run the chvt from a subshell. |
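The hook described above could look roughly like the sketch below. It is not sublogic's actual script: the file name (for example /etc/elogind/system-sleep/vt-switch.sh), the `sleep_hook` function, and the state file path are all made up for illustration, and it assumes elogind passes "pre" or "post" as `$1`, as the "Hook directories" section describes.

```shell
#!/bin/sh
# Sketch of an elogind system-sleep hook. Assumptions: $1 is "pre" or "post";
# the state file location below is arbitrary and only needs to survive the
# sleep cycle.
VT_STATE_FILE="${VT_STATE_FILE:-/run/sleep-hook-vt}"

sleep_hook() {
    case "${1:-}" in
        pre)
            # Remember the VT that X11 is on, then move to a text VT.
            fgconsole > "$VT_STATE_FILE" || return 1
            chvt 15
            ;;
        post)
            # Switch back only if a VT number was recorded before sleeping.
            if [ -f "$VT_STATE_FILE" ]; then
                vt=$(cat "$VT_STATE_FILE")
                rm -f "$VT_STATE_FILE"
                chvt "$vt"
            fi
            ;;
    esac
}

sleep_hook "$@"
```

The script would need to be executable and run as root, since chvt and fgconsole both require access to the console.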
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
|
Posted: Sun Sep 01, 2024 2:41 pm Post subject: |
|
|
Thanks for the comment.
After reading this I was reminded that this is one of the things that the nvidia-sleep.sh script (pasted below) is supposed to do. I wonder if the script is exiting prematurely or something. Yesterday I got fed up and just disabled the nvidia card but I should try to do some experimentation later.
Code: |
XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number
PATH="/bin:/usr/bin"
case "$1" in
    suspend|hibernate)
        mkdir -p "${RUN_DIR}"
        fgconsole > "${XORG_VT_FILE}"
        chvt 63
        if [[ $? -ne 0 ]]; then
            exit $?
        fi
        echo "$1" > /proc/driver/nvidia/suspend
        exit $?
        ;;
    resume)
        echo "$1" > /proc/driver/nvidia/suspend
        #
        # Check if Xorg was determined to be running at the time
        # of suspend, and whether its VT was recorded. If so,
        # attempt to switch back to this VT.
        #
        if [[ -f "${XORG_VT_FILE}" ]]; then
            XORG_PID=$(cat "${XORG_VT_FILE}")
            rm "${XORG_VT_FILE}"
            chvt "${XORG_PID}"
        fi
        exit 0
        ;;
    *)
        exit 1
esac
|
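On the question of the script exiting prematurely, one detail in the pasted code may be worth checking: inside `if [[ $? -ne 0 ]]; then exit $?`, the second `$?` is the status of the test itself, not of `chvt`. If `chvt 63` fails, the script would therefore exit with status 0 and never write to /proc/driver/nvidia/suspend. Whether this is related to the crash is unknown. The sketch below only demonstrates the shell behaviour; `failing_chvt` is a made-up stand-in, and `[` is used instead of `[[` for portability (both set `$?` the same way).

```shell
#!/bin/sh
# Hypothetical stand-in for a `chvt 63` call that fails with status 3.
failing_chvt() { return 3; }

# Pattern as pasted: by the time `exit $?` runs, $? holds the status of
# the test (0, because the test was true), not the status of failing_chvt.
status_as_pasted() (
    failing_chvt
    if [ $? -ne 0 ]; then
        exit $?
    fi
    exit 99
)

# Variant that captures the status before testing it.
status_captured() (
    failing_chvt
    rc=$?
    if [ "$rc" -ne 0 ]; then
        exit "$rc"
    fi
    exit 99
)

rc=0; status_as_pasted || rc=$?
echo "as pasted: $rc"        # prints "as pasted: 0"
rc=0; status_captured || rc=$?
echo "captured first: $rc"   # prints "captured first: 3"
```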
sublogic Apprentice
Joined: 21 Mar 2022 Posts: 295 Location: Pennsylvania, USA
|
Posted: Sun Sep 01, 2024 11:08 pm Post subject: |
|
|
Check if there is already a script in /lib64/elogind/system-sleep/, installed by some package.
(It could also be under /lib/, /usr/lib64/, or /usr/lib/.) |
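A quick way to check all of those locations at once is a small loop like the one below. This is only a sketch: the `list_sleep_hooks` name and its optional root-prefix argument are made up for illustration, and /etc is included so that locally created hooks show up too.

```shell
#!/bin/sh
# List every file found in an elogind system-sleep directory under the
# usual prefixes. The optional first argument is a root prefix
# (hypothetical, mainly useful for trying this out on a scratch tree).
list_sleep_hooks() {
    root="${1:-}"
    for prefix in /lib64 /lib /usr/lib64 /usr/lib /etc; do
        dir="$root$prefix/elogind/system-sleep"
        [ -d "$dir" ] || continue
        for hook in "$dir"/*; do
            # An empty directory leaves the glob unexpanded, so test first.
            if [ -e "$hook" ]; then
                echo "$hook"
            fi
        done
    done
    return 0
}

list_sleep_hooks
```

On systems where /lib64 is a symlink to /usr/lib64, the same hook may be listed twice.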
Cosminovici n00b
Joined: 12 Aug 2024 Posts: 2
|
Posted: Thu Sep 19, 2024 7:59 pm Post subject: |
|
|
What you are experiencing is a bug within elogind:
https://bugs.gentoo.org/693384
https://github.com/elogind/elogind/issues/140
You may notice that the first bug report was closed via an update to the nvidia-drivers ebuild; however, I believe the issues that update resolves are unrelated to the actual bug people (or at least I) were experiencing. Nevertheless, the actual bug was fixed in elogind 252.23. The latest stable version of elogind in the official Gentoo repository (and therefore the one installed by default) is 246.10. You will need to add an entry for elogind to a file in /etc/portage/package.accept_keywords/ and update the package to get the latest unstable version, 255.5, in which the bug is fixed. Despite it being unstable, I have not experienced any issues with it so far. However, the default suspend behaviour has changed from Suspend-to-RAM (which is probably what you want) to Suspend-to-Idle, which you can change by adding a file in /etc/elogind/sleep.conf.d/ with the contents
Code: |
[Sleep]
SuspendMode=deep
|
You should restart the elogind daemon after updating, or just reboot. Also, keep in mind that the custom suspend hooks directory has changed in this version, so you will need to relocate those if you have ever explicitly created any. Lastly, you may also need to set NVreg_PreserveVideoMemoryAllocations to 0 in /etc/modprobe.d/nvidia.conf. |
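To confirm that both settings took effect after a reboot, something like the following sketch may help. The helper names are made up. /sys/power/mem_sleep is the kernel's own interface, but the /proc/driver/nvidia/params path and its "Name: value" layout are assumptions about what the driver exposes. The modprobe line in the comment shows the usual `options` syntax for the setting mentioned above.

```shell
#!/bin/sh
# The line referred to above would typically look like this in
# /etc/modprobe.d/nvidia.conf:
#   options nvidia NVreg_PreserveVideoMemoryAllocations=0

# Print the active suspend mode. The kernel marks it with brackets,
# for example "s2idle [deep]".
active_mem_sleep() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "${1:-/sys/power/mem_sleep}"
}

# Print the value the loaded nvidia module reports for the setting.
# Path and line format are assumptions, not taken from the thread.
nv_preserve_setting() {
    sed -n 's/^PreserveVideoMemoryAllocations:[[:space:]]*//p' \
        "${1:-/proc/driver/nvidia/params}"
}
```

Calling `active_mem_sleep` with no argument reads the live kernel value; after the sleep.conf.d change, a suspend through elogind is expected to use `deep`.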