Gentoo Forums
[~SOLVED] systemd 256 breaks resume from suspend w/ NVIDIA
Gentoo Forums Forum Index » Kernel & Hardware
eeckwrk99
Apprentice
Joined: 14 Mar 2021
Posts: 231
Location: Gentoo forums

Posted: Sun Nov 03, 2024 8:10 pm    Post subject: [~SOLVED] systemd 256 breaks resume from suspend w/ NVIDIA

GPU: GTX 970 + x11-drivers/nvidia-drivers-550.127.05

After updating from sys-apps/systemd-255.11 to sys-apps/systemd-256.7, which was stabilized yesterday, I can no longer resume from suspend. All I get is an unresponsive system (blank screen, unable to switch to any other TTY). Resume had been working just fine for the last few weeks.

I eventually found this bug report on the Debian bug tracker: #1072722 - nvidia-driver: Please configure SYSTEMD_SLEEP_FREEZE_USER_SESSION=false.

systemd changelog wrote:
* The behavior of systemd-sleep and systemd-homed has been updated to
freeze user sessions when entering the various sleep modes or when
locking a homed-managed home area. This is known to cause issues with
the proprietary NVIDIA drivers. Packagers of the NVIDIA proprietary
drivers may want to add drop-in configuration files that set
SYSTEMD_SLEEP_FREEZE_USER_SESSION=false for systemd-suspend.service
and related services, and SYSTEMD_HOME_LOCK_FREEZE_SESSION=false for
systemd-homed.service.



I've been able to resume from suspend just fine so far after applying the workaround (including two suspend periods of over an hour each):

Code:
% lsd --icon=never --tree /etc/systemd/system
system
├── systemd-hibernate.service.d
│   └── 10-nvidia-no-freeze-session.conf
├── systemd-homed.service.d
│   └── 10-nvidia-no-freeze-session.conf
├── systemd-hybrid-sleep.service.d
│   └── 10-nvidia-no-freeze-session.conf
├── systemd-suspend-then-hibernate.service.d
│   └── 10-nvidia-no-freeze-session.conf
├── systemd-suspend.service.d
│   └── 10-nvidia-no-freeze-session.conf


Code:
% cat /etc/systemd/system/systemd-suspend.service.d/10-nvidia-no-freeze-session.conf
[Service]
Environment="SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false"
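For reference, a minimal sketch of creating all five drop-ins listed above in one go. It assumes a POSIX shell and reuses the file name from the tree above. The function name write_nvidia_dropins and its optional staging-prefix argument (empty means the live system) are hypothetical. As the changelog notes, systemd-homed reads a different variable, SYSTEMD_HOME_LOCK_FREEZE_SESSION.

```shell
# Hypothetical helper: write the NVIDIA "no freeze" drop-ins.
# $1 = optional staging prefix (assumption, for testing); empty = live system.
write_nvidia_dropins() {
    root="${1:-}"
    base="$root/etc/systemd/system"
    name="10-nvidia-no-freeze-session.conf"

    # The four sleep services read SYSTEMD_SLEEP_FREEZE_USER_SESSIONS.
    for unit in systemd-suspend systemd-hibernate systemd-hybrid-sleep \
                systemd-suspend-then-hibernate; do
        mkdir -p "$base/$unit.service.d" || return 1
        printf '[Service]\nEnvironment="SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false"\n' \
            > "$base/$unit.service.d/$name" || return 1
    done

    # systemd-homed reads a different variable.
    mkdir -p "$base/systemd-homed.service.d" || return 1
    printf '[Service]\nEnvironment="SYSTEMD_HOME_LOCK_FREEZE_SESSION=false"\n' \
        > "$base/systemd-homed.service.d/$name"
}
```

Run it as root with no argument, then run `systemctl daemon-reload` so systemd picks up the new drop-ins before the next suspend.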


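journalctl shows that SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false is used (note the plural SESSIONS: the Debian bug title and the changelog above spell it SESSION, but the message below shows the name systemd-sleep actually checks):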

Code:
% journalctl
systemd[1]: Starting System Suspend...
systemd-sleep[11527]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
systemd-sleep[11527]: This is not recommended, and might result in unexpected behavior, particularly
systemd-sleep[11527]: in suspend-then-hibernate operations or setups with encrypted home directories.
systemd-sleep[11527]: Performing sleep operation 'suspend'...
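To check from the journal which path systemd-sleep took, a small filter can look for the two messages that appear in this thread: the "remain unfrozen" warning above, and the "Successfully froze unit 'user.slice'" line that is logged when freezing is active. This is a sketch. The function name is hypothetical, and the match strings are copied from these logs, so other systemd versions may word them differently.

```shell
# Hypothetical helper: read journal text on stdin and report whether
# systemd-sleep left user sessions unfrozen, froze them, or neither was logged.
sleep_freeze_state() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'User sessions remain unfrozen'; then
        echo unfrozen
    elif printf '%s\n' "$log" | grep -q "Successfully froze unit 'user.slice'"; then
        echo frozen
    else
        echo unknown
    fi
}
```

Usage: `journalctl -b -t systemd-sleep | sleep_freeze_state` checks the current boot. The `-t` option filters on the syslog identifier, which is `systemd-sleep` in the lines above.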



In case it matters, I also have the nvidia-{hibernate,resume,suspend} service units enabled:
Code:
% systemctl is-enabled nvidia-{hibernate,resume,suspend}.service
enabled
enabled
enabled


Content of /etc/modprobe.d/nvidia.conf:

Code:
# NVIDIA drivers options
# See /usr/share/doc/nvidia-drivers-*/README.txt* for more information.

# nvidia-drivers and nouveau cannot be used at same time.
# Comment out the following line if you wish to allow nouveau.
blacklist nouveau

# Kernel Mode Setting (notably needed for fbdev and wayland).
# Enabling may possibly cause issues with SLI and Reverse PRIME.
options nvidia-drm modeset=1

# Enable experimental framebuffer console support (requires modeset=1 above).
# Replaces efifb, simpledrm, or similar once loaded (emphasis on being
# experimental, "may" cause issues X mode switching, sleep, or more).
options nvidia-drm fbdev=1

# Suspend options. Note that Allocations=1 requires suspend hooks currently
# only used when either systemd or elogind is used to suspend. If using
# neither or have issues, try Allocations=0 (revert if it does not help
# as =0 is not recommended).
options nvidia \
    NVreg_PreserveVideoMemoryAllocations=1 \
    NVreg_TemporaryFilePath=/var/tmp

# !!! Security Warning !!!
# Do not change the DeviceFile options unless you know what you are doing.
# Only add trusted users to the 'video' group, these users may be able to
# crash, compromise, or irreparably damage the machine.
options nvidia \
    NVreg_DeviceFileGID=27 \
    NVreg_DeviceFileMode=432 \
    NVreg_DeviceFileUID=0 \
    NVreg_ModifyDeviceFiles=1

# Should be no need to touch anything below.
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia
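To confirm that these options are actually in effect for the loaded modules, something like the following can read them back. It assumes the usual locations: /sys/module/nvidia_drm/parameters/ for nvidia-drm's modeset and fbdev, and /proc/driver/nvidia/params (one "Name: value" pair per line) for the nvidia module. The function name and the optional prefix argument, which is only there for testing against a copied tree, are hypothetical.

```shell
# Hypothetical helper: print the values the loaded NVIDIA modules really use.
# $1 = optional path prefix (assumption, for testing); empty = live system.
nvidia_effective_params() {
    prefix="${1:-}"
    for p in modeset fbdev; do
        f="$prefix/sys/module/nvidia_drm/parameters/$p"
        if [ -r "$f" ]; then
            printf 'nvidia-drm %s=%s\n' "$p" "$(cat "$f")"
        else
            printf 'nvidia-drm %s=(not available)\n' "$p"
        fi
    done
    # The nvidia module lists its registry parameters as "Name: value" lines.
    grep -E '^PreserveVideoMemoryAllocations:' \
        "$prefix/proc/driver/nvidia/params" 2>/dev/null ||
        echo 'nvidia PreserveVideoMemoryAllocations: (not available)'
}
```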


I noticed that Arch Linux also applies this workaround via their nvidia-utils package:

- nvidia-utils PKGBUILD
- systemd-suspend-override.conf content

Other relevant links:

- GitHub - systemd/systemd - #33083 - user session fails to resume from suspend when user is using NFS or KVM
- Arch Linux forums - [SOLVED] System freezes after sleep since update

Just curious whether anyone else has had this issue.

Should I open a bug so that this workaround is added to Gentoo too? Or could a warning be displayed after installing >=sys-apps/systemd-256.6 instead?


Last edited by eeckwrk99 on Mon Nov 04, 2024 8:40 pm; edited 1 time in total
Ionen
Developer
Joined: 06 Dec 2018
Posts: 2849

Posted: Mon Nov 04, 2024 12:29 am

I'm Gentoo's nvidia-drivers maintainer and just happened to see this thread, and... this is the first I've heard of it. I admit I'm kind of surprised nobody has filed a bug about this so far. At a glance, systemd-256.6 has been in ~testing for a while (formerly masked, and unmasked on July 1; I haven't checked the history for other versions), but it took until 256.7 went stable for someone to mention anything (i.e., this thread).

I recall that back in September we were testing sleep in https://github.com/gentoo/gentoo/pull/38482, and as far as I know systemd users were fine (we had more problems with elogind), although I never confirmed which versions they were using (some could have been stable users).

Out of curiosity, could you drop the workarounds and try the ~testing nvidia-drivers-565.x beta? It would explain things if NVIDIA fixed something on their end (probably since 560) and the issue only affects the stable drivers. If it's broken too, then maybe it only affects some specific setups, but I wouldn't know.

Either way, since systemd upstream is suggesting it, and Debian and Arch are already doing it, I don't see a problem with doing it here too, but if 565 is not affected, I'd limit the workarounds to older versions.

I haven't tried reproducing it (I don't really use sleep, and my NVIDIA system still uses OpenRC), but I could push this: https://termbin.com/bz6v

It adds the following files:
Code:
/usr/lib/systemd/system/systemd-hibernate.service.d/10-nvidia.conf
/usr/lib/systemd/system/systemd-homed.service.d/10-nvidia.conf
/usr/lib/systemd/system/systemd-hybrid-sleep.service.d/10-nvidia.conf
/usr/lib/systemd/system/systemd-suspend-then-hibernate.service.d/10-nvidia.conf
/usr/lib/systemd/system/systemd-suspend.service.d/10-nvidia.conf

Most of these are symlinks to systemd-suspend.service.d/10-nvidia.conf; the exception is systemd-homed.service.d, which sets a different variable:
Code:
$ cat /usr/lib/systemd/system/systemd-homed.service.d/10-nvidia.conf
[Service]
Environment=SYSTEMD_HOME_LOCK_FREEZE_SESSION=false

$ cat /usr/lib/systemd/system/systemd-suspend.service.d/10-nvidia.conf
[Service]
Environment=SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false

Does this look correct to you? Optionally, if you're familiar with doing this (don't worry if not), you could clean up your workarounds and emerge nvidia-drivers using the commit in the termbin paste above, to be sure.
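Cleaning up first matters because systemd merges differently named drop-ins from /etc and /usr/lib. With both sets setting the same variable, a successful resume would not show whether the packaged files work. A sketch of the cleanup, assuming the 10-nvidia-no-freeze-session.conf name used earlier in this thread. The function name and its prefix argument are hypothetical.

```shell
# Hypothetical helper: remove the hand-made /etc drop-ins so that only the
# packaged /usr/lib ones remain. $1 = optional path prefix (assumption).
remove_local_nvidia_dropins() {
    root="${1:-}"
    base="$root/etc/systemd/system"
    for d in "$base"/systemd-*.service.d; do
        [ -d "$d" ] || continue   # glob matched nothing
        rm -f "$d/10-nvidia-no-freeze-session.conf"
        # Remove the directory only if nothing else is left in it.
        rmdir "$d" 2>/dev/null || true
    done
}
```

Follow it with `systemctl daemon-reload`, or simply reboot.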
eeckwrk99

Posted: Mon Nov 04, 2024 9:30 am

Hi Ionen, thanks for the reply!

Ionen wrote:
Out of curiosity, could you drop the workarounds and try ~testing nvidia-drivers-565.x beta?

Sure, I'll try and report back.

Ionen wrote:
Does this seem correct to you?

Yes, the changes look good to me.

Ionen wrote:
Optionally, if familiar with doing this (don't worry if not), could cleanup your workarounds and emerge nvidia-drivers using the commit in the earlier termbin paste to be sure.

Will do too and report back. Just to make sure, which nvidia-drivers version do you want me to try with your patch: 550.127.05 or 565.57.01-r2? Maybe both?
Ionen

Posted: Mon Nov 04, 2024 10:05 am

eeckwrk99 wrote:
Will do too and report back. Just to make sure, which nvidia-drivers version do you want me to try with your patch: 550.127.05 or 565.57.01-r2? Maybe both?
The change is identical for each, so it shouldn't matter, assuming the issue was present with that version before (obviously, for 565, do try without the change first).

And thanks!
eeckwrk99

Posted: Mon Nov 04, 2024 10:57 am

I've just tried 565.57.01-r2, after first deleting the workarounds in /etc/systemd/system and rebooting.

After 45 minutes of suspend: same behavior. The system is completely unresponsive and I cannot switch to any TTY, so I was forced to reboot.

One notable difference is that, unlike with 550.127.05, journalctl -b -1 doesn't show any entries after entering the suspend state (-r lists the newest entries first):

Code:
% journalctl -b -1 -r
systemd-sleep[5239]: Performing sleep operation 'suspend'...
systemd-sleep[5239]: Successfully froze unit 'user.slice'.
systemd[1]: Starting System Suspend...
systemd[1]: Reached target Sleep.
systemd[1]: nvidia-suspend.service: Consumed 375ms CPU time, 389M memory peak.
systemd[1]: Finished NVIDIA system suspend actions.
systemd[1]: nvidia-suspend.service: Deactivated successfully.


With 550.127.05, I could see the usual post-resume entries, despite the same symptoms (blank screen, forced reboot).

So I hope it's not a different issue. I couldn't resume from suspend with 560 either, even with sys-apps/systemd-255.11. I'm using sys-kernel/gentoo-sources-6.6.58-r1 if that matters.

I'm going to downgrade to 550.127.05 and try the patch; it should work with that setup.
Ionen

Posted: Mon Nov 04, 2024 11:15 am

I see, it's quite possible there are different issues. 565 did get some sleep fixes that affected some users, but things have a tendency to break for someone else instead (this has been happening with pretty much every branch).

That aside, you may want to try without nvidia.conf's fbdev=1 too. That option is still pretty experimental and can have some unexpected side effects (I just remembered that it prevented resume for someone, which is why nvidia.conf mentions it can possibly cause issues with sleep). It's probably(?) unrelated to systemd's change, but I wouldn't be surprised if it's what has been causing your issues with >=560.
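For anyone trying this, the option can be flipped with a one-line sed edit. This is a sketch that assumes the `options nvidia-drm fbdev=1` line is written exactly as in the nvidia.conf quoted earlier in this thread. The function name is hypothetical, and a .bak copy of the file is kept.

```shell
# Hypothetical helper: set "options nvidia-drm fbdev=<0|1>" in a modprobe.d file.
# Usage: set_nvidia_fbdev /etc/modprobe.d/nvidia.conf 0
set_nvidia_fbdev() {
    conf="$1" value="$2"
    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    # Only rewrite a line that matches the expected form exactly.
    sed -i.bak "s/^options nvidia-drm fbdev=[01]\$/options nvidia-drm fbdev=$value/" "$conf"
}
```

The new value only applies once nvidia-drm is reloaded, which in practice means a reboot. If the NVIDIA modules are included in an initramfs, that probably needs regenerating too. Afterwards, `cat /sys/module/nvidia_drm/parameters/fbdev` should reflect the change.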
eeckwrk99

Posted: Mon Nov 04, 2024 12:02 pm

Can confirm your patch works just fine on my end with 550.127.05.

I've applied it, then emerged x11-drivers/nvidia-drivers-550.127.05-r1:

Code:
% sudo emlop l -e x11-drivers/nvidia-drivers -n 1
2024-11-04 16:12:37  1:26 x11-drivers/nvidia-drivers-550.127.05-r1


Code:
% lsd --icon=never --tree /usr/lib/systemd/system
├── systemd-hibernate.service.d
│   └── 10-nvidia.conf -> ../systemd-suspend.service.d/10-nvidia.conf
├── systemd-homed.service.d
│   └── 10-nvidia.conf
├── systemd-hybrid-sleep.service.d
│   └── 10-nvidia.conf -> ../systemd-suspend.service.d/10-nvidia.conf
├── systemd-suspend-then-hibernate.service.d
│   └── 10-nvidia.conf -> ../systemd-suspend.service.d/10-nvidia.conf
├── systemd-suspend.service.d
│   └── 10-nvidia.conf
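The check below confirms that installed files match this layout: the hibernate, hybrid-sleep and suspend-then-hibernate drop-ins should be symlinks resolving to systemd-suspend.service.d/10-nvidia.conf, and homed should have its own file. This is a sketch. The function name and prefix argument are hypothetical, and it relies on `test -ef` ("same file after following links"), which bash and dash both support.

```shell
# Hypothetical helper: verify the packaged drop-in layout shown above.
# $1 = optional path prefix (assumption); returns non-zero on any mismatch.
check_packaged_nvidia_dropins() {
    base="${1:-}/usr/lib/systemd/system"
    target="$base/systemd-suspend.service.d/10-nvidia.conf"
    grep -q '^Environment=SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false$' "$target" || return 1
    for unit in systemd-hibernate systemd-hybrid-sleep systemd-suspend-then-hibernate; do
        f="$base/$unit.service.d/10-nvidia.conf"
        # Must be a symlink that resolves to the suspend drop-in.
        [ -L "$f" ] && [ "$f" -ef "$target" ] || return 1
    done
    grep -q '^Environment=SYSTEMD_HOME_LOCK_FREEZE_SESSION=false$' \
        "$base/systemd-homed.service.d/10-nvidia.conf"
}
```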


After rebooting and suspending for 30 minutes, I could resume just fine. journalctl shows that SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0 is used:

Code:
systemd-sleep[5930]: User sessions remain unfrozen on explicit request ($SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=0).
systemd-sleep[5930]: This is not recommended, and might result in unexpected behavior, particularly
systemd-sleep[5930]: in suspend-then-hibernate operations or setups with encrypted home directories.
systemd-sleep[5930]: Performing sleep operation 'suspend'...


Ionen wrote:
I see, quite possible there's different issues.. 565 did get some fixes for sleep that affected some users but then things have a tendency to break for someone else instead (this been happening with pretty much every branches).

That aside, may want to try without nvidia.conf's fbdev=1 too, that option is still pretty experimental and can have some unexpected side effects (just remembered that it prevented resume for someone, and that's why nvidia.conf mentions it can possibly cause issues with sleep). Probably(?) unrelated to systemd's change but wouldn't be surprised if it's what been causing your issues with >=560.


I already tried fbdev=0 back when I was experimenting with 560; unfortunately it didn't make any difference, and I still couldn't resume.

It would be nice if some affected users who could resume just fine with 560 and systemd 255.11 could try the patch with 565.57.01-r2. The fact that I'm not seeing any entries in journalctl -b -1 really makes me think it's a different issue, so I'm afraid I can't reliably test this with the 565 beta.

Considering that Arch still ships the drop-in workaround files with 565, I guess it must still be required.

Edit: I'll try 565 with fbdev=0 regardless, who knows. I'll let you know.
Ionen

Posted: Mon Nov 04, 2024 1:26 pm

eeckwrk99 wrote:
Considering that Arch still use the drop-in workaround files for 565, I guess it must still be required.
I'd sooner assume that it hasn't been re-tested. Odds are I'll (also) leave the workaround there forever until someone or something tells me it's not needed anymore, but likely nobody will be trying that outside the short term.

Anyhow, I guess I'll go ahead and push it. Thanks for testing.

Edit: done in https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=380e65161ef3
eeckwrk99

Posted: Mon Nov 04, 2024 8:39 pm

I've just tried 565.57.01-r3 with 6.6.58-r1 kernel.

I've been able to resume with both fbdev=0 and fbdev=1 (though not reliably; details below), which means the now-included workaround definitely helped, at least in my case, since I couldn't resume at all with 565.57.01-r2 (without the workaround).

I could suspend/resume two or three times with fbdev=0, but only once with fbdev=1. After the second attempt, I got a black screen (though I could still see post-resume entries in journalctl -b -1).

Plus, strange things happened after every resume. My lock screen (xsecurelock) was laggy when I typed my password, and I couldn't unlock my session. After switching to another TTY and logging in as root, I noticed that picom was using a significant amount of CPU. Killing it and switching back to TTY1 solved the issue.

I've never experienced this before; it's probably a separate 565 issue.

Ionen wrote:
Odds are I'll (also) leave the workaround there forever until someone or something tells me it's not needed anymore, but likely nobody will be trying that outside the short term.

Maybe the situation will improve over time and either systemd or NVIDIA will properly deal with the issue. I'll keep an eye on changes to Arch Linux's nvidia-utils package in case they remove the workaround at some point, and I'll definitely post here if they do. I'm also using Arch Linux, so I can test things on both Arch and Gentoo and see how it goes. I guess you also follow this kind of stuff closely as a Gentoo maintainer, so you might come across a possible removal even sooner than I would :)

Ionen wrote:
Anyhow, guess I'll go ahead and push it, thanks for testing.
Edit: done in https://gitweb.gentoo.org/repo/gentoo.git/commit/?id=380e65161ef3

You're welcome! Thanks for adding the workaround.
eeckwrk99

Posted: Mon Nov 04, 2024 9:09 pm

Just found out that openSUSE is also using the workaround.
All times are GMT