andi456 Apprentice
Joined: 06 Mar 2005 Posts: 225 Location: Germany
|
Posted: Fri Sep 20, 2024 7:38 pm Post subject: [Solved] nvidia-drivers and suspend to RAM |
|
|
Hi,
I habitually use sys-power/hibernate-script to suspend my main machine to RAM after a while, so that the computer doesn't run for no reason. That no longer works unless I change some configuration, but I don't know exactly what to change.
When I issue suspend to RAM, the machine immediately powers back on, and dmesg gives the following reason:
Code: |
NVRM: GPU 0000:06:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
nvidia 0000:06:00.0: PM: pci_pm_suspend(): nvidia_isr_kthread_bh+0x560/0x810 [nvidia] returns -5
nvidia 0000:06:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns -5
nvidia 0000:06:00.0: PM: failed to suspend: error -5
PM: Some devices failed to suspend, or early wake event detected
|
As far as I understand the README by nvidia, I've been lucky that s2ram has been working for so long, and now I have to somehow make suspend work via elogind, as I don't use systemd. Is there any documentation for that?
Kind regards.
Last edited by andi456 on Sat Sep 21, 2024 9:43 am; edited 1 time in total |
Ionen Developer
Joined: 06 Dec 2018 Posts: 2891
|
Posted: Fri Sep 20, 2024 9:17 pm Post subject: |
|
|
The recent nvidia-drivers revbumps (-r1) had a one-time warning that you may have missed:
Code: | if [[ ${REPLACING_VERSIONS##* } ]] &&
ver_test ${REPLACING_VERSIONS##* } -lt 560.35.03-r1 # may get repeated
then
elog
elog "For suspend/sleep, 'NVreg_PreserveVideoMemoryAllocations=1' is now default"
elog "with this version of ${PN}. This is recommended (or required) by"
elog "major DEs especially with wayland but, *if* experience regressions with"
elog "suspend, try reverting to =0 in '${EROOT}/etc/modprobe.d/nvidia.conf'."
elog
elog "May notably be an issue when using neither systemd nor elogind to suspend."
elog
elog "Also, the systemd suspend/hibernate/resume services are now enabled by"
elog "default, and for openrc+elogind a similar hook has been installed."
fi |
So if you want to return to the previous behaviour and everything was fine "for you", you can edit nvidia.conf to set it back to =0 and then reboot.
As for elogind, `loginctl suspend` should work in theory (same as systemd). It's not something I have ever tried, and most users have been testing this with their DE's suspend/sleep button instead.
Alternatively, I'm not familiar with hibernate-script, but in theory anything can be used if you can make it call /usr/bin/nvidia-sleep.sh (takes suspend, hibernate, or resume as arguments) to do the same thing that elogind/systemd would be doing. I imagine calling sleep.sh resume may be difficult without some kind of "on resume" hooks w/ a daemon though.
Switching to =1 would be esp. needed if you ever plan to use wayland, as resume otherwise (often) results in graphical corruption. Both Gnome and Plasma also recommend that this is set, which is why this is the new default. May matter less for simple setups. |
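A minimal sketch of the wrapper idea above, for setups that suspend without elogind or systemd. It assumes that /usr/bin/nvidia-sleep.sh behaves as described (accepts suspend and resume), that it runs as root, and that the write to /sys/power/state only returns after wake-up, so the resume call can follow directly. The function names and the DRY_RUN switch are hypothetical and not part of nvidia-drivers or hibernate-script.

```shell
#!/bin/sh
# Sketch: wrap a manual suspend-to-RAM with the NVIDIA sleep helper.
# Assumption: nvidia-sleep.sh accepts "suspend" and "resume"; run as root.
# Set DRY_RUN=1 to only print the steps instead of executing them.

NVIDIA_SLEEP="${NVIDIA_SLEEP:-/usr/bin/nvidia-sleep.sh}"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

suspend_with_nvidia_hooks() {
    # 1. Let the driver save its state through its procfs suspend interface.
    run "$NVIDIA_SLEEP" suspend || return 1
    # 2. Enter suspend-to-RAM; this write blocks until the machine wakes up.
    run sh -c 'echo mem > /sys/power/state'
    status=$?
    # 3. Always ask the driver to restore, even if step 2 failed.
    run "$NVIDIA_SLEEP" resume
    return "$status"
}
```

Calling `DRY_RUN=1 suspend_with_nvidia_hooks` prints the three steps without touching the system; calling it without DRY_RUN performs them.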
andi456 Apprentice
Joined: 06 Mar 2005 Posts: 225 Location: Germany
|
Posted: Sat Sep 21, 2024 9:42 am Post subject: |
|
|
Thanks for your quick answer. Indeed, I must have overlooked the new default in /etc/modprobe.d/nvidia.conf.
I've set PreserveVideoMemoryAllocations back to 0 for now and will probably switch to the loginctl method in the future, as the elogind daemon is running anyway, although I use awesome wm. |
Frautoincnam Guru
Joined: 19 May 2017 Posts: 331
|
Posted: Sun Sep 22, 2024 5:29 pm Post subject: |
|
|
I had the same problem although using elogind and loginctl suspend/hibernate (openrc and X11).
I had to set NVreg_PreserveVideoMemoryAllocations=0 to get a display back when waking up from hibernation, but entering suspend/hibernate and waking up from it are much slower than before. I hope a future update will fix the problem, but I won't know when to go back to NVreg_PreserveVideoMemoryAllocations=1. |
andi456 Apprentice
Joined: 06 Mar 2005 Posts: 225 Location: Germany
|
Posted: Mon Sep 23, 2024 3:50 pm Post subject: |
|
|
The wake up process is quite slow with NVreg_PreserveVideoMemoryAllocations=0
Well, one has to try from time to time to set it to 1 or so it seems in order to be able to try out wayland, for example. The latter also requires "options nvidia-drm modeset=0" which completely freezes my system on boot. |
Frautoincnam Guru
Joined: 19 May 2017 Posts: 331
|
Posted: Mon Sep 23, 2024 5:00 pm Post subject: |
|
|
andi456 wrote: | The wake up process is quite slow with NVreg_PreserveVideoMemoryAllocations=0 |
I found that I can save 5sec on suspend/hibernate with InhibitDelayMaxSec=0
Quote: | Well, one has to try from time to time to set it to 1 |
Yes of course.
Quote: | or so it seems in order to be able to try out wayland, for example. |
I have 3 monitors and the configuration with wayland was always a disaster each time I tried. I prefer not.
Quote: | The latter also requires "options nvidia-drm modeset=0" which completely freezes my system on boot. |
Good to know. I'll try it with NVreg_PreserveVideoMemoryAllocations=1 to see.
Thanks. |
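For reference, InhibitDelayMaxSec is a setting in the [Login] section of logind.conf. The sketch below only reads the configured value back; it assumes the file is /etc/elogind/logind.conf on an elogind system (systemd uses /etc/systemd/logind.conf), and the function name is hypothetical.

```shell
#!/bin/sh
# Print the InhibitDelayMaxSec value set in a logind.conf-style file,
# or "unset" if only the built-in default applies.
# Assumption: elogind reads /etc/elogind/logind.conf.

inhibit_delay() {
    conf="${1:-/etc/elogind/logind.conf}"
    awk -F= '
        # Track whether we are inside the [Login] section.
        /^[[:space:]]*\[/ { in_login = ($0 ~ /^[[:space:]]*\[Login\]/) }
        # Commented-out lines start with "#" and therefore do not match.
        in_login && /^[[:space:]]*InhibitDelayMaxSec[[:space:]]*=/ {
            gsub(/[[:space:]]/, "", $2)
            value = $2
        }
        END { print (value != "" ? value : "unset") }
    ' "$conf"
}
```

Running `inhibit_delay` after editing the file is a quick way to confirm that the uncommented line is the one being picked up.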
Ionen Developer
Joined: 06 Dec 2018 Posts: 2891
|
Posted: Mon Sep 23, 2024 7:30 pm Post subject: |
|
|
andi456 wrote: | The wake up process is quite slow with NVreg_PreserveVideoMemoryAllocations=0
Well, one has to try from time to time to set it to 1 or so it seems in order to be able to try out wayland, for example. The latter also requires "options nvidia-drm modeset=0" which completely freezes my system on boot. |
There could be something strange going on but... everything here sounds... backward?
1. NVreg_PreserveVideoMemoryAllocations=1 (not =0) can be the slower one because (afaik) it saves memory allocations to disk and reloads them on resume, and with a slow device behind /var/tmp this can be noticeable. Note that the path can be changed in nvidia.conf (to a faster device); tmpfs is also a potential option given enough space (not that this makes much sense for hibernation, given that tmpfs' RAM will itself be saved to disk).
2. The latter requires modeset=0...? Does that refer to wayland? Because wayland *requires* modeset=1, and nvidia-drivers also sets =1 by default with USE=wayland. Afaik there are not a whole lot of things that don't work with =1 nowadays, and there is little reason to use =0. The nvidia upstream default of =0 with USE=-wayland is only kept so as not to cause surprises, given it's what it has been for decades and because NVIDIA docs still call it "experimental"... but at this point just about all major distros default to =1 for wayland. It'd probably make sense to do modeset=1 regardless of USE=wayland, like Allocations=1, at this point. |
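To see which values the loaded driver actually uses, rather than what nvidia.conf says, something like the following may help. It is a sketch that assumes the driver lists its effective settings as "Key: value" lines in /proc/driver/nvidia/params, with keys written without the NVreg_ prefix; the helper name nv_param is hypothetical.

```shell
#!/bin/sh
# Look up one effective driver setting.
# Assumption: /proc/driver/nvidia/params holds "Key: value" lines.

nv_param() {
    # $1 = key without the NVreg_ prefix
    # $2 = params file (defaults to the procfs path)
    awk -F': *' -v key="$1" '
        $1 == key { print $2; found = 1 }
        END { exit !found }
    ' "${2:-/proc/driver/nvidia/params}"
}
```

For example, `nv_param PreserveVideoMemoryAllocations` and `nv_param TemporaryFilePath` after a reboot show whether the edit took effect, and `cat /sys/module/nvidia_drm/parameters/modeset` does the same for modeset.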
andi456 Apprentice
Joined: 06 Mar 2005 Posts: 225 Location: Germany
|
Posted: Mon Sep 23, 2024 10:34 pm Post subject: |
|
|
Sorry for the confusion. I wanted to say that "options nvidia-drm modeset=1" needs to be set in order to get wayland to work according to the nvidia.conf file in /etc/modprobe.d/. But that freezes my system. I don't know why. |
Ionen Developer
Joined: 06 Dec 2018 Posts: 2891
|
Posted: Mon Sep 23, 2024 10:37 pm Post subject: |
|
|
andi456 wrote: | Sorry for the confusion. I wanted to say that "options nvidia-drm modeset=1" needs to be set in order to get wayland to work according to the nvidia.conf file in /etc/modprobe.d/. But that freezes my system. I don't know why. |
I see, no real idea what could cause this either. I know =1 can cause problems with SLI and Reverse Prime but you'd know that already given it's in nvidia.conf, so probably something else. |
andi456 Apprentice
Joined: 06 Mar 2005 Posts: 225 Location: Germany
|
Posted: Wed Sep 25, 2024 8:45 am Post subject: |
|
|
Finally, I figured out how to have both
Code: | options nvidia-drm modeset=1 |
and
Code: | NVreg_PreserveVideoMemoryAllocations=1 |
enabled in /etc/modprobe.d/nvidia.conf and a working suspend mechanism on my system. Two steps seem to have been necessary.
1. switching to loginctl suspend and
2. putting the kernel in an initramfs including the nvidia modules (nvidia, nvidia_drm, nvidia_modeset). (I used the dracut tool for that, but there are other methods too.)
The initramfs presumably prevents the boot process from freezing, because it loads the nvidia modules at an earlier stage. |
eeckwrk99 Apprentice
Joined: 14 Mar 2021 Posts: 242 Location: Gentoo forums
|
Posted: Wed Sep 25, 2024 9:19 am Post subject: |
|
|
andi456 wrote: | Finally, I figured out how to have both
Code: | options nvidia-drm modeset=1 |
and
Code: | NVreg_PreserveVideoMemoryAllocations=1 |
enabled in /etc/modprobe.d/nvidia.conf and a working suspend mechanism on my system. Two steps seem to have been necessary.
1. switching to loginctl suspend and
2. putting the kernel in an initramfs including the nvidia modules (nvidia, nvidia_drm, nvidia_modeset). (I used the dracut tool for that, but there are other methods too.)
The initramfs presumably prevents the boot process from freezing, because it loads the nvidia modules at an earlier stage. |
I'm using
- sys-kernel/gentoo-sources-6.1.111
- x11-drivers/nvidia-drivers-535.183.01-r1
- sys-apps/systemd-255.11
- X11
I've just tried with:
- /etc/modprobe.d/nvidia.conf
Code: | blacklist nouveau
options nvidia-drm modeset=1
options nvidia \
NVreg_PreserveVideoMemoryAllocations=1 \
NVreg_TemporaryFilePath=/var/tmp
options nvidia \
NVreg_DeviceFileGID=27 \
NVreg_DeviceFileMode=432 \
NVreg_DeviceFileUID=0 \
NVreg_ModifyDeviceFiles=1
alias char-major-195 nvidia
alias /dev/nvidiactl char-major-195
remove nvidia modprobe -r --ignore-remove nvidia-drm nvidia-modeset nvidia-uvm nvidia |
- /etc/dracut.conf.d/options.conf
Code: | add_dracutmodules+=" crypt dm lvm rootfs-block systemd "
add_drivers+=" nvidia nvidia-drm nvidia-modeset "
filesystems+=" ext4 "
hostonly="yes"
compress="zstd" |
When suspending with systemctl suspend, the system doesn't enter suspend state. Getting the same errors you mentioned in OP:
Code: | $ journalctl -b
Sep 25 17:05:20 gentoo-desktop kernel: NVRM: GPU 0000:03:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
Sep 25 17:05:20 gentoo-desktop kernel: nvidia 0000:03:00.0: PM: pci_pm_suspend(): nvidia_isr_kthread_bh+0x520/0x760 [nvidia] returns -5
Sep 25 17:05:20 gentoo-desktop kernel: nvidia 0000:03:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
Sep 25 17:05:20 gentoo-desktop kernel: nvidia 0000:03:00.0: PM: failed to suspend async: error -5
Sep 25 17:05:20 gentoo-desktop kernel: PM: Some devices failed to suspend, or early wake event detected
Sep 25 17:05:21 gentoo-desktop systemd-sleep[5248]: Failed to put system to sleep. System resumed again: Input/output error
Sep 25 17:05:21 gentoo-desktop systemd[1]: systemd-suspend.service: Main process exited, code=exited, status=1/FAILURE
Sep 25 17:05:21 gentoo-desktop systemd[1]: systemd-suspend.service: Failed with result 'exit-code'.
Sep 25 17:05:21 gentoo-desktop systemd[1]: Failed to start System Suspend.
Sep 25 17:05:21 gentoo-desktop systemd[1]: Dependency failed for Suspend. |
Any suggestion? I've never been able to suspend with NVreg_PreserveVideoMemoryAllocations=1 so far.
Edit: Silly me, forgot to enable nvidia-{resume,suspend,hibernate}.service units again. Will report back if that makes any difference. |
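A small sketch of the step mentioned in the edit above, for systemd setups. The three unit names are the ones given in this post; the function names are hypothetical, and enabling needs root.

```shell
#!/bin/sh
# Sketch: enable the NVIDIA sleep units on a systemd system.

nvidia_sleep_units() {
    # Expand nvidia-{suspend,hibernate,resume}.service without bash braces.
    for action in suspend hibernate resume; do
        echo "nvidia-${action}.service"
    done
}

enable_nvidia_sleep_units() {
    # Word splitting of the unit list is intentional here.
    # shellcheck disable=SC2046
    systemctl enable $(nvidia_sleep_units)
}
```

`systemctl is-enabled nvidia-suspend.service` (and likewise for the other two) can be used afterwards to check the result.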
eeckwrk99 Apprentice
Joined: 14 Mar 2021 Posts: 242 Location: Gentoo forums
|
Posted: Wed Sep 25, 2024 10:07 am Post subject: |
|
|
OK, works fine now after enabling the nvidia-{resume,suspend,hibernate}.service units again. I had to disable
Code: | xss-lock --transfer-sleep-lock -- physlock & |
in my AwesomeWM autostart script; otherwise physlock would lock my session when suspending, and the system wouldn't enter the suspend state until I entered my password again. The machine would then resume on TTY2 for some reason, and I had to switch back to TTY1.
I'll have to test with 6.6 kernel and NVIDIA 550 though.
Edit: Not working with 6.6 kernel and NVIDIA 535. Still getting a black screen after resuming. Will try 6.6 + NVIDIA 550.
After this, I've been able to suspend/resume four times in a row just fine, with one exception. If I suspend while mpv has a video open (paused before suspending, of course) and hardware video decoding is enabled with hwdec=auto in ~/.config/mpv/mpv.conf, the machine won't enter the suspend state. I'm getting the following errors:
Code: |
Sep 25 17:51:24 gentoo-desktop systemd-sleep[8525]: Performing sleep operation 'suspend'...
Sep 25 17:51:24 gentoo-desktop kernel: PM: suspend entry (deep)
Sep 25 17:51:24 gentoo-desktop kernel: Filesystems sync: 0.037 seconds
Sep 25 17:51:25 gentoo-desktop kernel: Freezing user space processes
Sep 25 17:51:25 gentoo-desktop kernel: Freezing user space processes completed (elapsed 0.001 seconds)
Sep 25 17:51:25 gentoo-desktop kernel: OOM killer disabled.
Sep 25 17:51:25 gentoo-desktop kernel: Freezing remaining freezable tasks
Sep 25 17:51:25 gentoo-desktop kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
Sep 25 17:51:24 gentoo-desktop kernel: uvm_suspend_entry+0x3e/0x220 [nvidia_uvm]
Sep 25 17:51:24 gentoo-desktop kernel: ? down+0x1a/0x60
Sep 25 17:51:24 gentoo-desktop kernel: nv_uvm_suspend+0x2d/0x50 [nvidia]
Sep 25 17:51:24 gentoo-desktop kernel: nv_set_system_power_state+0x3b7/0x470 [nvidia]
Sep 25 17:51:24 gentoo-desktop kernel: nv_teardown_pat_support+0x443/0x1b00 [nvidia]
Sep 25 17:51:24 gentoo-desktop kernel: proc_reg_write+0x57/0xa0
Sep 25 17:51:24 gentoo-desktop kernel: vfs_write+0xbb/0x390
Sep 25 17:51:24 gentoo-desktop kernel: ? handle_mm_fault+0xee/0x2e0
Sep 25 17:51:24 gentoo-desktop kernel: ksys_write+0x5c/0xe0
Sep 25 17:51:24 gentoo-desktop kernel: do_syscall_64+0x35/0x80
Sep 25 17:51:24 gentoo-desktop kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Sep 25 17:51:24 gentoo-desktop kernel: RIP: 0033:0x7f59957fc5c4
Sep 25 17:51:24 gentoo-desktop kernel: Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 85 2a 0e 00 00 74 13 b8 01 >
Sep 25 17:51:24 gentoo-desktop kernel: RSP: 002b:00007ffcad44eca8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Sep 25 17:51:24 gentoo-desktop kernel: RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f59957fc5c4
Sep 25 17:51:24 gentoo-desktop kernel: RDX: 0000000000000008 RSI: 000055b0fe84bcc0 RDI: 0000000000000001
Sep 25 17:51:24 gentoo-desktop kernel: RBP: 00007f59958d85c0 R08: 00007f59958d7ac0 R09: 0000000000000004
Sep 25 17:51:24 gentoo-desktop kernel: R10: 0000000000000001 R11: 0000000000000202 R12: 0000000000000008
Sep 25 17:51:24 gentoo-desktop kernel: R13: 000055b0fe84bcc0 R14: 0000000000000008 R15: 00007f59958d5f00
Sep 25 17:51:24 gentoo-desktop kernel: </TASK>
Sep 25 17:51:24 gentoo-desktop kernel: Modules linked in: nvidia_uvm(PO)>
Sep 25 17:51:24 gentoo-desktop kernel: nvidia(PO) drm_kms_helper crct10dif_pclmul syscopyarea ghash_clmulni_intel sysfillrect sysimgblt sha512_ssse3 fb_sys>
Sep 25 17:51:24 gentoo-desktop kernel: CR2: 0000000000000000
Sep 25 17:51:24 gentoo-desktop kernel: ---[ end trace 0000000000000000 ]---
Sep 25 17:51:24 gentoo-desktop kernel: RIP: 0010:nvstatusToString+0x215/0x250 [nvidia_uvm]
Sep 25 17:51:24 gentoo-desktop kernel: Code: 4c 89 64 24 18 e8 2b bc cd cb 48 89 c6 48 8b 04 24 4c 39 e8 75 3e 48 8b 53 08 48 89 1c 24 4c 89 f7 48 89 43 08 >
Sep 25 17:51:24 gentoo-desktop kernel: RSP: 0018:ffffb95b05217d00 EFLAGS: 00010046
Sep 25 17:51:24 gentoo-desktop kernel: RAX: ffffb95b05217d00 RBX: ffffb95b03d462a8 RCX: 0000000000000001
Sep 25 17:51:24 gentoo-desktop kernel: RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffffb95b03d462b8
Sep 25 17:51:24 gentoo-desktop kernel: RBP: ffffb95b05217d68 R08: 0000000000000001 R09: 0000000000000000
Sep 25 17:51:24 gentoo-desktop kernel: R10: 0000000000000000 R11: 0000000000000009 R12: ffffb95b05217d20
Sep 25 17:51:24 gentoo-desktop kernel: R13: ffffb95b05217d00 R14: ffffb95b03d462b8 R15: ffff9fed548f0000
Sep 25 17:51:24 gentoo-desktop kernel: FS: 00007f59956c6b80(0000) GS:ffff9ff4ffc40000(0000) knlGS:0000000000000000
Sep 25 17:51:24 gentoo-desktop kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 25 17:51:24 gentoo-desktop kernel: CR2: 0000000000000000 CR3: 0000000106024006 CR4: 00000000001706e0
Sep 25 17:51:24 gentoo-desktop kernel: note: nvidia-sleep.sh[8479] exited with irqs disabled
Sep 25 17:51:24 gentoo-desktop kernel: note: nvidia-sleep.sh[8479] exited with preempt_count 1
|
Suspend/resume works if I remove hwdec=auto. Maybe I should create a separate topic for this? |
hujuice Guru
Joined: 16 Oct 2007 Posts: 345 Location: Nicosia, Cyprus
|
Posted: Sat Jan 11, 2025 8:19 am Post subject: |
|
|
I am stuck on the same "resume from hibernate" nvidia issue and can't resolve it.
My poor knowledge of graphical stuff probably doesn't help.
Hibernation works regularly, but resuming leads to a black screen (both graphic and text), while the rest is working (ssh works, all services are running). I find in dmesg a not so useful error:
dmesg: | nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0:417 |
If I stop display-manager and then I hibernate, the system resumes regularly, but when I restart the graphical environment it appears messed-up (one or both screens are black, with mouse pointer).
I tried what you mentioned above
/etc/modprobe.d/nvidia.conf: | NVreg_PreserveVideoMemoryAllocations=0 |
/etc/dracut.conf.d/modules.conf: | add_drivers+=" nvidia nvidia-drm nvidia-modeset nvidia-uvm " |
without any change.
Kernel is 6.6.67, nvidia-drivers is 550.142 and I run plasma6 (same problem with wayland or X11).
Is there anything you can kindly suggest to resolve this?
Regards,
HUjuice _________________ Who hasn't a spine, should have a method.
Chi non ha carattere, deve pur avere un metodo. |
andi456 Apprentice
Joined: 06 Mar 2005 Posts: 225 Location: Germany
|
Posted: Thu Jan 23, 2025 2:46 pm Post subject: |
|
|
Sorry, the only thing I can think of would be to try to hibernate the system from the command line with loginctl suspend, to see what happens...
Kind regards. |
hujuice Guru
Joined: 16 Oct 2007 Posts: 345 Location: Nicosia, Cyprus
|
Posted: Thu Jan 23, 2025 9:07 pm Post subject: |
|
|
andi456 wrote: | try to hibernate the system from the command line with loginctl suspend, to see what happens... |
Same result. To have a successful hibernation/resume cycle, I have to remove nvidia-drm
Code: | modprobe -r nvidia-drm |
That means stopping the display manager first, so it doesn't make sense for a desktop.
If I do it anyway, everything works fine. After the resume I can reload nvidia-drm and restart the display manager.
So, what's wrong with nvidia-drm?
I tried different recent versions of nvidia-drivers. Kernel is 6.6.67.
HUjuice _________________ Who hasn't a spine, should have a method.
Chi non ha carattere, deve pur avere un metodo. |