sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
Posted: Thu Jan 04, 2024 2:42 am Post subject: nvidia gpu questions
|
I have an AMD GPU at the moment but I'm moving to NVIDIA for LLM stuff. I've read the Nvidia and nvidia-drivers pages on the gentoo wiki and had a couple questions:
1. Can I do a 'smooth' transition where I install the nvidia drivers while leaving the AMDGPU stuff alone? I'm going to take the AMD card out before booting with the NVIDIA one, but I'd like to be able to switch back to AMD if/when there is something wrong with the NVIDIA drivers / my kernel config.
2. The nvidia-drivers page mentions that periodically the kernel breaks "ABI" which renders the NVIDIA drivers non-functional until NVIDIA release an update, which usually takes about a week. The page recommends staying on the supported kernel until this fix is released by NVIDIA. My question is: will we know in advance that such a break will occur? Or do I have to wait for my system to break or something, and then roll back to the previous kernel?
3. The nvidia-drivers page mentions that the nvidia-drivers modules are built against the current (eselected) sources. When I switch to new sources and compile a new kernel, do I have to manually re-emerge nvidia-drivers?
4. nvidia-drivers has a USE flag "modules-sign". I already have automatic module signing enforced by the kernel every time I compile it. I assume the kernel will be smart enough to skip the nvidia-drivers modules if it sees they are signed? It seems like signing modules twice might be problematic. I assume the purpose of this USE flag is to make sure the modules will load when nvidia-drivers is emerged between kernel compiles.
I think that is it for now. Sorry it's a lot, but any advice very much appreciated as always.
|
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2868
Posted: Thu Jan 04, 2024 6:46 am Post subject: Re: nvidia gpu questions
|
Quote: 1. Can I do a 'smooth' transition where I install the nvidia drivers while leaving the AMDGPU stuff alone? I'm going to take the AMD card out before booting with the NVIDIA one, but I'd like to be able to switch back to AMD if/when there is something wrong with the NVIDIA drivers / my kernel config.
Should be fine, yes. You can keep multiple values in VIDEO_CARDS, and the kernel obviously won't attempt to use a card that's no longer there; it will load nvidia-drivers instead. Assuming you're not doing anything overly custom (like a manual xorg.conf), it should all get auto-detected back and forth. Not that it's something I've done myself.
Quote: 2. The nvidia-drivers page mentions that periodically the kernel breaks "ABI" which renders the NVIDIA drivers non-functional until NVIDIA release an update, which usually takes about a week. The kernel recommends staying on the supported kernel until this fix is released by NVIDIA. My question is: will we know in advance that such a break will occur? Or do I have to wait for my system to break or something, and then roll back to the previous kernel?
This *used* to happen somewhat often on new major kernel versions (aka x.x.0 releases), but in the last 3-4 years or so nvidia has been rather on top of it and fixes their (latest) drivers during the kernel -rc phase (so by the time the kernel is out, it's already fixed). Worst case, they may get hit by a last-minute change they didn't account for, but that's rare. Not that I recommend using .0 kernels regardless of nvidia; it's good to wait till .3-5+ to avoid early regressions (the previous branch remains supported for a while and keeps getting new releases).
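The ".3-5+" rule of thumb above can be written down as a tiny check against the running kernel. This is a minimal sketch, not something Portage or the nvidia-drivers ebuild does; the helper name `kernel_is_settled` and the default threshold of 3 are assumptions taken from the advice in this post.

```python
import platform

def kernel_is_settled(release: str, min_patch: int = 3) -> bool:
    """Return True if a kernel release string has reached patch level min_patch.

    Accepts strings as printed by `uname -r`, e.g. '6.6.3', '6.1.69-gentoo'
    or '6.8-rc1'. Hypothetical helper: it only encodes the 'wait for .3-5+'
    rule of thumb, nothing more.
    """
    # Drop local or -rc suffixes such as '-gentoo', '-gentoo-dist' or '-rc1'.
    base = release.split("-", 1)[0]
    parts = base.split(".")
    # A missing third component ('6.7', '6.8-rc1') means a .0 release or an -rc.
    patch = int(parts[2]) if len(parts) > 2 and parts[2].isdigit() else 0
    return patch >= min_patch

if __name__ == "__main__":
    current = platform.release()
    print(current, "settled" if kernel_is_settled(current) else "still early")
```

Raising `min_patch` to 5 gives the more conservative end of the range.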
Note also that this is never an issue if you're just using Gentoo stable kernels (6.1.x currently): that's an LTS branch, and those don't break how things work on a whim.
NVIDIA also has its own long-term-support (older) driver branches that tend to lag further behind; if you use one, it should be paired with an LTS kernel. But unless you're using 10+ year old hardware or running into regressions, there's typically no reason to use these older branches.
The ebuild will give a big warning if you're ever building against a kernel branch that's known broken or not tested against.
On another note, take the wiki page with a grain of salt. It can be useful but a lot of the information on it (while not necessarily wrong) is rather dated.
Quote: 3. The nvidia-drivers page mentions that the nvidia-drivers modules are built against the current (eselected) sources. When I switch to new sources and compile a new kernel, do I have to manually re-emerge nvidia-drivers?
*If* you're building the kernel manually, then yes (emerge @module-rebuild will pull in all out-of-tree modules, including nvidia-drivers).
Another option is to use e.g. sys-kernel/gentoo-kernel (aka distribution kernels) and enable USE=dist-kernel globally. Portage will then not only build the kernel by itself, but also trigger a rebuild of nvidia-drivers right after. This all happens as part of your normal @world updates, and you can even set it up to update grub or similar for you.
Note that having loaded modules that mismatch the current userspace nvidia libraries will cause GPU-accelerated applications to fail to start. This can happen if you boot or build against the wrong kernel and old modules get used, or if you just updated nvidia-drivers and haven't rebooted yet (the ebuild will warn about this in postinst if there's a mismatch). You may want to keep an eye on this package if you don't want to either reboot right away or close everything that uses the drivers so you can unload and reload the modules.
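To spot that mismatched state before an application fails, one option is to compare the version of the loaded kernel module with the installed package version. A minimal sketch, assuming the loaded module reports its version in /sys/module/nvidia/version and that Portage's installed-package database is in its default location under /var/db/pkg; the helper names are hypothetical and this is not something the ebuild ships.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Assumed locations: the running module's version (only present while nvidia.ko
# is loaded) and Portage's default installed-package database category dir.
LOADED_VERSION_FILE = Path("/sys/module/nvidia/version")
VDB_CATEGORY = Path("/var/db/pkg/x11-drivers")

def installed_version(vdb_entries: Iterable[str]) -> Optional[str]:
    """Pick the nvidia-drivers version out of vdb directory names,
    e.g. 'nvidia-drivers-535.146.02-r1' -> '535.146.02'."""
    for name in vdb_entries:
        m = re.fullmatch(r"nvidia-drivers-(\d+(?:\.\d+)*)(?:-r\d+)?", name)
        if m:
            return m.group(1)
    return None

def check(loaded: Optional[str], installed: Optional[str]) -> str:
    """Describe whether the loaded module matches the installed userspace."""
    if loaded is None:
        return "nvidia module not loaded"
    if installed is None:
        return "nvidia-drivers not found in the package database"
    if loaded != installed:
        return f"MISMATCH: module {loaded}, userspace {installed} (reboot or reload modules)"
    return f"OK: {loaded}"

if __name__ == "__main__":
    loaded = LOADED_VERSION_FILE.read_text().strip() if LOADED_VERSION_FILE.exists() else None
    entries = [p.name for p in VDB_CATEGORY.iterdir()] if VDB_CATEGORY.is_dir() else []
    print(check(loaded, installed_version(entries)))
```

Running it after an nvidia-drivers update but before rebooting would be expected to print the MISMATCH line.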
Quote: 4. nvidia-drivers has a USE flag "modules-sign". I already have automatic module signing enforced by the kernel every time I compile it. I assume the kernel will be smart enough to skip the nvidia-drivers modules if it sees they are signed? It seems like signing modules twice might be problematic. I assume the purpose of this USE flag is to make sure the modules will load when nvidia-drivers is emerged between kernel compiles.
As things are managed right now, the kernel build won't sign out-of-tree modules automatically, regardless of that option. If you are using signing, just enable the USE flag and they'll only be signed once (unless you're doing some wonky non-standard stuff with custom scripts, that is).
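One way to confirm the nvidia modules did get signed after enabling USE=modules-sign is to check whether each .ko file ends with the marker the kernel's sign-file tool appends ("~Module signature appended~\n"). A sketch, assuming uncompressed modules named nvidia*.ko somewhere under /lib/modules/$(uname -r); compressed .ko.xz/.ko.zst files would need decompressing first, which this does not do. Counting the markers is a heuristic of mine, not an official check.

```python
import platform
from pathlib import Path

# Trailer appended by the kernel's scripts/sign-file (MODULE_SIG_STRING).
SIG_MARKER = b"~Module signature appended~\n"

def is_signed(data: bytes) -> bool:
    """The kernel looks for the marker at the very end of the module file."""
    return data.endswith(SIG_MARKER)

def signature_count(data: bytes) -> int:
    """Number of markers in the file; more than one would suggest repeated signing."""
    return data.count(SIG_MARKER)

if __name__ == "__main__":
    moddir = Path("/lib/modules") / platform.release()
    # Assumption: nvidia-drivers modules are uncompressed and named nvidia*.ko.
    for ko in sorted(moddir.rglob("nvidia*.ko")):
        data = ko.read_bytes()
        print(f"{ko.name}: signed={is_signed(data)} markers={signature_count(data)}")
```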
|
|
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
Posted: Thu Jan 04, 2024 1:31 pm Post subject:
|
Thanks! Extremely helpful. This is definitely enough to get me started.
Quote: This *used* to happen somewhat often on new major kernel version (aka x.x.0 releases), but in the last 3-4 years or so nvidia been rather on top and makes fixes their (latest) drivers during the kernel -rc phase (so by the time it's out, it's already fixed). Worst case they may get hit by a last minute change they didn't account for but that's rare. Not that I recommend using .0 kernels regardless of nvidia, it's good to wait till .3-5+ to avoid early regressions (the previous branch remains supported for a while and gets new releases).
Ah ok, so these "breaking" changes would happen during "minor revisions" (the 'y' in 'x.y.z'), and the trick would be to wait until x.y.1 or something. Someone on IRC once advised me not to update to x.y.0 and to wait for .1 or .2, and that's been my practice for the last 2 years. Nice to know that it's a non-issue nowadays.
Quote: Note that having modules loaded that mismatch the current user space nvidia libraries will result in failing to start gpu-accelerated applications (e.g. can happen if boot or build against the wrong kernel, and old modules get used -- or just updated nvidia-drivers and haven't rebooted yet which the ebuild will warn about in postinst if mismatching). May want to keep an eye on this package if you don't want to either reboot right away, or close everything that use the drivers so can unload+reload modules.
Hm, ok. I wonder if I can have portage notify me that an update is available without actually updating it. Edit: I haven't been able to find a way of doing this. I was hoping I could add an nvidia-drivers entry to package.mask and then unmask as needed, but if I do this I just get a warning about it being masked on every attempted @world update.
And correct me if I'm wrong, but it seems like using dist-kernel would not avoid the issue of having to reboot after rebuilding the modules: it just automates the rebuild.
Thanks again!
|
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2868
Posted: Thu Jan 04, 2024 3:34 pm Post subject:
|
It's not a notification, but you can --exclude nvidia-drivers if you want to skip the update without masking (kernel updates could be skipped that way as well). That's better suited than a mask, since masks tend to get forgotten and you may find yourself never updating and missing out on CVE fixes for the nvidia drivers.
Alternatively, you could --exclude it by default and only update it when you update your kernel; assuming you do kernel updates regularly, that wouldn't leave it to rot for too long.
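The "--exclude by default" approach can be wrapped so it doesn't have to be remembered on every update. A minimal sketch that builds and runs the emerge command line; the wrapper itself, its name `world_update_cmd`, and the chosen flag set (--update --deep --newuse, dry run via --pretend unless told otherwise) are assumptions, while --exclude is the emerge option discussed above, given once per atom.

```python
import subprocess
import sys
from typing import List, Sequence

# Atoms held back by default; add e.g. sys-kernel/gentoo-kernel here to hold
# the kernel as well, then update both together when you're ready.
HELD_BACK = ("x11-drivers/nvidia-drivers",)

def world_update_cmd(held: Sequence[str] = HELD_BACK, pretend: bool = True) -> List[str]:
    """Build an `emerge` @world update command that skips the given atoms."""
    cmd = ["emerge", "--update", "--deep", "--newuse"]
    cmd.append("--pretend" if pretend else "--ask")
    for atom in held:
        # One --exclude per atom.
        cmd += ["--exclude", atom]
    cmd.append("@world")
    return cmd

if __name__ == "__main__":
    # Pass --real to actually run the update instead of a dry run.
    real = "--real" in sys.argv[1:]
    subprocess.run(world_update_cmd(pretend=not real), check=True)
```

When it's time to take the driver update (e.g. together with a kernel update), a plain `emerge -uDN @world` without the wrapper picks it up.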
|
|
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
Posted: Thu Jan 04, 2024 5:04 pm Post subject:
|
Quote: It's not a notification, but you can --exclude nvidia-drivers if you want to skip the update without masking (kernel updates could be skipped that way as well). Better suited given masks tend to get forgotten and you may find yourself never updating and missing out on CVE fixes for nvidia drivers.
I think I'll do that. Cheers, thanks again
|
|
Gentoopc Guru
Joined: 25 Dec 2017 Posts: 364
Posted: Fri Jan 05, 2024 3:11 am Post subject: Re: nvidia gpu questions
|
sunox wrote: I have an AMD GPU at the moment but I'm moving to NVIDIA for LLM stuff.
Have you already switched to nvidia? If you have, please tell me: what are your impressions?
|
|
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
Posted: Fri Jan 05, 2024 4:42 pm Post subject:
|
Yes I'm using a 3090 now with 535 drivers (stable in Gentoo).
My early impressions are that nvidia support on linux is pretty good. I'm using wayland (hyprland in particular), which nvidia still has some issues with, and so far my desktop performance isn't quite what it was with the amd card even though the nvidia card is far more powerful, but I'm still hopeful that I can overcome this with some config changes. The only real issue I'm experiencing is some lag in window animations, and from what I understand this is not something commonly experienced by other users of Hyprland with nvidia cards. There is no hardware cursor support for wlroots-based compositors at the moment, which is also a mild annoyance.
From what I've heard wayland performance is going to be really good on KDE Plasma 6 though (which imo bodes well for the future of wayland/nvidia), and I think X11 support is basically perfect, so if you use either of those you should be quite happy.
I bought it to experiment with open source AI tools and I haven't had a chance to do any of that yet. I'm not much of a gamer but I did get Cyberpunk to play without much effort.
Hope that is helpful. Let me know if there's anything in particular you'd like to know and I'll do my best to answer.
|
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2868
Posted: Fri Jan 05, 2024 6:45 pm Post subject:
|
If you're using wayland, you may want to try nvidia-drivers-545 (aka the non-production branch, in ~testing). It may have some unexpected issues, but it has several wayland improvements.
I've still been sticking to Xorg with nvidia myself though; I'd rather leave things to mature for a few years. GBM support is notably still somewhat recent: previously, wayland support on nvidia went through EGLStream, but that didn't gain much adoption outside Plasma and Gnome, which try to keep things usable rather than have --my-next-gpu-wont-be-nvidia.
|
|
sunox Tux's lil' helper
Joined: 26 Jan 2022 Posts: 147
Posted: Fri Jan 05, 2024 6:58 pm Post subject:
|
Thanks for the suggestion. I actually thought about switching back to i3 and I might end up doing that yet. I like wayland for a few reasons, but I'm pretty intolerant of arguably minor stuff like lag/stutter, so it might be time to go back. I'm really happy with the card itself, so much so that doing what previously seemed unthinkable (moving back to X11) suddenly sounds ok.
I will give 545 a try first though. I hear there are numerous issues, but most of them seem to relate to gaming which I don't really care about.
|
|
|