SkunkMyrddyn n00b
Joined: 25 Dec 2024 Posts: 5
|
Posted: Wed Dec 25, 2024 4:04 am Post subject: Nvidia Datacenter Driver |
|
|
I'm adding an Nvidia Tesla A2 card to my server to support CUDA / TensorFlow / other AI and compute node acceleration. (The card does not have video-out connections.)
I am having a difficult time installing the correct driver for the system. The general "nvidia-drivers" package 1) requires X (this is a headless server), and 2) does not list this card as supported (if I'm reading the documentation correctly).
Does anyone know how to get the correct driver(s) installed so that PyTorch can recognize the Nvidia compute nodes for acceleration? |
|
|
|
tiffany n00b
Joined: 04 May 2008 Posts: 11
|
Posted: Wed Dec 25, 2024 9:57 am Post subject: |
|
|
NVidia's site has a separate section for datacenter drivers. Have you seen them?
I see that they support RHEL, Debian and others. |
|
|
|
SkunkMyrddyn n00b
Joined: 25 Dec 2024 Posts: 5
|
Posted: Wed Dec 25, 2024 4:38 pm Post subject: |
|
|
I checked those out and wasn't sure how to convince Gentoo to handle one of the other packaging formats. So I grabbed the tarballs they offer, which contain an nvidia-installer binary, but I can't get that to run either.
I found that it has a --no-x-check option that bypasses the check for whether X (of some kind) is installed.
However, the installer errors out saying it cannot figure out my initramfs. That makes sense, as I am not using an initramfs at all on this system, but I don't see an option to tell the installer to skip that step.
I feel like I'm missing something basic. |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1803 Location: Germany
|
|
|
|
SkunkMyrddyn n00b
Joined: 25 Dec 2024 Posts: 5
|
Posted: Thu Dec 26, 2024 10:11 am Post subject: |
|
|
The nvidia-cuda-toolkit package doesn't install a driver, so PyTorch does not find any CUDA devices.
Setting -X as a USE flag blocks x11-drivers/nvidia-drivers from installing. |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22877
|
Posted: Thu Dec 26, 2024 12:02 pm Post subject: |
|
|
SkunkMyrddyn wrote: | Setting -X as a USE flag blocks x11-drivers/nvidia-drivers from installing. |
Please show the output that led to this statement. I do not see that result here:
Code: |
# USE=-X emerge -pv nvidia-drivers
These are the packages that would be merged, in order:
Calculating dependencies... done!
Dependency resolution took 2.59 s (backtrack: 0/20).
...
[ebuild N ] x11-drivers/nvidia-drivers-550.135:0/550::gentoo USE="modules strip tools -X -dist-kernel -kernel-open -modules-compress -modules-sign -persistenced -powerd -static-libs -wayland" ABI_X86="(64) -32" 314787 KiB
|
|
|
|
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2885
|
Posted: Thu Dec 26, 2024 5:00 pm Post subject: |
|
|
For nvidia-drivers on a headless setup, you'll usually want USE="persistenced -X -static-libs -wayland -tools" on it (and enable persistenced with systemd or OpenRC; this prevents the card from getting uninitialized when there isn't a display constantly using it).
wrt USE=-tools, that's for nvidia-settings, which is a GUI application, so you likely don't want that either. It does have some command-line usage, but that is very limited without X given that it uses X to talk to the card (I imagine Nvidia plans to migrate its features to rely on NVML in the future).
As for USE=-static-libs, that's for libXNVCtrl.a, which requires xorg headers at build time. The library is not useful if you are not using X. If another package depends on nvidia-drivers having static-libs enabled, you may want to try USE=-video_cards_nvidia on that package; the feature won't be useful headless.
That should let you avoid just about all X/wayland stuff, albeit I wouldn't overly stress about these even if unused; they're pretty small dependencies as long as you don't start pulling in the bigger GUI toolkits. |
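To keep that flag set across future emerges rather than passing USE= on the command line each time, it can be recorded in package.use. A minimal sketch, assuming the directory-style /etc/portage/package.use layout; the file name nvidia-drivers and the PKGUSE_DIR variable are illustrative choices, not anything Portage requires, and the default writes to a scratch directory so nothing on the system changes until you point it at the real location:

```shell
#!/bin/sh
# Sketch only: PKGUSE_DIR is a hypothetical variable for this example.
# On a real system, set it to /etc/portage/package.use (as root).
PKGUSE_DIR="${PKGUSE_DIR:-./package.use}"
mkdir -p "$PKGUSE_DIR"

# Headless flag set suggested above, one package.use entry.
printf '%s\n' \
    'x11-drivers/nvidia-drivers persistenced -X -static-libs -wayland -tools' \
    > "$PKGUSE_DIR/nvidia-drivers"

# Show what was written.
cat "$PKGUSE_DIR/nvidia-drivers"

# After emerging, enable the persistence daemon. The service names below are
# assumed to match what the ebuild installs; verify them on your system:
#   OpenRC:  rc-update add nvidia-persistenced default
#   systemd: systemctl enable --now nvidia-persistenced.service
```

With the entry in place, a plain "emerge -pv nvidia-drivers" should show the same USE line as the one-off USE="..." invocation.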
|
|
|
SkunkMyrddyn n00b
Joined: 25 Dec 2024 Posts: 5
|
Posted: Thu Dec 26, 2024 6:19 pm Post subject: |
|
|
Hu wrote: | SkunkMyrddyn wrote: | Setting -X as a USE flag blocks x11-drivers/nvidia-drivers from installing. |
Please show the output that led to this statement. I do not see that result here:
Code: |
# USE=-X emerge -pv nvidia-drivers
These are the packages that would be merged, in order:
Calculating dependencies... done!
Dependency resolution took 2.59 s (backtrack: 0/20).
...
[ebuild N ] x11-drivers/nvidia-drivers-550.135:0/550::gentoo USE="modules strip tools -X -dist-kernel -kernel-open -modules-compress -modules-sign -persistenced -powerd -static-libs -wayland" ABI_X86="(64) -32" 314787 KiB
|
|
USE=-X emerge -pv nvidia-drivers
These are the packages that would be merged, in order:
Calculating dependencies... done!
Dependency resolution took 8.09 s (backtrack: 0/20).
[ebuild N ] x11-themes/hicolor-icon-theme-0.17::gentoo 0 KiB
[ebuild N ] x11-libs/libXv-1.0.13::gentoo USE="-doc" ABI_X86="(64) -32 (-x32)" 275 KiB
[ebuild N ] x11-libs/libXcomposite-0.4.6::gentoo USE="-doc" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] x11-libs/libXcursor-1.2.3::gentoo USE="-doc" ABI_X86="(64) -32 (-x32)" 286 KiB
[ebuild N ] x11-libs/libXdamage-1.1.6::gentoo ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] dev-libs/jansson-2.14-r2:0/4::gentoo USE="-doc -static-libs" 0 KiB
[ebuild N ] dev-util/gdbus-codegen-2.82.4::gentoo PYTHON_SINGLE_TARGET="python3_12 -python3_10 -python3_11 -python3_13" 0 KiB
[ebuild N ] dev-lang/vala-0.56.17:0.56::gentoo USE="-test -valadoc" 0 KiB
[ebuild N ] virtual/linux-sources-3-r8::gentoo USE="-firmware" 0 KiB
[ebuild N ] x11-libs/gdk-pixbuf-2.42.12:2::gentoo USE="gif introspection jpeg -gtk-doc -test -tiff" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] sys-apps/dbus-1.15.8::gentoo USE="-X -debug -doc -elogind (-selinux) -static-libs -systemd -test -valgrind" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] dev-libs/fribidi-1.0.13::gentoo USE="-doc -test" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] x11-libs/libvdpau-1.5::gentoo USE="-doc -dri -test" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] media-libs/libepoxy-1.5.10-r3::gentoo USE="X -test" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild R ] x11-libs/cairo-1.18.2-r1::gentoo USE="X* glib (-aqua) (-debug) -gtk-doc -test" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] x11-libs/pango-1.52.2::gentoo USE="introspection -X -debug -sysprof -test" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] app-accessibility/at-spi2-core-2.52.0:2::gentoo USE="introspection -X -dbus-broker -gtk-doc -systemd -test" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] dev-util/gtk-update-icon-cache-3.24.42::gentoo 0 KiB
[ebuild N ] gnome-base/librsvg-2.58.5:2::gentoo USE="introspection vala -debug -gtk-doc" ABI_X86="(64) -32 (-x32)" 6246 KiB
[ebuild N ] x11-libs/gtk+-3.24.42-r1:3::gentoo USE="X introspection (-aqua) -broadway -cloudproviders -colord -cups -examples -gtk-doc -sysprof -test -vim-syntax -wayland -xinerama" ABI_X86="(64) -32 (-x32)" 0 KiB
[ebuild N ] x11-themes/adwaita-icon-theme-legacy-46.2::gentoo 0 KiB
[ebuild N ] x11-themes/adwaita-icon-theme-46.2::gentoo USE="-branding" 0 KiB
[ebuild N ] dev-util/vulkan-headers-1.3.296.0::gentoo 0 KiB
[ebuild N ] dev-util/pahole-1.27-r1::gentoo USE="-debug -verify-sig" PYTHON_SINGLE_TARGET="python3_12 -python3_10 -python3_11 -python3_13" 0 KiB
[ebuild N ] x11-drivers/nvidia-drivers-565.77:0/565::gentoo USE="modules static-libs strip tools -X -dist-kernel -kernel-open -modules-compress -modules-sign -persistenced -powerd -wayland" ABI_X86="(64) -32" 347766 KiB
Total: 25 packages (24 new, 1 reinstall), Size of downloads: 354572 KiB
The following USE changes are necessary to proceed:
(see "package.use" in the portage(5) man page for more details)
# required by x11-drivers/nvidia-drivers-565.77::gentoo[tools]
# required by nvidia-drivers (argument)
>=x11-libs/gtk+-3.24.42-r1 X
# required by x11-libs/gtk+-3.24.42-r1::gentoo
# required by x11-themes/adwaita-icon-theme-legacy-46.2::gentoo
# required by x11-themes/adwaita-icon-theme-46.2::gentoo
>=media-libs/libepoxy-1.5.10-r3 X
# required by x11-libs/gtk+-3.24.42-r1::gentoo
# required by x11-themes/adwaita-icon-theme-legacy-46.2::gentoo
# required by x11-themes/adwaita-icon-theme-46.2::gentoo
>=x11-libs/cairo-1.18.2-r1 X
emerge: there are no ebuilds built with USE flags to satisfy "x11-libs/gtk+:3[X]".
!!! One of the following packages is required to complete your request:
- x11-libs/gtk+-3.24.41-r1::gentoo (Change USE: +X)
(dependency required by "x11-drivers/nvidia-drivers-565.77::gentoo[tools]" [ebuild])
(dependency required by "nvidia-drivers" [argument])
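The useful part of that output is the bracketed flag on the "required by" line: it names the nvidia-drivers USE flag that is pulling gtk+ (and with it X) into the dependency graph. A small sketch of picking that flag out mechanically, using two lines copied from the output above as sample input; the variable names are illustrative only:

```shell
#!/bin/sh
# Two "required by" lines copied from the emerge output above.
output='# required by x11-drivers/nvidia-drivers-565.77::gentoo[tools]
# required by nvidia-drivers (argument)'

# Print whatever sits inside [...] at the end of a "required by" line.
# Lines without a bracketed flag are skipped (-n plus the p flag).
flags=$(printf '%s\n' "$output" | sed -n 's/^# required by .*\[\(.*\)\]$/\1/p')
echo "$flags"
```

Here the result is "tools", which is why disabling that flag on nvidia-drivers removes the gtk+ requirement.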
[Administrator edit: unchecked Disable BBCode in this post so that OP's quote tags work. -Hu] |
|
|
|
SkunkMyrddyn n00b
Joined: 25 Dec 2024 Posts: 5
|
Posted: Thu Dec 26, 2024 6:21 pm Post subject: |
|
|
Ionen wrote: | For nvidia-drivers on a headless setup, you'll usually want USE="persistenced -X -static-libs -wayland -tools" on it (and enable persistenced with systemd or OpenRC; this prevents the card from getting uninitialized when there isn't a display constantly using it).
wrt USE=-tools, that's for nvidia-settings, which is a GUI application, so you likely don't want that either. It does have some command-line usage, but that is very limited without X given that it uses X to talk to the card (I imagine Nvidia plans to migrate its features to rely on NVML in the future).
As for USE=-static-libs, that's for libXNVCtrl.a, which requires xorg headers at build time. The library is not useful if you are not using X. If another package depends on nvidia-drivers having static-libs enabled, you may want to try USE=-video_cards_nvidia on that package; the feature won't be useful headless.
That should let you avoid just about all X/wayland stuff, albeit I wouldn't overly stress about these even if unused; they're pretty small dependencies as long as you don't start pulling in the bigger GUI toolkits. |
USE="persistenced -X -static-libs -wayland -tools" emerge -pv nvidia-drivers
These are the packages that would be merged, in order:
Calculating dependencies... done!
Dependency resolution took 2.86 s (backtrack: 0/20).
[ebuild N ] acct-user/nvpd-0-r2::gentoo 0 KiB
[ebuild N ] dev-util/pahole-1.27-r1::gentoo USE="-debug -verify-sig" PYTHON_SINGLE_TARGET="python3_12 -python3_10 -python3_11 -python3_13" 0 KiB
[ebuild N ] virtual/linux-sources-3-r8::gentoo USE="-firmware" 0 KiB
[ebuild N ] x11-drivers/nvidia-drivers-565.77:0/565::gentoo USE="modules persistenced strip -X -dist-kernel -kernel-open -modules-compress -modules-sign -powerd -static-libs -tools -wayland" ABI_X86="(64) -32" 347766 KiB
Total: 4 packages (4 new), Size of downloads: 347766 KiB
Looks like that set allows it to build. I'm running it now and will see whether PyTorch sees the card.
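For the follow-up check, something along these lines should show whether the driver is usable without X. It is only a sketch: it assumes nvidia-smi was installed alongside the driver and that a python3 with torch is on the PATH, and each step just reports rather than changing anything.

```shell
#!/bin/sh
# Sketch of a post-install check. Each step only reports; nothing is changed.

# /proc/driver/nvidia/version exists once the nvidia kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    module_state="loaded"
else
    module_state="not loaded"
fi
echo "nvidia kernel module: $module_state"

# nvidia-smi does not need X; -L lists the GPUs the driver can see.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L || echo "nvidia-smi: could not talk to the driver"
else
    echo "nvidia-smi: not installed"
fi

# Finally ask PyTorch itself (assumes python3 with torch installed).
if command -v python3 >/dev/null 2>&1; then
    python3 -c 'import torch; print("CUDA available:", torch.cuda.is_available())' \
        2>/dev/null || echo "torch: not importable"
fi
```

If the module shows as not loaded, running "modprobe nvidia" as root and repeating the check is a reasonable first step.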
[Administrator edit: unchecked Disable BBCode in this post so that OP's quote tags work. -Hu] |
|
|
|
|