tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 11:29 am
|
Wow...not looking good here. There's still a recompile of one unrelated program finishing on that update, but I just tried my frontend after stopping it and reloading the nvidia kernel module. I get a desktop with this config in /etc/X11/xorg.conf.d/nvidia.conf:
Code: | Section "OutputClass"
Identifier "nvidia"
MatchDriver "nvidia-drm"
Driver "nvidia"
Option "AllowEmptyInitialConfiguration"
ModulePath "/usr/lib/extensions/nvidia"
EndSection |
However, it uses almost the entire CPU doing anything at all...hopeless. Then I noticed that, for some reason, the nvidia-drm module wasn't loaded, although it was before with the old driver and old setup. I also see that compiling nvidia-drivers was warning me that the kernel didn't have CONFIG_DRM or CONFIG_DRM_KMS_HELPER. Apparently I never had those set and it didn't matter. I'm re-compiling the kernel with them now.
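A minimal Python sketch of the two checks mentioned above: whether the kernel config sets CONFIG_DRM / CONFIG_DRM_KMS_HELPER, and whether nvidia-drm is actually loaded. The paths /usr/src/linux/.config and /proc/modules are assumptions about a typical Gentoo setup, and the helper names are hypothetical. Note that /proc/modules lists the module with an underscore (nvidia_drm).

```python
import re
from pathlib import Path

# Assumed locations; adjust if your kernel is built elsewhere.
KERNEL_CONFIG = Path("/usr/src/linux/.config")
PROC_MODULES = Path("/proc/modules")


def kernel_options(config_text: str, names: list) -> dict:
    """Return each option's value ('y', 'm', ...) or 'not set' if absent/disabled."""
    found = {}
    for line in config_text.splitlines():
        # "# CONFIG_FOO is not set" lines don't match, so they count as unset.
        match = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return {name: found.get(name, "not set") for name in names}


def module_loaded(proc_modules_text: str, name: str) -> bool:
    """Check /proc/modules-style text; the kernel lists 'nvidia-drm' as 'nvidia_drm'."""
    wanted = name.replace("-", "_")
    return any(line.split()[0] == wanted
               for line in proc_modules_text.splitlines() if line.strip())


if __name__ == "__main__":
    if KERNEL_CONFIG.exists():
        print(kernel_options(KERNEL_CONFIG.read_text(),
                             ["CONFIG_DRM", "CONFIG_DRM_KMS_HELPER"]))
    if PROC_MODULES.exists():
        print("nvidia-drm loaded:", module_loaded(PROC_MODULES.read_text(), "nvidia-drm"))
```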
Unrelated stuff: I also ran into this one that you're familiar with: https://forums.gentoo.org/viewtopic-t-1110078-start-0.html
I just temporarily blocked that update to get around that. I also had the update to x11-libs/cairo-1.16.0-r4 fail and had to block that. The error was just this: Code: | collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:605: cairo-sphinx] Error 1
make[4]: Leaving directory '/var/tmp/portage/x11-libs/cairo-1.16.0-r4/work/cairo-1.16.0-abi_x86_32.x86/util/cairo-sphinx'
make[3]: *** [Makefile:1003: all-recursive] Error 1
make[3]: Leaving directory '/var/tmp/portage/x11-libs/cairo-1.16.0-r4/work/cairo-1.16.0-abi_x86_32.x86/util'
make[2]: *** [Makefile:780: all] Error 2
make[2]: Leaving directory '/var/tmp/portage/x11-libs/cairo-1.16.0-r4/work/cairo-1.16.0-abi_x86_32.x86/util'
make[1]: *** [Makefile:909: all-recursive] Error 1
make[1]: Leaving directory '/var/tmp/portage/x11-libs/cairo-1.16.0-r4/work/cairo-1.16.0-abi_x86_32.x86'
make: *** [Makefile:760: all] Error 2
* ERROR: x11-libs/cairo-1.16.0-r4::gentoo failed (compile phase) |
NOT happy with this so far. I'm hoping that DRM in the kernel will make a difference.
Tom |
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2884
Posted: Wed Aug 26, 2020 11:47 am
|
Yeah, the glapi issue only happens while switching, as explained, and is known to happen with cairo too. "Properly" preventing it would be a large-scale effort, so I don't think that'll be fixed. But once switched it won't come back, so it's not that bad; people doing smaller, more frequent world updates are less likely to hit it too.
I don't use DRM or its config options, so I'm not sure if that can help (I don't load nvidia-drm either), and I believe DRM support was still at a fairly early stage in 390. I guess the OutputClass matching on nvidia-drm could be related though; I'm not sure how these match exactly.
Is glx properly being used? Does `glxinfo | head` show NVIDIA and not something like llvmpipe? The latter would likely indicate the OutputClass wasn't used.
Last edited by Ionen on Wed Aug 26, 2020 12:15 pm; edited 1 time in total |
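A small Python sketch of the same check, assuming glxinfo (from x11-apps/mesa-progs) is installed and an X display is running. It only looks at the "server glx vendor string" line; the assumption is that NVIDIA's GLX module reports a vendor containing "NVIDIA", while the stock xorg-server GLX reports something else (e.g. "SGI"). The function names are hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def server_glx_vendor(glxinfo_text: str) -> Optional[str]:
    """Extract the value of the 'server glx vendor string' line from glxinfo output."""
    for line in glxinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "server glx vendor string":
            return value.strip()
    return None


def nvidia_glx_active(glxinfo_text: str) -> bool:
    """True if the server-side GLX vendor mentions NVIDIA."""
    vendor = server_glx_vendor(glxinfo_text)
    return vendor is not None and "NVIDIA" in vendor


if __name__ == "__main__":
    # Needs the glxinfo binary and a running X display (DISPLAY set).
    if shutil.which("glxinfo"):
        out = subprocess.run(["glxinfo"], capture_output=True, text=True).stdout
        print("server glx vendor:", server_glx_vendor(out))
        print("NVIDIA GLX in use:", nvidia_glx_active(out))
```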
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 12:15 pm
|
Ionen wrote: | Is glx properly being used? Does `glxinfo | head` show NVIDIA? | Apparently not: Code: | glxinfo | head
name of display: :0.0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
server glx extensions:
GLX_ARB_context_flush_control, GLX_ARB_create_context,
GLX_ARB_create_context_no_error, GLX_ARB_create_context_profile,
GLX_ARB_fbconfig_float, GLX_ARB_framebuffer_sRGB, GLX_ARB_multisample,
GLX_EXT_create_context_es2_profile, GLX_EXT_create_context_es_profile, |
Should I post my entire X log? I do see one warning in there: Code: | [ 653.440] (WW) Warning, couldn't open module xtrap
[ 653.440] (EE) Failed to load module "xtrap" (module does not exist, 0) | Not sure if that's significant.
Tom |
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2884
Posted: Wed Aug 26, 2020 12:20 pm
|
So no GLX initialization errors? (They would have (EE) in front.)
If not, I'm not quite sure what's happening here either.
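To answer that without reading the whole log, a Python sketch that pulls out the (EE) lines, optionally filtered by a keyword such as "glx". /var/log/Xorg.0.log is an assumption (the usual location when X runs as root; rootless X typically logs under ~/.local/share/xorg/ instead), and `error_lines` is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed log location; rootless X usually writes ~/.local/share/xorg/Xorg.0.log.
XORG_LOG = Path("/var/log/Xorg.0.log")

# Matches Xorg error lines of the form "[   653.458] (EE) message".
ERROR_LINE = re.compile(r"^\[\s*[\d.]+\]\s+\(EE\)\s+(.*)$")


def error_lines(log_text: str, keyword: str = "") -> list:
    """Return (EE) messages, optionally only those containing keyword (case-insensitive)."""
    hits = []
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line)
        if match and keyword.lower() in match.group(1).lower():
            hits.append(match.group(1))
    return hits


if __name__ == "__main__":
    if XORG_LOG.exists():
        for message in error_lines(XORG_LOG.read_text(errors="replace"), "glx"):
            print(message)
```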
Edit:
Maybe driver wasn't loaded at all and could use a simple Device section like: Code: | Section "Device"
Identifier "nvidia"
Driver "nvidia"
EndSection | If you already have one or it didn't help, seeing the log anyway may give hints.
Edit2: Also, example of a configuration for 390 that should be working here, minus the lib64. I'm sure it's just something small and we can get this working fine.
Last edited by Ionen on Wed Aug 26, 2020 12:39 pm; edited 2 times in total |
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 12:31 pm
|
Ionen wrote: | So no GLX initialization errors? (They would have (EE) in front.)
If not, I'm not quite sure what's happening here either.
Edit:
Maybe driver wasn't loaded at all and could use a simple Device section like: Code: | Section "Device"
Identifier "nvidia"
Driver "nvidia"
EndSection | If you already have one or it didn't help, seeing the log anyway may give hints. | Yeah...I have these: Code: | [ 653.458] (**) NVIDIA(0): Enabling 2D acceleration
[ 653.458] (EE) NVIDIA(0): Failed to initialize the GLX module; please check in your X
[ 653.458] (EE) NVIDIA(0): log file that the GLX module has been loaded in your X
[ 653.458] (EE) NVIDIA(0): server, and that the module is the NVIDIA GLX module. If
[ 653.458] (EE) NVIDIA(0): you continue to encounter problems, Please try
[ 653.458] (EE) NVIDIA(0): reinstalling the NVIDIA driver. |
My xorg.conf is very old, though I do have the "nvidia" driver for sure. Here's the Device section: Code: | Section "Device"
Identifier "Card0"
Driver "nvidia"
VendorName "nVidia Corporation"
BoardName "GT 430"
VideoRam 131072
Option "TripleBuffer" "True"
Option "NoLogo" "true"
BusID "PCI:1:0:0"
Option "Coolbits" "1"
EndSection | Keep in mind that this was all working before.
Tom |
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2884
Posted: Wed Aug 26, 2020 12:35 pm
|
Yeah okay, that's "probably" the bug we've been going on about already then. I guess it's not picking up the ModulePath.
I guess "nvidia-drm" could have something to do with it, given the OutputClass match; I'm honestly not familiar with this whole matching system.
What happens if you set the ModulePath globally, like: Code: | Section "Files"
ModulePath "/usr/lib/extensions/nvidia"
EndSection | And the path does exist now, right? It should have a libglx.so symlink + versioned library.
Edit: also, did nvidia-drivers install that bogus, currently broken nvidia-390.conf? I guess it could be conflicting too.
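A Python sketch of that check, assuming the /usr/lib/extensions/nvidia directory from this thread (other driver versions or multilib setups may use a different path). `check_glx_dir` is a hypothetical helper; it only verifies that libglx.so is a symlink resolving to an existing libglx.so.N… file, not that the library itself is valid.

```python
import re
from pathlib import Path

# Directory taken from this thread; adjust for your install.
NVIDIA_MODULE_DIR = Path("/usr/lib/extensions/nvidia")


def check_glx_dir(directory: Path) -> list:
    """Return a list of problems; an empty list means the layout looks as expected."""
    problems = []
    link = directory / "libglx.so"
    if not link.is_symlink():
        problems.append(f"{link} is missing or not a symlink")
    elif not link.resolve().exists():
        problems.append(f"{link} is a dangling symlink")
    elif not re.fullmatch(r"libglx\.so\.\d+(\.\d+)*", link.resolve().name):
        problems.append(f"{link} does not point at a versioned libglx.so.N library")
    return problems


if __name__ == "__main__":
    issues = check_glx_dir(NVIDIA_MODULE_DIR)
    print("\n".join(issues) if issues else f"{NVIDIA_MODULE_DIR} looks OK")
```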
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 12:47 pm
|
Ionen wrote: | What happens if you set the ModulePath globally, like: Code: | Section "Files"
ModulePath "/usr/lib/extensions/nvidia"
EndSection | And the path does exist now, right? It should have a libglx.so symlink + versioned library. | I'd actually just tried adding that to the nvidia.conf file and it failed completely with "no screens found". THIS is ugly.
I'm actually recompiling the entire kernel now, only because I can't be sure that it wasn't built with gcc 9.2. I'm currently running gcc 9.3. I doubt that will help, though.
EDIT: Yes by the way...that path is there now: Code: | find /usr/lib/extensions/nvidia
/usr/lib/extensions/nvidia
/usr/lib/extensions/nvidia/libglx.so.390.138
/usr/lib/extensions/nvidia/libglx.so | Tom |
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 12:50 pm
|
Ionen wrote: | Edit: also, did nvidia-drivers install that bogus, currently broken nvidia-390.conf? I guess it could be conflicting too. | No...all that's in that directory is my nvidia.conf.
Tom |
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2884
Posted: Wed Aug 26, 2020 12:57 pm
|
Hmm, I'd need to see the full log, but I get the feeling that failing further with the global path means the nvidia driver didn't load at all after all. Rebuilding the kernel+drivers may in fact help; if in doubt, rebuild libglvnd and xorg-server too, just to be sure everything's fine (mesa doesn't matter).
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 1:21 pm
|
OMG...looks like I got it! In my config, instead of this: Code: | Section "Files"
ModulePath "/usr/lib/extensions/nvidia"
EndSection | ...as per this:
https://713546.bugs.gentoo.org/attachment.cgi?id=624156
...I used this in my config: Code: | Section "Files"
ModulePath "/usr/lib/extensions/nvidia,/usr/lib/xorg/modules"
EndSection | ...and GLX is working using nvidia...just wow.
Tom |
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2884
Posted: Wed Aug 26, 2020 1:26 pm
|
Ohh, I guess when using the global section both paths are indeed needed, unlike with OutputClass, and that's why it was worse (my bad for not knowing). But I assume the OutputClass was indeed not matching, then.
Well anyhow, glad to hear it works |
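A toy Python model of why the global Files section needed both paths while the OutputClass did not. The assumption, based on my reading of xorg.conf(5), is that a Files ModulePath replaces the default module path (/usr/lib/xorg/modules here; typical amd64 installs use /usr/lib64/xorg/modules), while a matching OutputClass ModulePath is prepended to it. The INSTALLED inventory and the module names are hypothetical simplifications, not how the Xorg loader actually indexes modules.

```python
from typing import Dict, List, Optional, Set

# Directories from this thread.
STOCK_DIR = "/usr/lib/xorg/modules"
NVIDIA_DIR = "/usr/lib/extensions/nvidia"

# Hypothetical inventory: which directory tree provides which (simplified) module.
INSTALLED: Dict[str, Set[str]] = {
    NVIDIA_DIR: {"glx"},               # NVIDIA's libglx.so
    STOCK_DIR: {"glx", "nvidia_drv"},  # xorg-server's libglx.so + drivers/nvidia_drv.so
}


def search_path(files_module_path: Optional[str] = None,
                outputclass_module_path: Optional[str] = None) -> List[str]:
    """Model: Files ModulePath replaces the default; a matching OutputClass path is prepended."""
    base = files_module_path.split(",") if files_module_path else [STOCK_DIR]
    prefix = [outputclass_module_path] if outputclass_module_path else []
    return [d.strip() for d in prefix + base]


def resolve(path: List[str], module: str) -> Optional[str]:
    """Return the first directory in search order that provides the module, or None."""
    for directory in path:
        if module in INSTALLED.get(directory, set()):
            return directory
    return None


if __name__ == "__main__":
    scenarios = {
        "no Files path, OutputClass not matching": search_path(),
        "Files: nvidia dir only": search_path(NVIDIA_DIR),
        "Files: both dirs": search_path(f"{NVIDIA_DIR},{STOCK_DIR}"),
        "OutputClass matching": search_path(outputclass_module_path=NVIDIA_DIR),
    }
    for name, path in scenarios.items():
        print(f"{name}: glx from {resolve(path, 'glx')}, "
              f"driver from {resolve(path, 'nvidia_drv')}")
```

In this model, listing only the NVIDIA directory in Files leaves no nvidia_drv at all, which lines up with the "no screens found" failure. Listing both directories takes GLX from the NVIDIA directory and the driver from the stock one, which would also explain why the OutputClass section became unnecessary.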
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 1:31 pm
|
Ionen wrote: | Ohh, I guess when using the global section both paths are indeed needed, unlike with OutputClass, and that's why it was worse (my bad for not knowing). But I assume the OutputClass was indeed not matching, then.
Well anyhow, glad to hear it works | Thanks for the help! What's even more interesting is that, with that Files section above, apparently I don't need that OutputClass section at all! I just commented it out and still get nvidia GLX just fine. Confusing beyond words.
EDIT: Those failed upgrades all worked when run after the others...even that screwy ld error upgrading cairo. So that one was clearly some issue with the upgrade order.
Tom |
|
Ionen Developer
Joined: 06 Dec 2018 Posts: 2884
Posted: Wed Aug 26, 2020 2:27 pm
|
tld wrote: | Those failed upgrades all worked when run after the others...even that screwy ld error upgrading cairo. So that one was clearly some issue with the upgrade order. | libglvnd+nvidia can be considered a mesa alternative (nvidia provides the libraries, and libglvnd the headers), which makes mesa hardly relevant.
When ebuilds need EGL/GLES/etc., they usually depend on mesa directly, and for GL either on mesa or, more usually, on virtual/opengl (which is mostly just mesa too, but could be updated). Thus portage doesn't think nvidia-drivers matters and opts to rebuild it last.
But now we have >300 ebuilds that depend on mesa and >700 for virtual/opengl. For what I think would be a complete and proper fix, all of those would need a virtual that can alternatively depend on libglvnd+nvidia-drivers, and preferably be tested to see if they build with _just_ those and no mesa installed (then mesa could even be depcleaned; I did build mesa-progs with egl/gles2/gl... without mesa). I don't think that's gonna happen. And now that eselect-opengl is going away, fewer people will be switching anyway, so the issue will just be swept under the rug. We could always add more workarounds like xorg-server did, but that'll probably just make things worse (that blocker caused a lot of nonsense), so unless someone goes all out, we might as well leave it alone and help the users who run into it directly.
That aside, xorg-server does need one single header file from mesa (that could be made optional with some work), so don't try removing mesa at home.
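A toy illustration of the ordering problem, using Python's graphlib (3.9+). This is not how Portage's resolver works; the graph is a hand-written simplification of the dependencies described above, and modelling the hypothetical "libglvnd+nvidia-drivers" alternative as a hard edge is only meant to show its effect on build order.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # package -> packages it must be built after


def build_order(deps: Graph) -> list:
    """One valid build order: every package comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


def must_precede(deps: Graph, first: str, later: str) -> bool:
    """True if `first` is a direct or transitive dependency of `later`."""
    stack, seen = list(deps.get(later, ())), set()
    while stack:
        node = stack.pop()
        if node == first:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(deps.get(node, ()))
    return False


# Simplified "as declared": cairo reaches GL via virtual/opengl -> mesa,
# so nothing ties it to nvidia-drivers.
declared: Graph = {
    "x11-libs/cairo": {"virtual/opengl"},
    "virtual/opengl": {"media-libs/mesa"},
    "x11-drivers/nvidia-drivers": set(),
}

# Hypothetical fix: the virtual can also be satisfied by libglvnd + nvidia-drivers
# (modelled as a hard edge purely to show the effect on ordering).
with_alternative: Graph = {
    **declared,
    "virtual/opengl": {"media-libs/mesa", "media-libs/libglvnd",
                       "x11-drivers/nvidia-drivers"},
}

if __name__ == "__main__":
    for name, graph in (("declared", declared), ("with_alternative", with_alternative)):
        print(name, must_precede(graph, "x11-drivers/nvidia-drivers", "x11-libs/cairo"),
              build_order(graph))
```

With the declared graph there is no reason to build nvidia-drivers before cairo, so an order that rebuilds it last is perfectly valid, which is the situation described above.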
|
tld Veteran
Joined: 09 Dec 2003 Posts: 1848
Posted: Wed Aug 26, 2020 3:09 pm
|
Yea...I thought I saw mention that in theory mesa was irrelevant with this. Sounds like quite a complex mess around that.
Tom |