Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Fri Sep 13, 2024 7:57 am Post subject: Plasma 6.1 wayland nVidia, let's code something |
Hello,
I migrated to Plasma 6.1 yesterday and gave Wayland a try, but I had to switch back to X11 because nvidia-settings cannot control the fans on Wayland.
On X11, each desktop effect freezes the system for 1 second, launching Spectacle takes 2 seconds, and selecting a rectangular region also freezes the system for 2 seconds. Making a window full screen is laggy.
My system is quite high end:
- Ryzen 9 5900X
- 64 GB of DDR4 memory
- nVidia RTX 3090-TI OC
- NVMe SSD
I am using the unstable driver 560.
After some digging, I reduced the VM dirty cache, which improved things marginally.
/etc/sysctl.conf
Code: | # reduce the dirty cache from 6.4 GB to 3.2 GB
vm.dirty_ratio = 5
# start background writeback at 1%, i.e. about 640 MB
vm.dirty_background_ratio = 1 |
So I think I have to switch to Wayland; X11 is no longer a viable choice with the NVIDIA proprietary driver. Plasma 5 was fine on X11, 6.1 is not.
NVIDIA has been implementing more and more of the NVML API in the driver, so there is a way to control the graphics card from Wayland, because NVML does not rely on X11 libraries.
https://docs.nvidia.com/deploy/nvml-api/change-log.html#change-log
If someone can help me get started coding this with Qt6, I would appreciate it. There is already a Python interface, but I would prefer to do this in C++. I already took a look at nvidia-settings, but the project is large and I struggle to collect all the necessary .h and .c files to get a working set of NVML functions to start from.
https://github.com/NVIDIA/nvidia-settings/blob/main/src/nvml.h
I already wrote an application a few years back to control the fans with a hard-coded curve, originally for Qt5 and since migrated to Qt6, but it uses the NVIDIA X11 library, so no Wayland.
https://pixel.rodrik.ch/NvControl.png
Ridrok
Last edited by Ridrok on Fri Sep 13, 2024 9:34 am; edited 1 time in total |
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2400
Posted: Fri Sep 13, 2024 9:22 am Post subject: Re: Plasma 6.1 wayland nVidia, let's code something |
Ridrok wrote: |
On X11, each desktop effect freezes the system for 1 second, launching Spectacle takes 2 seconds, and selecting a rectangular region also freezes the system for 2 seconds. Making a window full screen is laggy. |
Did you check your Xorg logs to confirm it was in fact using the NVIDIA driver and not something like VESA?
Ridrok wrote: | So I think I have to switch to Wayland; X11 is no longer a viable choice with the NVIDIA proprietary driver. |
That's hard to believe. If that was the case, the forums would be flooded already.
Ridrok wrote: | If someone can help me get started coding this with Qt6, I would appreciate it. |
I don't want to discourage you, but this is a hard route, and I don't think coding with Qt6 will solve your fan problem. I suspect you weren't in fact using the NVIDIA driver. I have a laptop and can't verify whether nvidia-settings can control fans, but I'm pretty sure it's functional under Wayland; the fact that its interface provides fewer configuration options there is because many of them were Xorg-specific.
Please verify you were indeed using the nVidia driver.
Best Regards,
Georgi |
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Fri Sep 13, 2024 9:30 am Post subject: Re: Plasma 6.1 wayland nVidia, let's code something |
logrusx wrote: | Please verify you were indeed using the nVidia driver. |
It does:
Code: | [    28.442] (II) Loading /usr/lib64/xorg/modules/extensions/libglxserver_nvidia.so
[ 28.596] (II) Module glxserver_nvidia: vendor="NVIDIA Corporation"
[ 28.596] compiled for 1.6.99.901, module version = 1.0.0
[ 28.596] Module class: X.Org Server Extension
[ 28.596] (II) NVIDIA GLX Module 560.35.03 Fri Aug 16 21:27:48 UTC 2024
[ 28.597] (II) NVIDIA: The X server supports PRIME Render Offload.
[ 28.603] (--) NVIDIA(0): Valid display device(s) on GPU-0 at PCI:10:0:0
[ 28.603] (--) NVIDIA(0): DFP-0
[ 28.603] (--) NVIDIA(0): DFP-1
[ 28.603] (--) NVIDIA(0): DFP-2
[ 28.603] (--) NVIDIA(0): DFP-3
[ 28.603] (--) NVIDIA(0): DFP-4
[ 28.603] (--) NVIDIA(0): DFP-5 (boot)
[ 28.603] (--) NVIDIA(0): DFP-6
[ 28.624] (II) NVIDIA(0): NVIDIA GPU NVIDIA GeForce RTX 3090 (GA102-A) at PCI:10:0:0
[ 28.624] (II) NVIDIA(0): (GPU-0)
[ 28.624] (--) NVIDIA(0): Memory: 25165824 kBytes
[ 28.624] (--) NVIDIA(0): VideoBIOS: 94.02.42.80.9f
|
And I have to add: my GPU is a poor one from Zotac. If I don't control the fans better than the card does by itself, the GPU reaches 90°C when I run AI workloads and I lose the display; I have to shut down the PC and let the card cool.
With a custom curve, it does not go beyond 67-68°C. |
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Fri Sep 13, 2024 12:50 pm Post subject: |
Let's begin, using the nvidia-settings app from GitHub.
In NvCtrlAttributesPrivate.h I found this:
Code: | typedef struct __NvCtrlNvmlAttributes NvCtrlNvmlAttributes;
struct __NvCtrlNvmlAttributes {
struct {
void *handle;
typeof(nvmlInit) (*Init);
typeof(nvmlShutdown) (*Shutdown);
typeof(nvmlDeviceGetHandleByIndex) (*DeviceGetHandleByIndex);
typeof(nvmlDeviceGetUUID) (*DeviceGetUUID);
typeof(nvmlDeviceGetCount) (*DeviceGetCount);
typeof(nvmlDeviceGetTemperature) (*DeviceGetTemperature);
typeof(nvmlDeviceGetName) (*DeviceGetName);
typeof(nvmlDeviceGetVbiosVersion) (*DeviceGetVbiosVersion);
typeof(nvmlDeviceGetMemoryInfo) (*DeviceGetMemoryInfo);
typeof(nvmlDeviceGetMemoryInfo_v2) (*DeviceGetMemoryInfo_v2);
typeof(nvmlDeviceGetPciInfo) (*DeviceGetPciInfo);
typeof(nvmlDeviceGetCurrPcieLinkWidth) (*DeviceGetCurrPcieLinkWidth);
typeof(nvmlDeviceGetMaxPcieLinkGeneration) (*DeviceGetMaxPcieLinkGeneration);
typeof(nvmlDeviceGetMaxPcieLinkWidth) (*DeviceGetMaxPcieLinkWidth);
typeof(nvmlDeviceGetVirtualizationMode) (*DeviceGetVirtualizationMode);
typeof(nvmlDeviceGetGridLicensableFeatures) (*DeviceGetGridLicensableFeatures);
typeof(nvmlDeviceGetGspFirmwareMode) (*DeviceGetGspFirmwareMode);
typeof(nvmlDeviceGetUtilizationRates) (*DeviceGetUtilizationRates);
typeof(nvmlDeviceGetTemperatureThreshold) (*DeviceGetTemperatureThreshold);
typeof(nvmlDeviceGetFanSpeed_v2) (*DeviceGetFanSpeed_v2);
typeof(nvmlSystemGetDriverVersion) (*SystemGetDriverVersion);
typeof(nvmlDeviceGetEccMode) (*DeviceGetEccMode);
typeof(nvmlDeviceGetDefaultEccMode) (*DeviceGetDefaultEccMode);
typeof(nvmlDeviceSetEccMode) (*DeviceSetEccMode);
typeof(nvmlDeviceGetTotalEccErrors) (*DeviceGetTotalEccErrors);
typeof(nvmlDeviceClearEccErrorCounts) (*DeviceClearEccErrorCounts);
typeof(nvmlDeviceGetMemoryErrorCounter) (*DeviceGetMemoryErrorCounter);
typeof(nvmlSystemGetNVMLVersion) (*SystemGetNVMLVersion);
typeof(nvmlDeviceGetNumGpuCores) (*DeviceGetNumGpuCores);
typeof(nvmlDeviceGetMemoryBusWidth) (*DeviceGetMemoryBusWidth);
typeof(nvmlDeviceGetIrqNum) (*DeviceGetIrqNum);
typeof(nvmlDeviceGetPowerSource) (*DeviceGetPowerSource);
typeof(nvmlDeviceGetNumFans) (*DeviceGetNumFans);
typeof(nvmlDeviceSetFanSpeed_v2) (*DeviceSetFanSpeed_v2);
typeof(nvmlDeviceGetTargetFanSpeed) (*DeviceGetTargetFanSpeed);
typeof(nvmlDeviceGetMinMaxFanSpeed) (*DeviceGetMinMaxFanSpeed);
typeof(nvmlDeviceSetFanControlPolicy) (*DeviceSetFanControlPolicy);
typeof(nvmlDeviceGetFanControlPolicy_v2) (*DeviceGetFanControlPolicy_v2);
typeof(nvmlDeviceSetDefaultFanSpeed_v2) (*DeviceSetDefaultFanSpeed_v2);
typeof(nvmlDeviceGetPowerUsage) (*DeviceGetPowerUsage);
typeof(nvmlDeviceGetPowerManagementDefaultLimit) (*DeviceGetPowerManagementDefaultLimit);
typeof(nvmlDeviceGetPowerManagementLimitConstraints) (*DeviceGetPowerManagementLimitConstraints);
} lib;
unsigned int deviceIdx; /* XXX Needed while using NV-CONTROL as fallback */
unsigned int deviceCount;
unsigned int sensorCount;
unsigned int *sensorCountPerGPU;
unsigned int coolerCount;
unsigned int *coolerCountPerGPU;
}; |
From NvCtrlAttributesNvml.c I found this:
Code: | static Bool LoadNvml(NvCtrlNvmlAttributes *nvml)
{
enum {
_OPTIONAL,
_REQUIRED
};
nvmlReturn_t ret;
nvml->lib.handle = dlopen("libnvidia-ml.so.1", RTLD_LAZY);
if (nvml->lib.handle == NULL) {
goto fail;
}
#define STRINGIFY_SYMBOL(_symbol) #_symbol
#define EXPAND_STRING(_symbol) STRINGIFY_SYMBOL(_symbol)
#define GET_SYMBOL(_required, _proc) \
nvml->lib._proc = dlsym(nvml->lib.handle, EXPAND_STRING(nvml ## _proc)); \
if (nvml->lib._proc == NULL) { \
if (_required) { \
goto fail; \
} else { \
nvml->lib._proc = (void*) NvmlStubFunction; \
} \
}
GET_SYMBOL(_REQUIRED, Init);
GET_SYMBOL(_REQUIRED, Shutdown);
GET_SYMBOL(_REQUIRED, DeviceGetHandleByIndex);
GET_SYMBOL(_REQUIRED, DeviceGetUUID);
GET_SYMBOL(_REQUIRED, DeviceGetCount);
GET_SYMBOL(_REQUIRED, DeviceGetTemperature);
GET_SYMBOL(_REQUIRED, DeviceGetName);
GET_SYMBOL(_REQUIRED, DeviceGetVbiosVersion);
GET_SYMBOL(_REQUIRED, DeviceGetMemoryInfo);
etc.... |
I think it's a good starting point for doing a bit of NVML, since it maps these functions onto the library API. |
Ionen Developer
Joined: 06 Dec 2018 Posts: 2851
Posted: Fri Sep 13, 2024 1:53 pm Post subject: |
Honestly surprised that there's nothing for this yet, afaik. I did write a tiny fan control daemon myself, given I didn't like the default curve, but it (also) uses libXNVCtrl.
fwiw, the closed-source nvidia-smi (which relies heavily on NVML and dlopens libnvidia-ml) can be used to set a few things and read the fan speed without X/Wayland, but I don't think it lets you control the fan itself (and it would be horrible to fork nvidia-smi every time you want to probe or set the fan speed to implement a proper curve anyway).
I do hope nvidia will eventually improve nvidia-settings and the supporting libraries anyhow (and documentation to use them).
Last edited by Ionen on Fri Sep 13, 2024 2:05 pm; edited 1 time in total |
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2400
Posted: Fri Sep 13, 2024 2:04 pm Post subject: |
I'm still skeptical about poor performance on Xorg/NVIDIA.
What do tools like glxinfo say when run from a terminal under KDE?
I would go through the relevant wiki pages, and also the Arch wiki, to see if I could fix it. Obviously you need fan control ASAP.
Another idea is to run a separate Xorg session from where you can set the custom curve with nvidia-settings for the time being.
Best Regards,
Georgi |
Ionen Developer
Joined: 06 Dec 2018 Posts: 2851
Posted: Fri Sep 13, 2024 2:08 pm Post subject: |
logrusx wrote: | I'm still skeptical about poor performance on Xorg/NVIDIA. | But yeah, there's that; it works just fine for about everyone, including myself. I feel like there may be something else going on if things are noticeably slow (like using llvmpipe rather than nvidia).
glxinfo would be interesting indeed. |
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Fri Sep 13, 2024 2:17 pm Post subject: |
Ionen wrote: | glxinfo would be interesting indeed. |
Here you go
https://pastebin.com/6Q5BeG9a
To tell a bit more about how it behaves: for example, if I watch a YouTube video and Discord makes a popup, I get a freeze in the video for less than a second, but it's noticeable. I never had this under X11 with KDE 4 and then Plasma 5 (yes, I have been on Gentoo for many years).
I have read tons of pages already; most talk about the QML cache. I did not see a way to fix this, and I even checked kwin's main.cpp to see whether the patch that disables the cache was in the source. |
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2400
Posted: Fri Sep 13, 2024 2:39 pm Post subject: |
Code: | direct rendering: Yes |
That much I can tell. Unfortunately, if there's something else, I'm not the guy to spot it.
Ridrok wrote: | To tell a bit more about how it behaves: for example, if I watch a YouTube video and Discord makes a popup, I get a freeze in the video for less than a second, but it's noticeable. I never had this under X11 with KDE 4 and then Plasma 5 (yes, I have been on Gentoo for many years).
I have read tons of pages already; most talk about the QML cache. I did not see a way to fix this, and I even checked kwin's main.cpp to see whether the patch that disables the cache was in the source. |
Maybe upload the Xorg log for those who can read it. Again, I'm not the guy; I just hope someone will open it, even out of curiosity, and spot something.
Best Regards,
Georgi
p.s. try the idea of setting the curve from an Xorg session. It would be nice if you could set it once and log off; if not, you can just abandon that session and start a Wayland one.
p.s.2 I assume you don't have multiple monitors?
p.s.3 this is a long shot but: https://forum.endeavouros.com/t/kde-stutters-whenever-i-open-close-move-resize-a-window/56237
Quote: | found out the problem’s cause. the window decorations i was using (psion). who would’ve guessed. |
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Fri Sep 13, 2024 3:22 pm Post subject: |
logrusx wrote: |
p.s. try the idea of setting the curve from an Xorg session. It would be nice if you could set it once and log off; if not, you can just abandon that session and start a Wayland one.
p.s.2 I assume you don't have multiple monitors?
p.s.3 this is a long shot but: https://forum.endeavouros.com/t/kde-stutters-whenever-i-open-close-move-resize-a-window/56237
Quote: | found out the problem’s cause. the window decorations i was using (psion). who would’ve guessed. |
|
For pt 1, I will try this, but I am making progress on the NVML part.
For pt 2, I have only one 30" Dell UP3017 screen.
For pt 3, I do have a decoration to check: it's aero-cielo, a Vista theme; that may be the cause. |
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Fri Sep 13, 2024 8:14 pm Post subject: |
Removing the previous decoration fixed the problem on X11.
I also progressed a bit on NVML: I have the NVML code compiling with Qt6, though it is not called yet.
I cannot progress much until next week; I will keep you posted. |
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Sat Sep 14, 2024 9:04 am Post subject: |
I am proud that I managed to do some pure NVML directly from a Qt6 program.
This is what the code outputs as debug information:
Code: | Init ret: 0
Count function: 1816591216
Card Count: 1
Card 0 name: NVIDIA GeForce RTX 3090
|
Ridrok Tux's lil' helper
Joined: 26 Jan 2014 Posts: 108 Location: France
Posted: Sat Sep 14, 2024 7:54 pm Post subject: |
I did it!
Fan control in Qt without any dependency on X: direct NVML, with two processes. The main application runs as the user and does all the queries and display. A small helper binary with chmod u+s is launched by the Qt application, and the main application then sends commands to that helper, which in turn does the writes to the driver.
I will clean up the code a lot and put a template application on GitHub.
There is also nothing preventing overclocking the card this way; no Coolbits needed.
Ridrok |
Ionen Developer
Joined: 06 Dec 2018 Posts: 2851
Posted: Sat Sep 14, 2024 8:11 pm Post subject: |
Nice, good to hear it's possible. |