Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Sun Apr 26, 2020 12:33 pm Post subject: [solved by wonder] foldingathome nvidia gpu trouble |
Dear Gentoo folders,
these are special times, so I have set my eyes on foldingathome.
My system is an old ASUS ROG laptop with an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz and a dedicated GeForce GTX 980M as its only GPU (16 GB RAM, 4 GB VRAM).
Installation was smooth. I started with the standard config:
Code: | <config>
<!-- Folding Slot Configuration -->
<cause v='COVID_19'/>
<!-- Slot Control -->
<power v='FULL'/>
<!-- User Information -->
<passkey v='private'/>
<team v='private'/>
<user v='Vrenn'/>
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='GPU'/>
</config> |
Judging by the web client at http://client.foldingathome.org/, the CPU slot was working fine: green icon, one to three hours per job.
Only the GeForce, listed as "GM204 [GeForce GTX 980M]...", stayed yellow. It got a job once in a while but never got past 0,00%, always estimated it would need a full day (24 h), and dropped the job after a while.
The folding log:
Code: | 23:18:55: GPUs: 1
23:18:55: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 980M] 3189
23:18:55: CUDA: Not detected: cuInit() returned 100
23:18:55: OpenCL: Not detected: clGetDeviceIDs() returned -1
...
23:18:55:Enabled folding slot 00: READY cpu:7
23:18:55:Enabled folding slot 01: READY gpu:0:GM204 [GeForce GTX 980M] 3189
23:18:55:ERROR:No compute devices matched GPU #0 {
23:18:55:ERROR: "vendor": 4318,
23:18:55:ERROR: "device": 5079,
23:18:55:ERROR: "type": 2,
23:18:55:ERROR: "species": 5,
23:18:55:ERROR: "description": "GM204 [GeForce GTX 980M] 3189"
23:18:55:ERROR:}. You may need to update your graphics drivers.
23:18:55:WU00:FS01:Starting
23:18:55:ERROR:WU00:FS01:Failed to start core: OpenCL device matching slot 1 not found, make sure the OpenCL driver is installed or try setting 'opencl-index' manually |
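The "Not detected" lines in the log above are what the rest of this thread keeps checking for. A minimal sketch for spotting them, assuming the log line format shown here and the log path /opt/foldingathome/log.txt used later in the thread; the function name fah_gpu_detection is made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print any CUDA/OpenCL "Not detected" lines from the FAH log.
# Returns 0 if none are found, 1 if at least one is found, 2 if the log is unreadable.
# Assumes log lines like "23:18:55: CUDA: Not detected: cuInit() returned 100".
fah_gpu_detection() {
  local log="${1:-/opt/foldingathome/log.txt}"
  local api status=0

  if [[ ! -r "$log" ]]; then
    echo "cannot read $log" >&2
    return 2
  fi

  for api in CUDA OpenCL; do
    # -m1: report only the first match per API; -F: fixed-string pattern
    if grep -m1 -F "${api}: Not detected" "$log"; then
      status=1
    fi
  done
  return "$status"
}
```

Run it after starting the service; a return value of 0 means neither API logged a detection failure. It only reads the startup log, so it cannot tell whether the client picks up the GPU later at runtime.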
It is hard to find examples of config.xml online, and since the FAHControl GUI is not available on Gentoo, I tried a config from https://www.reddit.com/r/Folding/comments/fp1pjh/need_help_triple_gpus_none_being_used/
The merged config, which somehow works, is the following:
Code: | <config>
<!-- Folding Slot Configuration -->
<cause v='COVID_19'/>
<!-- Slot Control -->
<power v='FULL'/>
<!-- User Information -->
<passkey v='private'/>
<team v='private'/>
<user v='Vrenn'/>
<!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='GPU'>
<cuda-index v='0'/>
<gpu-index v='1'/>
<opencl-index v='0'/>
<paused v='true'/>
</slot>
</config> |
On the upside, the GPU icon is now green (once it got a job); the job is currently at 8,82% and needs about 16 h. It seems to be working.
On the downside, my GPU is now listed as "GPU:1:{ "VENDOR":0, "DEVICE": 0, "TYPE": 0, "SPECIE..."
The log is still bad!
Code: | 09:55:46: GPUs: 1
09:55:46: GPU 0: Bus:1 Slot:0 Func:0 NVIDIA:5 GM204 [GeForce GTX 980M] 3189
09:55:46: CUDA: Not detected: cuInit() returned 100
09:55:46: OpenCL: Not detected: clGetPlatformIDs() returned -1001
...
09:55:46:Enabled folding slot 00: READY cpu:7
09:55:46:ERROR:Exception: GPU 1 not found
09:55:46:ERROR:No compute devices matched GPU #1 {
09:55:46:ERROR: "vendor": 0,
09:55:46:ERROR: "device": 0,
09:55:46:ERROR: "type": 0,
09:55:46:ERROR: "species": 0,
09:55:46:ERROR: "description": ""
09:55:46:ERROR:}. You may need to update your graphics drivers.
09:55:46:WARNING:WU00:Slot ID 18446744073709551615 no longer exists, migrating to FS00
09:55:46:ERROR:Exception: Unit not found
09:55:46:WU01:FS00:Starting
09:55:46:WU01:FS00:Running FahCore: /opt/foldingathome/FAHCoreWrapper /opt/foldingathome/cores/cores.foldingathome.org/v7/lin/64bit/avx/Core_a7.fah/FahCore_a7 -dir 01 -suffix 01 -version 706 -lifeline 37298 -checkpoint 15 -np 7
09:55:46:WU01:FS00:Started FahCore on PID 37307
09:55:46:WU01:FS00:Core PID:37311
09:55:46:WU01:FS00:FahCore 0xa7 started |
But later it also tells me:
Code: | 09:55:58:WU02:FS01:Connecting to 18.218.241.186:80
09:55:59:WU02:FS01:Assigned to work server 155.247.164.213
09:55:59:WU02:FS01:Requesting new work unit for slot 01: READY gpu:1:{
09:55:59:WU02:FS01: "vendor": 0,
09:55:59:WU02:FS01: "device": 0,
09:55:59:WU02:FS01: "type": 0,
09:55:59:WU02:FS01: "species": 0,
09:55:59:WU02:FS01: "description": ""
09:55:59:WU02:FS01:} from 155.247.164.213
09:55:59:WU02:FS01:Connecting to 155.247.164.213:8080
09:55:59:WU02:FS01:Downloading 2.82MiB
09:56:00:WU00:FS01:Upload complete
09:56:00:WU00:FS01:Server responded WORK_QUIT (404)
09:56:00:WARNING:WU00:FS01:Server did not like results, dumping |
Changing gpu-index to 0, or the cuda-index/opencl-index to -1, makes the GPU stay yellow again.
nvidia-drivers is at 440.82-r1.
Any hints on what I am doing wrong? Or is there a complete manual for config.xml?
Is the GPU info in the web control even accurate? It seems too slow.
Current foldingathome version: 7.6.9.
Can I do the right thing with a wrong config?
CUDA device query (/opt/cuda/extras/demo_suite/deviceQuery):
Code: | CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 980M"
CUDA Driver Version / Runtime Version 10.2 / 10.2
CUDA Capability Major/Minor version number: 5.2
Total amount of global memory: 4035 MBytes (4231331840 bytes)
(12) Multiprocessors, (128) CUDA Cores/MP: 1536 CUDA Cores
GPU Max Clock rate: 1126 MHz (1.13 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 10.2, NumDevs = 1, Device0 = GeForce GTX 980M
Result = PASS |
_________________ With nice greetings
Vrenn
Last edited by Vrenn on Wed May 06, 2020 5:39 pm; edited 1 time in total |
Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Fri May 01, 2020 7:32 pm Post subject: |
I believe I learned something.
Current config:
Code: | <!-- Folding Slots -->
<slot id='0' type='CPU'/>
<slot id='1' type='GPU'>
<cuda-index v='0'/>
<gpu-index v='0'/>
<opencl-index v='0'/>
</slot> | Now my GPU is correctly named gpu:0:GM204 [GeForce GTX 980M] 3189.
It's still slow, but it got a job now.
I drew two conclusions:
First, a wrong config (<gpu-index v='1'/>) hides the GPU name, which gets my five-year-old hardware more jobs (slower ones?).
Second, <opencl-index v='0'/> solves the opencl-index error.
I also added the foldingathome user to the video group, as suggested at https://bugs.gentoo.org/715646
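The video group change from the Gentoo bug can be checked from a shell. A minimal sketch; the account name foldingathome is an assumption (check which user the service really runs as), and user_in_group is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER is a member of GROUP (primary or supplementary).
user_in_group() {
  local user="$1" group="$2" g
  # `id -nG USER` prints all group names of USER separated by spaces;
  # if the user does not exist, id fails and the loop sees no words.
  for g in $(id -nG "$user" 2>/dev/null); do
    [[ "$g" == "$group" ]] && return 0
  done
  return 1
}

# Example (as root); "foldingathome" as the service account is an assumption:
#   user_in_group foldingathome video || usermod -a -G video foldingathome
```

A new group membership only applies to processes started afterwards, so the service has to be restarted before the change can have any effect.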
Anyway, the following log lines still remain: Code: | 14:06:30: CUDA: Not detected: cuInit() returned 999
14:06:30: OpenCL: Not detected: clGetDeviceIDs() returned -1 | Still a lot to learn. _________________ With nice greetings
Vrenn |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri May 01, 2020 7:58 pm Post subject: |
Long story short: just remove OpenCL from mesa. In /etc/portage/package.use/mesa, put: media-libs/mesa -opencl
You should be fine after that. |
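axl's suggestion as a minimal shell sketch. It assumes /etc/portage/package.use is a directory, as the path in the post implies; PORTAGE_USE_DIR and disable_mesa_opencl are hypothetical names, used only so the function can be tried on a scratch directory:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "media-libs/mesa -opencl" to package.use/mesa exactly once.
# PORTAGE_USE_DIR is an assumed override; it defaults to /etc/portage/package.use.
disable_mesa_opencl() {
  local dir="${PORTAGE_USE_DIR:-/etc/portage/package.use}"
  local entry='media-libs/mesa -opencl'

  mkdir -p "$dir" || return 1
  # -q quiet, -x whole-line match, -F fixed string: skip if the entry is already there
  if ! grep -qxF "$entry" "$dir/mesa" 2>/dev/null; then
    printf '%s\n' "$entry" >> "$dir/mesa"
  fi
}

# The new USE setting only takes effect once mesa is rebuilt, e.g.:
#   emerge --ask --oneshot media-libs/mesa
```

The function is idempotent, so running it twice leaves a single entry in the file.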
Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Sat May 02, 2020 2:17 pm Post subject: |
I believe I tested that first, but I gave it another try today.
Same log errors again:
14:04:46: CUDA: Not detected: cuInit() returned 999
14:04:46: OpenCL: Not detected: clGetPlatformIDs() returned -1001
Since mesa is not used by nvidia-drivers, I thought a discovery function in foldingathome might be causing the errors. Switching eselect opencl between nvidia and ocl-icd has no direct effect.
I'm grateful for any hint.
On the other hand, my GeForce got a 50 000 job yesterday, despite the "Not detected" CUDA/OpenCL errors in log.txt...
How should this be interpreted? _________________ With nice greetings
Vrenn |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Sat May 02, 2020 8:30 pm Post subject: |
A simple hint: install clinfo and make sure you get only one OpenCL implementation. I'm not sure eselect still does anything; I think I recently read a news item about it being made obsolete by a single ICD loader. I'm not sure whether that only applies to my ~unstable system or has reached stable as well. But clinfo is the way to find out. |
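axl's clinfo check as a minimal sketch. It relies only on the "Number of platforms" summary line that clinfo prints (visible in the output in the next post); both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read clinfo output on stdin and print the platform count.
# Exits non-zero if no "Number of platforms" line is found.
count_opencl_platforms() {
  awk '/^[ \t]*Number of platforms/ { print $NF; found = 1; exit }
       END { if (!found) exit 1 }'
}

# Hypothetical check: warn when more than one OpenCL implementation is visible,
# e.g. NVIDIA CUDA plus Mesa's Clover.
check_single_opencl_platform() {
  local n
  n="$(clinfo 2>/dev/null | count_opencl_platforms)" || {
    echo "could not read the platform count from clinfo" >&2
    return 2
  }
  if (( n != 1 )); then
    echo "found $n OpenCL platforms, expected 1" >&2
    return 1
  fi
}
```

Counting "Platform Name" lines instead would over-count, because clinfo repeats each platform name in its per-device section.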
Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Sat May 02, 2020 11:31 pm Post subject: |
Output with both eselect settings: Code: | Number of platforms 2
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 10.2.159
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Platform Extensions function suffix NV
Platform Name Clover
Platform Vendor Mesa
Platform Version OpenCL 1.1 Mesa 19.3.5
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd
Platform Extensions function suffix MESA
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 980M
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 440.82
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Topology (NV) PCI-E, 01:00.0
Device Profile FULL_PROFILE
Device Available Yes
Compiler Available Yes
Linker Available Yes
Max compute units 12
Max clock frequency 1126MHz
Compute Capability (NV) 5.2
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Supported affinity domains (n/a)
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Address bits 64, Little-Endian
Global memory size 4231331840 (3.941GiB)
Error Correction support No
Max memory allocation 1057832960 (1009MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 589824 (576KiB)
Global Memory cache line size 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 268435456 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 4096x4096x4096 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max number of constant args 9
Max constant buffer size 65536 (64KiB)
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) Yes
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 2
printf() buffer size 1048576 (1024KiB)
Built-in kernels (n/a)
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics
Platform Name Clover
Number of devices 0
NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) NVIDIA CUDA
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [NV]
clCreateContext(NULL, ...) [default] Success [NV]
clCreateContext(NULL, ...) [other] P [U
clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) No platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) Invalid device type for platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) No platform | The clinfo output differs in only 5 lines.
With the ICD loader there is an additional section: Code: | ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.12
ICD loader Profile OpenCL 2.2 | Spooky, but the following works:
First: install clinfo
Second: run clinfo
Third: systemctl start foldingathome
Now... Code: | grep CUDA /opt/foldingathome/log.txt
23:13:23:CUDA Device 0: Platform:0 Device:0 Bus:1 Slot:0 Compute:5.2 Driver:10.2 | It doesn't matter what eselect opencl is set to, but you must run clinfo before starting foldingathome...
Tested more than three times, rebooting the whole laptop each time, both with ocl-icd (the current setting) and with pure nvidia. The behavior is reproducible on my system.
The GPU now gets a work unit fairly quickly. Might this be a race condition? _________________ With nice greetings
Vrenn |
Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Sun May 03, 2020 12:12 am Post subject: |
(Answering a deleted post?)
As I wrote before, I tried your way: I emerged mesa with -opencl, and emerge --depclean then unmerged libclc.
I have just tested it again.
Still, log.txt is only error-free when I run clinfo first.
Whether mesa is built with or without opencl, with or without libclc, and whether eselect is set to ocl-icd or nvidia, does not seem to matter on my system.
The ugly workaround is now a script in /usr/local/sbin... but it works, thanks to your tip.
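The workaround from this thread (run clinfo, then start the service) could look like the following minimal sketch. It only reproduces the observed ordering and does not explain why it helps; the function name and the CLINFO/SYSTEMCTL overrides are hypothetical, added so the function can be dry-run:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run clinfo once, then start the foldingathome service.
# CLINFO and SYSTEMCTL are assumed overrides (default: the real commands).
start_fah_after_clinfo() {
  local clinfo="${CLINFO:-clinfo}" systemctl="${SYSTEMCTL:-systemctl}"

  # Output is discarded; in this thread, merely running clinfo first was
  # enough for the client to log "CUDA Device 0: ..." at startup.
  if ! "$clinfo" > /dev/null 2>&1; then
    echo "warning: $clinfo failed, starting foldingathome anyway" >&2
  fi
  "$systemctl" start foldingathome
}
```

A script in /usr/local/sbin would define this function and end with a plain `start_fah_after_clinfo` call. An untested alternative is a systemd drop-in for the service whose `ExecStartPre=` runs clinfo, which would keep the same ordering at boot.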
Now I have to go to sleep... _________________ With nice greetings
Vrenn |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Sun May 03, 2020 9:08 pm Post subject: |
I'm sorry. I deleted my post because I realized we're talking about a laptop, which probably has hybrid graphics, one of those Nvidia Optimus things, and I thought I was wasting your time. I really don't know my way around those. With pure Nvidia drivers it's usually mesa with OpenCL that causes the issues, but on Optimus... I just don't know. I don't have one of those machines and have never had any experience with them.
On the other hand, I have a lot of experience with putting my foot in my mouth just because I missed that one thing: Optimus. So again, sorry to have wasted your time, and if you managed to make it work in any way... just don't fix it anymore. |
Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Mon May 04, 2020 5:48 pm Post subject: |
You are not that wrong...
It is a ROG with Nvidia from 2015. At that time, powerful gaming laptops used Nvidia chips as their only GPU. That's my case: there is no Optimus. But I can't say whether there might be Optimus leftovers in the firmware.
The opencl-index has to be set, no question, no big deal.
But having to run clinfo first makes me wonder.
The "hack" works for me, but I find it ugly.
It seems foldingathome fails to detect the OpenCL/CUDA capabilities at startup. It may still get a job later on, but really late. log.txt is only written at startup, so the errors remain. Does it detect the OpenCL/CUDA capabilities later, at runtime?
That would fit the second oddity: running clinfo first, which also detects OpenCL/CUDA, makes foldingathome detect them right at startup and write an error-free log.txt.
It isn't perfect, but it is "working for me".
When the time comes, I'll solve this puzzle too, or get an all-AMD top gaming PC.
Anyway, you pointed me to clinfo, which got GPU jobs going really fast.
PS: I don't like Optimus & co. Might there be all-AMD machines with dedicated GPUs? _________________ With nice greetings
Vrenn |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Tue May 05, 2020 9:37 pm Post subject: |
Vrenn wrote: | You are not that wrong... |
My dad used to say that the worst enemy of "good" is "better".
If it's working... don't fix it. I mentioned that.
I don't know what's going on there, but at one point it worked. So stop fixing it.
The world, such as it is, is held together by spit, hope and duct tape.
I've been waiting on a 3D print for 8 hours. Terribly complicated model. Not gonna fight the current.
Whatever works, man. Whatever works. |
Vrenn Guru
Joined: 15 Dec 2004 Posts: 327
Posted: Wed May 06, 2020 5:37 pm Post subject: |
You are right. I'll take it as a working miracle.
I chose Gentoo for several reasons; one was to learn.
But for now, on this one, I have reached my destination. _________________ With nice greetings
Vrenn |