supremmotu n00b
Joined: 19 Feb 2021 Posts: 9
Posted: Mon Aug 21, 2023 5:47 am Post subject: Blacklisting Bad VRAM Addresses
|
I know that I can blacklist bad RAM addresses with the memmap kernel parameter. How would I go about blocking bad VRAM addresses? I would buy another GPU, but they aren't exactly cheap. I have an AMD GPU (5700 XT), if that changes anything.
Edit: I also found the amdgpu_vram_mgr_reserve_range function in the kernel, in drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c, which "Reserves memory from start address with the specified size in VRAM", but I'm not exactly sure how I would use it.
|
|
gorg86 Guru
Joined: 20 May 2011 Posts: 308
Posted: Mon Aug 21, 2023 4:09 pm
|
You could use the amdgpu.vramlimit kernel parameter, which restricts the total amount of VRAM (in MiB), but depending on where the bad memory blocks are, that might be useless. Or you could perhaps use AMDGPU RAS (Reliability, Availability, Serviceability), if your card supports it.
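A minimal sketch of the arithmetic behind the vramlimit idea. It assumes (1) the bad address is a plain byte offset from the start of VRAM, which memory testers do not always report, and (2) vramlimit keeps only the low part of VRAM, so it helps only when the bad address sits near the top. The helper names are hypothetical; the total comes from mem_info_vram_total, which reports bytes.

```shell
#!/bin/sh
# Sketch: can amdgpu.vramlimit keep a bad VRAM address out of use?
# ASSUMPTIONS: the address is a byte offset from the start of VRAM, and
# vramlimit (in MiB) only trims VRAM from the top. Helper names are hypothetical.

MIB=1048576

# Largest whole-MiB limit whose range ends at or below the bad address.
vramlimit_below() {
    echo $(( $1 / MIB ))
}

# Print a suggested kernel parameter, or explain why the limit cannot help.
# $1 = bad address (hex or decimal), $2 = total VRAM in bytes.
vramlimit_advice() {
    addr=$(( $1 ))
    total=$(( $2 ))
    if [ "$addr" -ge "$total" ]; then
        echo "address is beyond the reported VRAM size; probably not a plain VRAM offset"
        return 1
    fi
    limit=$(vramlimit_below "$addr")
    if [ "$limit" -eq 0 ]; then
        # vramlimit=0 is the default and means "use full VRAM".
        echo "bad address is in the first MiB; vramlimit cannot exclude it"
        return 1
    fi
    echo "amdgpu.vramlimit=$limit (keeps $limit of $(( total / MIB )) MiB)"
}
```

For example, vramlimit_advice 0x1F0000000 "$(cat /sys/class/drm/card0/device/mem_info_vram_total)" would suggest amdgpu.vramlimit=7936 on a card reporting exactly 8192 MiB. With the address quoted later in this thread (0x23CE73E40, roughly 9166 MiB) and an 8 GB card such as the 5700 XT, it takes the "beyond" branch instead, which hints that the reported address is not a plain VRAM offset.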
I could not find a list of supported GPUs online, and I don't know whether you have to compile a debug kernel for this.
I have an old Polaris card: it shows /sys/module/amdgpu/parameters/ras_mask, but there is no /sys/kernel/debug/dri/0/ras directory.
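A minimal sketch of the check described above, using only the paths from this post: it prints the ras_mask module parameter and reports whether the per-card ras/ras_ctrl file exists under debugfs. As far as I can tell from the kernel's RAS documentation, ras_mask shows which IP blocks (umc, sdma, gfx and so on) support RAS on a given ASIC. The ROOT argument is a hypothetical prefix added only so the function can be tried against a fake directory tree; reading debugfs normally requires root.

```shell
#!/bin/sh
# Sketch: report whether the amdgpu RAS debugfs interface is present.
# $1 = card index N (as in /sys/kernel/debug/dri/<N>), default 0.
# $2 = ROOT, a hypothetical path prefix for testing; empty means the real "/".
ras_status() {
    n=${1:-0}
    root=${2:-}
    mask_file="$root/sys/module/amdgpu/parameters/ras_mask"
    ctrl_file="$root/sys/kernel/debug/dri/$n/ras/ras_ctrl"

    if [ -r "$mask_file" ]; then
        echo "ras_mask: $(cat "$mask_file")"
    else
        echo "ras_mask: not found (is amdgpu loaded?)"
    fi

    # This file needs CONFIG_DEBUG_FS, a mounted debugfs and RAS support on the card.
    if [ -e "$ctrl_file" ]; then
        echo "ras_ctrl: $ctrl_file"
    else
        echo "ras_ctrl: missing"
        return 1
    fi
}
```

Run it as root, e.g. ras_status 0 for card 0; a non-zero exit status means the ras_ctrl file is not there.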
There is almost no information about this feature online.
If I understand it correctly, it should be possible to disable memory blocks with:
Code: | echo "disable <block>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl |
If it works the way I think it does, /sys/class/drm/card0/device/mem_info_vram_total should decrease.
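One way to check that expectation is to read mem_info_vram_total (reported in bytes) before and after the write to ras_ctrl. A sketch, assuming card0 by default; the SYSFS variable is a hypothetical override for testing, and the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare total VRAM before and after running a command.
# SYSFS is a hypothetical override so this can be tried on a fake tree.
SYSFS=${SYSFS:-/sys}

# Print total VRAM in bytes for the given card (default card0).
vram_total() {
    cat "$SYSFS/class/drm/${1:-card0}/device/mem_info_vram_total"
}

# $1 = card name (e.g. card0); remaining args = command to run in between.
vram_delta() {
    card=$1
    shift
    before=$(vram_total "$card") || return 1
    "$@" || return 1
    after=$(vram_total "$card") || return 1
    echo "before=$before after=$after delta=$(( after - before ))"
}
```

As root, something like vram_delta card0 sh -c 'echo "disable <block>" > /sys/kernel/debug/dri/0/ras/ras_ctrl' (with <block> filled in) would show whether the reported total changes at all.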
There is also a ras_tests.c that supposedly tests what is available on your system, but I don't know how to run that either.
Reference links:
https://dri.freedesktop.org/docs/drm/gpu/amdgpu/module-parameters.html
https://docs.kernel.org/gpu/amdgpu/ras.html
https://cgit.freedesktop.org/mesa/drm/tree/tests/amdgpu/ras_tests.c
|
|
supremmotu n00b
Joined: 19 Feb 2021 Posts: 9
Posted: Mon Aug 21, 2023 11:28 pm
|
Do you know how I would specify the <block>? For instance, I am trying to blacklist the addresses from 0x23CE73E40 to 0x23CE73E4F.
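For what it's worth, the amdgpu_vram_mgr_reserve_range function mentioned in the first post reserves memory "from start address with the specified size", so it helps to express the range as a start and a size. A sketch, assuming the end address is inclusive and a 4 KiB page granularity (an assumption, not something confirmed in this thread); range_info is a hypothetical name.

```shell
#!/bin/sh
# Sketch: turn an inclusive bad-address range into start/size values.
# ASSUMPTIONS: end address is inclusive; pages are 4 KiB. Name is hypothetical.
PAGE=4096

range_info() {
    start=$(( $1 ))
    end=$(( $2 ))
    size=$(( end - start + 1 ))
    # Round the start down and the end up to whole pages.
    pstart=$(( start & ~(PAGE - 1) ))
    pend=$(( (end | (PAGE - 1)) + 1 ))
    printf 'start=0x%X size=%d page_start=0x%X page_size=0x%X\n' \
        "$start" "$size" "$pstart" "$(( pend - pstart ))"
}
```

For the range in this post, range_info 0x23CE73E40 0x23CE73E4F prints start=0x23CE73E40 size=16 page_start=0x23CE73000 page_size=0x1000, i.e. the 16 bad bytes sit inside a single 4 KiB page.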
|
|
gorg86 Guru
Joined: 20 May 2011 Posts: 308
Posted: Mon Aug 21, 2023 11:51 pm
|
No, you would have to find out how the card manages its memory.
Code: | cat /sys/kernel/debug/dri/0/amdgpu_vram_mm |
might help.
I can't test this on my PC.
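A sketch of running that command with the usual prerequisites checked: /sys/kernel/debug/dri/0/amdgpu_vram_mm only exists when the kernel has CONFIG_DEBUG_FS and debugfs is mounted, and reading it normally requires root. The output format depends on the kernel version, so this only prints it. DEBUGFS is a hypothetical override for testing against a fake directory, and show_vram_mm is a made-up name.

```shell
#!/bin/sh
# Sketch: print the amdgpu VRAM manager state for card <N>.
# DEBUGFS is a hypothetical override; normally debugfs lives at /sys/kernel/debug.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

show_vram_mm() {
    f="$DEBUGFS/dri/${1:-0}/amdgpu_vram_mm"
    if [ ! -d "$DEBUGFS/dri" ]; then
        echo "cannot see $DEBUGFS/dri: is debugfs mounted, and are you root?" >&2
        echo "to mount it: mount -t debugfs none /sys/kernel/debug" >&2
        return 1
    fi
    if [ ! -r "$f" ]; then
        echo "cannot read $f" >&2
        return 1
    fi
    cat "$f"
}
```

Usage: show_vram_mm 0 as root, or show_vram_mm 0 | less for a long dump.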