Gentoo Forums
Blacklisting Bad VRAM Addresses
supremmotu
n00b


Joined: 19 Feb 2021
Posts: 9

Posted: Mon Aug 21, 2023 5:47 am    Post subject: Blacklisting Bad VRAM Addresses

I know that I can blacklist bad RAM addresses with the memmap kernel parameter. How would I go about blocking bad VRAM addresses? I would buy another GPU, but they aren't exactly cheap. I have an AMD GPU (RX 5700 XT), if that changes anything.

Edit: I also found the amdgpu_vram_mgr_reserve_range function in the kernel source, in drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c, which "Reserves memory from start address with the specified size in VRAM", but I'm not exactly sure how I would use it.
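
For comparison, the system-RAM approach mentioned above passes a region to the kernel in the form memmap=<size>$<start>. A minimal sketch of building that argument from a bad address range, assuming 4 KiB pages. The function name memmap_arg and the sample addresses are made up for illustration and do not come from this thread:

```shell
#!/bin/sh
# Hypothetical helper: print a memmap= kernel argument that reserves the
# 4 KiB pages covering the inclusive physical address range START..END.
memmap_arg() {
    start=$(( $1 & ~0xfff ))        # round the start down to a page boundary
    end=$(( ($2 | 0xfff) + 1 ))     # round the end up to the next page boundary
    printf 'memmap=0x%x$0x%x\n' $(( end - start )) "$start"
}

# Illustrative addresses only.
memmap_arg 0x12345678 0x12345687
```

This prints memmap=0x1000$0x12345000. When the argument goes into a bootloader configuration file, the $ usually has to be escaped. None of this applies to VRAM, since memmap only describes system memory.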
gorg86
Guru


Joined: 20 May 2011
Posts: 308

Posted: Mon Aug 21, 2023 4:09 pm

You could use the amdgpu.vramlimit kernel parameter, but depending on where the bad memory blocks are, that might be useless. Or you could maybe use AMDGPU RAS, if your card supports it.
I could not find a list of supported GPUs online, and I do not know whether you have to compile a debug kernel for this...
I have an old Polaris card; it shows /sys/module/amdgpu/parameters/ras_mask, but there is no /sys/kernel/debug/dri/0/ras folder.
There is almost no information about this feature online.

If I understand this correctly, it should be possible to disable memory blocks with:
Code:
echo "disable <block>" > /sys/kernel/debug/dri/<N>/ras/ras_ctrl


If that works how I think it does, then /sys/class/drm/card0/device/mem_info_vram_total should decrease.
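
One way to check that is to record the value before and after the change. A rough sketch, assuming the card is card0 and that the file holds a byte count, as the amdgpu sysfs documentation describes. The name vram_total_mib is only illustrative:

```shell
#!/bin/sh
# Hypothetical helper: print the VRAM total reported by amdgpu, in MiB.
# The optional argument is the sysfs file to read; it defaults to card0.
vram_total_mib() {
    file=${1:-/sys/class/drm/card0/device/mem_info_vram_total}
    bytes=$(cat "$file") || return 1
    echo $(( bytes / 1048576 ))
}

# Only query the real file if this machine exposes it.
if [ -r /sys/class/drm/card0/device/mem_info_vram_total ]; then
    vram_total_mib
fi
```

Run it once before and once after writing to ras_ctrl. If the two numbers match, the total did not change.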

There is a ras_tests.c which supposedly tests what's available on your system, but I don't know how to execute that either :roll:

Reference links:
https://dri.freedesktop.org/docs/drm/gpu/amdgpu/module-parameters.html
https://docs.kernel.org/gpu/amdgpu/ras.html
https://cgit.freedesktop.org/mesa/drm/tree/tests/amdgpu/ras_tests.c
supremmotu
n00b


Joined: 19 Feb 2021
Posts: 9

Posted: Mon Aug 21, 2023 11:28 pm

Do you know how I would specify the <block>? For instance, I am trying to blacklist addresses from 0x23CE73E40 to 0x23CE73E4F.
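
For reference, the arithmetic on that range can be sketched as follows. It assumes the addresses are byte offsets from the start of VRAM, which nothing in this thread confirms. The names vram_page and vram_mib are illustrative:

```shell
#!/bin/sh
# Assumption: the addresses are byte offsets into VRAM (not confirmed here).

# Print the base address of the 4 KiB page containing ADDR.
vram_page() {
    printf '0x%x\n' $(( $1 & ~0xfff ))
}

# Print the number of whole MiB below ADDR, i.e. the largest
# amdgpu.vramlimit value (given in MiB) that would still exclude ADDR.
vram_mib() {
    echo $(( $1 >> 20 ))
}

vram_page 0x23CE73E40   # page holding the start of the range
vram_page 0x23CE73E4F   # page holding the end of the range
vram_mib  0x23CE73E40
```

Both ends of the range fall in the same page, 0x23ce73000, and the MiB figure comes out as 9166. That is more than the 8 GiB on an RX 5700 XT, so the plain-offset assumption may well not hold for these particular addresses.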
gorg86
Guru


Joined: 20 May 2011
Posts: 308

Posted: Mon Aug 21, 2023 11:51 pm

No, you have to find out how the card manages the memory.
Code:
cat /sys/kernel/debug/dri/0/amdgpu_vram_mm

might help.
I can't test this on my PC.
All times are GMT