Gentoo Forums
Fast Swap in unused VRAM
causality
Apprentice


Joined: 03 Jun 2006
Posts: 239

Posted: Tue Dec 10, 2024 11:54 pm    Post subject: Fast Swap in unused VRAM

So, here is the output of "swapon":

Code:
# swapon
NAME        TYPE      SIZE   USED PRIO
/dev/sdb2   partition 2.1G 564.6M   -2
/dev/loop16 partition 1.9G   1.2G   32


This was after several hours with vm.swappiness set to a very aggressive 133 (up from 20), just to see what would happen. It is now set to 80.
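For anyone who wants to experiment with the same knob, here is a minimal sketch of a sanity check before changing it. The helper name valid_swappiness is made up for illustration, and the 0-200 range assumes kernel 5.8 or newer (older kernels cap the value at 100).

```shell
# Hypothetical helper: succeed only if $1 is an integer in the range 0..200,
# which is what vm.swappiness accepts on kernels 5.8 and newer.
valid_swappiness() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "$1" -le 200 ]
}

new=80
if valid_swappiness "$new"; then
    # Only printed here; the real command needs root
    echo "would run: sysctl -w vm.swappiness=$new"
fi
```

The current value can be read without root from /proc/sys/vm/swappiness.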

The swap partition at /dev/sdb2 is backed by spinning rust. On a good day, it might get 90-100MB/s.

The swap device currently at /dev/loop16 is video RAM. I have an NVIDIA GeForce GTX 1060 6GB card. Even running 3D games under Wine, I never saw more than a few GB in use as reported by nvidia-smi. Here's my current output (no 3D games running):

Code:
# nvidia-smi
Tue Dec 10 18:35:20 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.135                Driver Version: 550.135        CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:01:00.0  On |                  N/A |
|  0%   47C    P8              6W /  120W |    2547MiB /   6144MiB |      8%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2306      G   /usr/bin/X                                     10MiB |
|    0   N/A  N/A      2414      G   /usr/bin/kwin_wayland                          38MiB |
|    0   N/A  N/A      2448      G   /usr/libexec/xdg-desktop-portal-kde             1MiB |
|    0   N/A  N/A      2483      G   /usr/bin/Xwayland                               9MiB |
|    0   N/A  N/A      2537      G   /usr/bin/kded6                                  1MiB |
|    0   N/A  N/A      2543      G   /usr/bin/ksmserver                              1MiB |
|    0   N/A  N/A      2549      G   ...c/polkit-kde-authentication-agent-1          1MiB |
|    0   N/A  N/A      2553      G   /usr/libexec/org_kde_powerdevil                 1MiB |
|    0   N/A  N/A      2556      G   /usr/bin/plasmashell                           35MiB |
|    0   N/A  N/A      2558      G   /usr/bin/kaccess                                1MiB |
|    0   N/A  N/A      2566      G   /usr/bin/kdeconnectd                            1MiB |
|    0   N/A  N/A      2602      G   /usr/libexec/kactivitymanagerd                  1MiB |
|    0   N/A  N/A      2605      G   /usr/bin/kmix                                   1MiB |
|    0   N/A  N/A      2606      G   /usr/bin/konsole                                1MiB |
|    0   N/A  N/A      2620      G   /usr/bin/dolphin                                1MiB |
|    0   N/A  N/A      2622      G   /usr/lib64/firefox/firefox                    145MiB |
|    0   N/A  N/A      2627      G   /usr/lib64/thunderbird/thunderbird             56MiB |
|    0   N/A  N/A      2668      G   /usr/bin/python3.12                             1MiB |
|    0   N/A  N/A      2670      G   /usr/bin/kclockd                                1MiB |
|    0   N/A  N/A      2951      G   /usr/bin/kded5                                  1MiB |
|    0   N/A  N/A      3427      C   vramfs                                       2110MiB |
|    0   N/A  N/A      6559      G   /usr/bin/audacious                              1MiB |
|    0   N/A  N/A     12941      G   ...er/plugins --ozone-platform=wayland          1MiB |
|    0   N/A  N/A     15745      G   /opt/vivaldi/vivaldi-bin                        1MiB |
|    0   N/A  N/A     15810      G   ...yeDropper --variations-seed-version         38MiB |
+-----------------------------------------------------------------------------------------+


So, lots of room.
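
To put a number on that headroom, nvidia-smi can also print used and total memory in a machine-readable form. A small sketch, assuming the --query-gpu and --format=csv,noheader,nounits options of nvidia-smi; the function name vram_free_mib is made up, and the sample line uses the figures from the table above.

```shell
# Hypothetical helper: read "used, total" (both in MiB) on stdin and print the free MiB
vram_free_mib() {
    awk -F', *' '{ print $2 - $1 }'
}

# On a live system:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | vram_free_mib
# With the figures from the table above (2547 MiB used of 6144 MiB):
echo "2547, 6144" | vram_free_mib   # prints 3597
```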

I found an overlay called "myrvolay" available in the standard "eselect" options and enabled it. Then I installed a program from this overlay called sys-fs/vramfs. I had to emerge it with the "no dependencies" option (--nodeps) to get it to build. If you aren't comfortable with unusual or unstable software, don't do any of this.

The ebuild kept wanting to depend on "dev-libs/clhpp", but there is no such package. The package that does exist, and works, is called "dev-cpp/clhpp". Hence the "no dependencies" emerge option, after manually installing "dev-cpp/clhpp".
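
The steps described in the last two paragraphs could look roughly like this. It is a sketch, not a transcript of what was actually run: it assumes app-eselect/eselect-repository is installed, and the run wrapper with its DRY_RUN variable is a made-up convenience so the commands can be previewed without touching the system.

```shell
# Hypothetical wrapper: print each command instead of running it unless DRY_RUN=0
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Enable and sync the overlay (assumes app-eselect/eselect-repository)
run eselect repository enable myrvolay
run emaint sync -r myrvolay
# Install the OpenCL C++ bindings under the category where they really live
run emerge --ask dev-cpp/clhpp
# Then build vramfs without dependency resolution
run emerge --ask --nodeps sys-fs/vramfs
```

Run it once as-is to see the commands, then with DRY_RUN=0 as root to execute them.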

This program uses FUSE and OpenCL to expose a portion of video RAM as a filesystem. It works with the proprietary NVIDIA drivers. You can then put a swap file on it, use it as a RAM disk, or whatever you like. I wanted faster swap. The swap file was sparse, so I had to create a loop device on top of it so that "swapon" didn't complain about a file with holes.
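
A quick way to see whether a file has holes is to compare its apparent size with the space actually allocated to it. A minimal sketch, assuming GNU stat; has_holes is a made-up name, and the check is only a heuristic (a filesystem with compression can also report fewer allocated bytes than the file size).

```shell
# Hypothetical helper: succeed if fewer bytes are allocated than the file's apparent size
has_holes() {
    # %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
    stat -c '%s %b %B' "$1" | {
        read -r size blocks bsize
        [ $(( blocks * bsize )) -lt "$size" ]
    }
}

demo=$(mktemp)
truncate -s 16M "$demo"      # 16 MiB apparent size, nothing actually written
if has_holes "$demo"; then
    echo "$demo has holes, so swapon would complain about it"
fi
rm -f "$demo"
```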

I get around 2.5 GB/s transfer rate from this, which is far better than the 90-100 MB/s of the HDD-backed swap partition.
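
To get a feel for what that difference means, here is the arithmetic for moving roughly 1200 MB (about what is sitting in VRAM swap in the swapon output above) at each speed. A rough sketch: it assumes sustained sequential throughput, takes 95 MB/s as the midpoint of the HDD range, and the function name is made up.

```shell
# Hypothetical helper: seconds needed to move SIZE_MB ($1) at RATE_MB_PER_S ($2)
transfer_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

echo "HDD  (~95 MB/s):   $(transfer_seconds 1200 95) s"     # 12.6 s
echo "VRAM (~2500 MB/s): $(transfer_seconds 1200 2500) s"   # 0.5 s
```

Real swap traffic is mostly small random accesses rather than one long transfer, which tends to hurt the HDD even more than these figures suggest.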

But if the vramfs process that holds the video RAM should itself get swapped out, you get a hard system lock, a perfect catch-22: the thing providing the swap space has been swapped out, so nothing can swap it back in, and nothing else in that swap can be reached either. That cost me a press of the reset switch. So I placed vramfs into its own cgroup and set it up to never be swapped. This has boosted my system performance. It's like a free RAM upgrade. I wrote a script and placed it into /etc/local.d/. This is that script:
Code:

#!/bin/bash

#Treat unset variables as errors
set -u

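#Deactivate all swap first so this script can be re-run if needed; swap must be
#off before vramfs is killed, or its backing store would vanish while in use
swapoff /dev/loop* &> /dev/null
swapoff /tmp/vram/swapfile &> /dev/null
swapoff -a &> /dev/null

#Make sure we're starting from scratch: detach any loop device still attached to the
#old swapfile (losetup -d takes a loop device, not the backing file), then remove it
for dev in $(losetup -j /tmp/vram/swapfile 2> /dev/null | cut -d: -f1); do
        losetup -d "${dev}" &> /dev/null
done
rm -f /tmp/vram/swapfile &> /dev/null
#Kill any previous vramfs process
killall vramfs &> /dev/null

#Activate disk (HDD) swap as specified in /etc/fstab
swapon -a &> /dev/null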

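#Set up a swappable area allocated in VRAM
#For this "2G" request, the usable space here came out to about 1970 MiB, not 2048 MiB (2 GiB)
mkdir -p /tmp/vram
vramfs /tmp/vram 2G &> /dev/null &
#Give the FUSE mount up to 5 seconds to appear before anything is written to it
for _ in $(seq 50); do mountpoint -q /tmp/vram && break; sleep 0.1; done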

#Use Linux cgroups to make sure that the vramfs process doesn't, itself, get swapped out
#because that would create a deadlock and likely freeze the system

cgcreate -g memory:vramfs
cgset -r memory.swap.max=0 vramfs
VRAMFS_PID=$(pgrep vramfs)
cgclassify -g memory:vramfs ${VRAMFS_PID}

#Create a swapfile in the space
#dd would be a slower but more portable way to create a swapfile with no holes, as needed by swapon:
#  dd if=/dev/zero of=/tmp/vram/swapfile bs=1MiB count=$((2*1024))

#Instead, use truncate to create a sparse swap file for efficiency, and a loop device to present it to
#swapon as though it has no holes

#Find the next available kernel loop device
LOOPDEV=$(losetup -f)
#Create a swap file slightly smaller than the space allocated to vramfs; anything larger will not fit
truncate -s 1969MiB /tmp/vram/swapfile &> /dev/null || { echo "Error creating vram swap file"
        exit 1
}
#Set recommended permissions
chmod 0600 /tmp/vram/swapfile
#Activate the loop device
losetup ${LOOPDEV} /tmp/vram/swapfile &> /dev/null || { echo "Error setting up vram loop device"
        exit 1
}

#Create swap on the loop device
mkswap ${LOOPDEV} &> /dev/null
#Activate swap -- use a higher (arbitrary) priority of 32, as disk swap defaults to -2
swapon -p 32 ${LOOPDEV} &> /dev/null || { echo "Error enabling vram swap space"
        exit 1
}
echo "Activated vram swap"

echo "Allocated ~2GB of vram swap space"
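
After the script has run, it is worth checking that vramfs itself really has nothing in swap. A minimal sketch, assuming a kernel that reports VmSwap in /proc/PID/status; the function name proc_swap_kb is made up, and the sample input is an invented excerpt so the example runs without vramfs present.

```shell
# Hypothetical helper: read a /proc/PID/status file on stdin and print its VmSwap value in kB
proc_swap_kb() {
    awk '/^VmSwap:/ { print $2 }'
}

# On the real system (should print 0 for vramfs):
#   proc_swap_kb < "/proc/$(pgrep -x vramfs)/status"
# Demonstration with an invented excerpt of a status file:
printf 'Name:\tvramfs\nVmRSS:\t   52340 kB\nVmSwap:\t       0 kB\n' | proc_swap_kb   # prints 0
```

If the value ever climbs above 0, the cgroup limit is not being applied to the process.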


Moved from "Unsupported Software" to "Documentation, Tips & Tricks". -- Zucca
mattst88
Developer


Joined: 28 Oct 2004
Posts: 423

Posted: Fri Jan 24, 2025 6:04 pm    Post subject: Re: Fast Swap in unused VRAM

causality wrote:
But, if the vramfs process that's holding the video RAM itself, should get swapped out, you get a hard system lock, a perfect catch-22. The thing holding the swap space got swapped, so there's nothing to unswap it so that it can access anything else.


FWIW, it looks like this issue was fixed by this commit: https://github.com/Overv/vramfs/commit/829b1f2c259da2eb63ed3d4ddef0eeddb08b99e4