Gentoo Forums :: View topic - AMD-GPU, ROCm and vLLM


rsnfunky

Posted: Wed Sep 25, 2024 1:39 am Post subject: AMD-GPU, ROCm and vLLM

I am trying to install vLLM on my AMD PC as an inference server.

I followed the Gentoo ROCm guide (https://wiki.gentoo.org/wiki/ROCm). The steps so far:

Step 1:
Installed ROCm and all the libraries and packages listed in the guide.

Step 2: Installed Miniconda and created the environment vllm:

Code:

wget https://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash ./Miniconda3-py310_23.1.0-1-Linux-x86_64.sh -b -p $HOME/miniconda3

# Manually adding Conda init to .bashrc
echo '### Conda init ###' >> $HOME/.bashrc
echo 'source $HOME/miniconda3/etc/profile.d/conda.sh' >> $HOME/.bashrc
echo 'conda activate' >> $HOME/.bashrc
source $HOME/.bashrc

Code:

conda create -n vllm -y
conda activate vllm
conda install python=3.11 -y

When trying to compile vLLM, I get the following error: No ROCm runtime is found, using ROCM_HOME='/usr'
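
The message says the build settled on ROCM_HOME='/usr'. A minimal sketch for printing the hints the shell currently gives a ROCm-aware build is below. The choice of variables (ROCM_HOME, ROCM_PATH, HIP_PATH) and the helper name show_rocm_hints are assumptions for illustration, not something taken from the vLLM build scripts.

```shell
# Print the environment hints a ROCm-aware build commonly consults.
# Hypothetical helper; the variable list is an assumption.
show_rocm_hints() {
    printf 'ROCM_HOME=%s\n' "${ROCM_HOME:-<unset>}"
    printf 'ROCM_PATH=%s\n' "${ROCM_PATH:-<unset>}"
    printf 'HIP_PATH=%s\n' "${HIP_PATH:-<unset>}"
    # command -v prints nothing and fails when hipcc is not on PATH.
    printf 'hipcc=%s\n' "$(command -v hipcc || printf '<not on PATH>')"
}

show_rocm_hints
```

Running it inside and outside the conda environment shows whether activating the environment changes any of these values.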

Step 3: This could be a PyTorch issue, although the system PyTorch was emerged with the rocm USE flag.

Inside the vllm conda environment I installed the ROCm nightly build of PyTorch, using the command from the official AMD documentation:

Code:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.1/
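
After this install it may be worth confirming which build of torch the environment actually imports. The sketch below classifies a torch version string by the suffix after the '+' sign. The helper name torch_build_flavor is hypothetical, and the sample string is the one from the environment report further down.

```shell
# Classify a torch version string by its local-version suffix
# (the part after '+', for example "rocm6.1"). Hypothetical helper.
torch_build_flavor() {
    case "$1" in
        *+*) printf '%s\n' "${1#*+}" ;;
        *)   printf 'none\n' ;;
    esac
}

# Version string taken from the environment report in this post.
torch_build_flavor '2.6.0.dev20240924+rocm6.1'   # prints rocm6.1

# With the vllm environment active, the live value could be fed in like this:
#   torch_build_flavor "$(python -c 'import torch; print(torch.__version__)')"
```

A result other than a rocm suffix would suggest that a different torch build is shadowing the nightly ROCm wheel.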

Error during vllm compilation:

Quote:

running build_ext
-- The CXX compiler identification is GNU 13.3.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type: RelWithDebInfo
-- Target device: rocm
-- Found Python: /home/rohit/miniconda3/envs/vllm/bin/python3 (found version "3.11.9") found components: Interpreter Development.Module Development.SABIModule
-- Found python matching: /home/rohit/miniconda3/envs/vllm/bin/python3.
-- Found Torch: /usr/lib64/libtorch.so
-- Enabling core extension.
CMake Error at CMakeLists.txt:143 (message):
Can't find CUDA or HIP installation.

-- Configuring incomplete, errors occurred!
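
One detail in this log that may be relevant: Python is found inside the conda environment, but Torch is found at /usr/lib64/libtorch.so, which is the system-wide copy. Whether that mismatch causes the HIP detection failure is an assumption, not something the log confirms. A small sketch for checking whether a reported path lies inside the environment (path_in_prefix is a hypothetical helper):

```shell
# Report whether a path lies inside a given directory prefix.
# Hypothetical helper for comparing paths printed by CMake.
path_in_prefix() {
    case "$1" in
        "$2"/*) printf 'inside\n' ;;
        *)      printf 'outside\n' ;;
    esac
}

# Paths copied from the CMake output above.
path_in_prefix /usr/lib64/libtorch.so /home/rohit/miniconda3/envs/vllm   # prints outside
```

For comparison, python -c 'import torch.utils; print(torch.utils.cmake_prefix_path)' prints the CMake prefix of whichever torch the active interpreter imports.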

The build is unable to find the HIP installation. I have checked that HIP is properly installed, and the test script also works.

My environment and HIP config:

Quote:

Collecting environment information...
WARNING 09-25 07:53:02 _custom_ops.py:18] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")

PyTorch version: 2.6.0.dev20240924+rocm6.1
Is debug build: False
CUDA used to build PyTorch: N/A
ROCM used to build PyTorch: 6.1.40091-a8dbc0c19

OS: Gentoo Linux (x86_64)
GCC version: (Gentoo 13.3.1_p20240614 p17) 13.3.1 20240614
Clang version: 18.1.8
CMake version: version 3.30.2
Libc version: glibc-2.39

Python version: 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.6.38-gentoo-gentoo-dist-x86_64-AMD_Ryzen_7_8700G_w-_Radeon_780M_Graphics-with-glibc2.39
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: AMD Radeon Graphics (gfx1103)
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: 6.1.40091
MIOpen runtime version: 3.1.0
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 7 8700G w/ Radeon 780M Graphics
CPU family: 25
Model: 117
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 40%
CPU max MHz: 6127.0000
CPU min MHz: 400.0000
BogoMIPS: 8403.66
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization: AMD-V
L1d cache: 256 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 8 MiB (8 instances)
L3 cache: 16 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.6.68
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pytorch-triton-rocm==3.1.0+5fe38ffd73
[pip3] pyzmq==26.2.0
[pip3] torch==2.6.0.dev20240924+rocm6.1
[pip3] torchaudio==2.5.0.dev20240924+rocm6.1
[pip3] torchvision==0.20.0.dev20240924+rocm6.1
[pip3] transformers==4.44.2
[pip3] triton==3.0.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.6.68 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
[conda] pytorch-triton-rocm 3.1.0+5fe38ffd73 pypi_0 pypi
[conda] pyzmq 26.2.0 pypi_0 pypi
[conda] torch 2.6.0.dev20240924+rocm6.1 pypi_0 pypi
[conda] torchaudio 2.5.0.dev20240924+rocm6.1 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240924+rocm6.1 pypi_0 pypi
[conda] transformers 4.44.2 pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi
ROCM Version: 6.1.40093-
Neuron SDK Version: N/A
vLLM Version: N/A
vLLM Build Flags:
CUDA Archs: Not Set; ROCm: Disabled; Neuron: Disabled
GPU Topology:
============================ ROCm System Management Interface ============================
================================ Weight between two GPUs =================================
GPU0
GPU0 0

================================= Hops between two GPUs ==================================
GPU0
GPU0 0

=============================== Link Type between two GPUs ===============================
GPU0
GPU0 0

======================================= Numa Nodes =======================================
GPU[0] : (Topology) Numa Node: 0
GPU[0] : (Topology) Numa Affinity: -1
================================== End of ROCm SMI Log ===================================
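
The library list in this report contains nvidia-*-cu12 wheels next to the ROCm build of torch. That may mean CUDA-targeted packages were pulled into the environment at some point, but this is a guess. A minimal sketch for filtering such wheels out of a freeze-style package list (list_cuda_wheels is a hypothetical helper):

```shell
# Read "name==version" lines on stdin and keep the NVIDIA CUDA wheels.
# Hypothetical helper; "|| true" keeps the exit status zero when nothing matches.
list_cuda_wheels() {
    grep -E '^nvidia-[a-z0-9-]+-cu[0-9]+' || true
}

# Sample lines copied from the report above.
printf '%s\n' \
    'numpy==1.26.4' \
    'nvidia-cublas-cu12==12.1.3.1' \
    'torch==2.6.0.dev20240924+rocm6.1' | list_cuda_wheels
# prints nvidia-cublas-cu12==12.1.3.1
```

Inside the vllm environment the live list could be checked with pip list --format=freeze | list_cuda_wheels.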

Output of hipconfig --full:

HIP version : 6.1.40093-

== hipconfig
HIP_PATH : /usr
ROCM_PATH : /usr
HIP_COMPILER : clang
HIP_PLATFORM : amd
HIP_RUNTIME : rocclr
CPP_CONFIG : -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/usr/include -I/usr/lib/llvm/18/bin/../../../../lib/clang/18

== hip-clang
HIP_CLANG_PATH : /usr/lib/llvm/18/bin
clang version 18.1.8
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/lib/llvm/18/bin
Configuration file: /etc/clang/x86_64-pc-linux-gnu-clang++.cfg
LLVM (http://llvm.org/):
LLVM version 18.1.8
Optimized build.
Default target: x86_64-pc-linux-gnu
Host CPU: znver4

Registered Targets:
aarch64 - AArch64 (little endian)
aarch64_32 - AArch64 (little endian ILP32)
aarch64_be - AArch64 (big endian)
amdgcn - AMD GCN GPUs
arm - ARM
arm64 - ARM64 (little endian)
arm64_32 - ARM64 (little endian ILP32)
armeb - ARM (big endian)
avr - Atmel AVR Microcontroller
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
hexagon - Hexagon
lanai - Lanai
loongarch32 - 32-bit LoongArch
loongarch64 - 64-bit LoongArch
mips - MIPS (32-bit big endian)
mips64 - MIPS (64-bit big endian)
mips64el - MIPS (64-bit little endian)
mipsel - MIPS (32-bit little endian)
msp430 - MSP430 [experimental]
nvptx - NVIDIA PTX 32-bit
nvptx64 - NVIDIA PTX 64-bit
ppc32 - PowerPC 32
ppc32le - PowerPC 32 LE
ppc64 - PowerPC 64
ppc64le - PowerPC 64 LE
r600 - AMD GPUs HD2XXX-HD6XXX
riscv32 - 32-bit RISC-V
riscv64 - 64-bit RISC-V
sparc - Sparc
sparcel - Sparc LE
sparcv9 - Sparc V9
systemz - SystemZ
thumb - Thumb
thumbeb - Thumb (big endian)
ve - VE
wasm32 - WebAssembly 32-bit
wasm64 - WebAssembly 64-bit
x86 - 32-bit X86: Pentium-Pro and above
x86-64 - 64-bit X86: EM64T and AMD64
xcore - XCore
hip-clang-cxxflags : --hip-version=6.1.40093 -O3
hip-clang-ldflags : --driver-mode=g++ --hip-version=6.1.40093 -O3 --hip-link --rtlib=compiler-rt -unwindlib=libgcc
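
To reuse these values in a build shell without retyping them, a single field can be pulled out of the "KEY : value" lines. This is only a sketch: hipconfig_value is a hypothetical helper, and it assumes the output keeps the layout shown above.

```shell
# Print the value of one "KEY : value" line read from stdin.
# Hypothetical helper; only the first matching key is printed.
hipconfig_value() {
    awk -v key="$1" '
        index($0, ":") {
            k = substr($0, 1, index($0, ":") - 1)
            v = substr($0, index($0, ":") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)   # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim the value
            if (k == key && !found) { print v; found = 1 }
        }
    '
}

# Sample lines copied from the hipconfig output above.
printf '%s\n' 'HIP_PATH : /usr' 'ROCM_PATH : /usr' 'HIP_COMPILER : clang' \
    | hipconfig_value HIP_COMPILER   # prints clang
```

With ROCm installed, something like export ROCM_PATH="$(hipconfig --full | hipconfig_value ROCM_PATH)" would set the variable from the live output.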
