niconiconi n00b
Joined: 15 Jul 2023 Posts: 1
|
Posted: Sat Jul 15, 2023 4:04 pm Post subject: Folding@Home crashes in libstdc++ during AMD ROCm/HSA call |
|
|
I'm trying to run sci-biology/foldingathome-7.6.21 on an AMD gfx906 ("AMD Radeon Pro VII") GPU using ROCm/HSA 5.4.3 from the official Gentoo repository. Unfortunately, FAHClient crashes as soon as it tries to detect and initialize the GPU via ROCm/HSA. Other OpenCL and HIP programs, such as clpeak or my own SYCL program (targeting AMD HIP), run on the same system without problems.
Code: |
user@gentoo /opt/foldingathome $ ./FAHClient
15:44:17:Read GPUs.txt
Segmentation fault (core dumped)
|
I recompiled dev-libs/rocr-runtime-5.4.3-r1 and dev-libs/rocm-opencl-runtime with debug symbols and source code enabled, and gdb was able to generate the following backtrace:
Code: |
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6cd1821 in std::ctype<char>::_M_widen_init() const () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6
#2 0x00007ffff6d16168 in std::basic_ios<char, std::char_traits<char> >::fill() const () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6
#3 0x00007ffff702905d in std::basic_ios<char, std::char_traits<char> >::fill (__ch=48 '0', this=0x7fffffffbb70)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/basic_ios.h:390
#4 std::operator<< <char, std::char_traits<char> > (__f=..., __os=...) at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/iomanip:180
#5 rocr::AMD::GpuAgent::GetInfo (this=<optimized out>, attribute=<optimized out>, value=<optimized out>)
at /usr/src/debug/dev-libs/rocr-runtime-5.4.3-r1/ROCR-Runtime-rocm-5.4.3/src/core/runtime/amd_gpu_agent.cpp:1052
#6 0x00007ffff7039b18 in rocr::HSA::hsa_agent_get_info (agent_handle=..., attribute=40977, value=0x7fffffffbef0)
at /usr/src/debug/dev-libs/rocr-runtime-5.4.3-r1/ROCR-Runtime-rocm-5.4.3/src/core/runtime/hsa.cpp:562
#7 0x00007ffff7062623 in hsa_agent_get_info (agent=..., attribute=<optimized out>, value=<optimized out>)
at /usr/src/debug/dev-libs/rocr-runtime-5.4.3-r1/ROCR-Runtime-rocm-5.4.3/src/core/common/hsa_table_interface.cpp:116
#8 0x00007ffff73ed853 in roc::Device::populateOCLDeviceConstants (this=this@entry=0x1004180)
at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCclr-rocm-5.4.3/device/rocm/rocdevice.cpp:1092
#9 0x00007ffff73efc36 in roc::Device::create (this=this@entry=0x1004180)
at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCclr-rocm-5.4.3/device/rocm/rocdevice.cpp:698
#10 0x00007ffff73f27af in roc::Device::init () at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCclr-rocm-5.4.3/device/rocm/rocdevice.cpp:489
#11 0x00007ffff73b2c56 in amd::Device::init () at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCclr-rocm-5.4.3/device/device.cpp:461
#12 0x00007ffff73e4f2d in amd::Runtime::init () at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCclr-rocm-5.4.3/platform/runtime.cpp:75
#13 0x00007ffff7381476 in ShouldLoadPlatform () at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl/cl_icd.cpp:224
#14 0x00007ffff73814fe in operator() (__closure=<optimized out>)
at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl/cl_icd.cpp:274
#15 std::__invoke_impl<void, clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__f=...)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/invoke.h:61
#16 std::__invoke<clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> > (__fn=...)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/invoke.h:96
#17 operator() (__closure=<optimized out>) at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/mutex:852
#18 operator() (__closure=0x0) at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/mutex:788
#19 _FUN () at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/mutex:788
#20 0x00007ffff7d6a227 in ?? () from /lib64/libc.so.6
#21 0x00007ffff7381564 in __gthread_once (__func=<optimized out>, __once=0x7ffff74b8ae0 <clIcdGetPlatformIDsKHR::initOnce>)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/x86_64-pc-linux-gnu/bits/gthr-default.h:700
#22 std::call_once<clIcdGetPlatformIDsKHR(cl_uint, _cl_platform_id**, cl_uint*)::<lambda()> >(std::once_flag &, struct {...} &&) (__once=..., __f=...)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/mutex:859
#23 0x00007ffff7381649 in clIcdGetPlatformIDsKHR (num_entries=num_entries@entry=0, platforms=platforms@entry=0x0,
num_platforms=num_platforms@entry=0x7fffffffc41c)
at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCm-OpenCL-Runtime-rocm-5.4.3/amdocl/cl_icd.cpp:274
#24 0x00007ffff74c7591 in khrIcdVendorAdd (libraryName=0xe9ec50 "libamdocl64.so")
at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/icd.c:108
#25 0x00007ffff74cc164 in khrIcdOsDirEntryValidateAndAdd (d_name=<optimized out>, path=path@entry=0x7ffff74cd7b6 "/etc/OpenCL/vendors",
extension=extension@entry=0x7ffff74cd7a1 ".icd", addFunc=addFunc@entry=0x7ffff74c7396 <khrIcdVendorAdd>)
at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/linux/icd_linux.c:110
#26 0x00007ffff74cc40f in khrIcdOsDirEnumerate (path=path@entry=0x7ffff74cd7b6 "/etc/OpenCL/vendors", env=env@entry=0x7ffff74cd7a6 "OCL_ICD_VENDORS",
extension=extension@entry=0x7ffff74cd7a1 ".icd", addFunc=0x7ffff74c7396 <khrIcdVendorAdd>, bSort=bSort@entry=0)
at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/linux/icd_linux.c:202
#27 0x00007ffff74cc47e in khrIcdOsVendorsEnumerate ()
at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/linux/icd_linux.c:219
#28 0x00007ffff7d6a227 in ?? () from /lib64/libc.so.6
#29 0x00007ffff74cc4c6 in khrIcdOsVendorsEnumerateOnce ()
at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/linux/icd_linux.c:232
#30 0x00007ffff74c7391 in khrIcdInitialize () at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/icd.c:52
#31 0x00007ffff74c85d0 in clGetPlatformIDs (num_entries=0, platforms=0x0, num_platforms=0x7fffffffc700)
at /usr/src/debug/dev-libs/opencl-icd-loader-2023.04.17/OpenCL-ICD-Loader-2023.04.17/loader/icd_dispatch.c:214
#32 0x0000000000650423 in ?? ()
#33 0x0000000000482569 in ?? ()
#34 0x00000000004c272f in ?? ()
#35 0x000000000043efc0 in ?? ()
#36 0x0000000000510194 in ?? ()
#37 0x0000000000522fcc in ?? ()
#38 0x00000000004380d4 in ?? ()
#39 0x000000000042dcd7 in ?? ()
#40 0x00007ffff7d0468a in ?? () from /lib64/libc.so.6
#41 0x00007ffff7d04745 in __libc_start_main () from /lib64/libc.so.6
#42 0x000000000042d0f1 in ?? ()
#43 0x00007fffffffe188 in ?? ()
#44 0x0000000000000038 in ?? ()
#45 0x0000000000000001 in ?? ()
#46 0x00007fffffffe412 in ?? ()
#47 0x0000000000000000 in ?? ()
|
As the backtrace shows, FAHClient invokes ROCm/ROCr/HSA to query the GPU, but the rocr-runtime shared library crashes during this process. The offending line is amd_gpu_agent.cpp:1052:
Code: |
1038     case HSA_AMD_AGENT_INFO_UUID: {
1039       uint64_t uuid_value = static_cast<uint64_t>(properties_.UniqueID);
1040
1041       // Either device does not support UUID e.g. a Gfx8 device,
1042       // or runtime is using an older thunk library that does not
1043       // support UUID's
1044       if (uuid_value == 0) {
1045         char uuid_tmp[] = "GPU-XX";
1046         snprintf((char*)value, sizeof(uuid_tmp), "%s", uuid_tmp);
1047         break;
1048       }
1049
1050       // Device supports UUID, build UUID string to return
1051       std::stringstream ss;
1052       ss << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
1053          << uuid_value;
1054       snprintf((char*)value, (ss.str().length() + 1), "%s", (char*)ss.str().c_str());
1055       break;
1056     }
|
It crashes while rocr-runtime tries to convert the GPU UUID into a string, at these two lines:
Code: |
1052 ss << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
1053 << uuid_value;
|
According to gdb, it crashes inside std::setfill('0'):
Code: |
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6cd1821 in std::ctype<char>::_M_widen_init() const () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6
#2 0x00007ffff6d16168 in std::basic_ios<char, std::char_traits<char> >::fill() const () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6
#3 0x00007ffff702905d in std::basic_ios<char, std::char_traits<char> >::fill (__ch=48 '0', this=0x7fffffffbb40)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/bits/basic_ios.h:390
|
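To make the frames above easier to read, here is a minimal sketch of the lookup that, as far as I understand libstdc++, basic_ios::fill() performs lazily on first use: it widens a space character through the ctype<char> facet of the stream's locale, and _M_widen_init then makes a virtual call on that facet. A jump to address 0 from there would be consistent with a facet object whose virtual table is not what the library expects. The helper names widen_space_like_fill and stream_locale_is_usable are hypothetical and only serve the illustration; this is not the libstdc++ implementation itself.

```cpp
#include <locale>
#include <sstream>

// Roughly the lookup that basic_ios::fill() triggers on first use:
// fetch the ctype<char> facet from the stream's locale and widen ' '.
// (Assumption: libstdc++ caches the facet pointer internally, but the
// facet it reaches is normally the same one use_facet returns here.)
char widen_space_like_fill(const std::ios_base& stream) {
    const std::ctype<char>& facet =
        std::use_facet<std::ctype<char>>(stream.getloc());
    return facet.widen(' ');  // virtual dispatch happens behind this call
}

// Hypothetical helper: run the lookup on a fresh stringstream, the same
// kind of object that ROCr constructs just before the crashing line.
bool stream_locale_is_usable() {
    std::stringstream ss;
    return widen_space_like_fill(ss) == ' ';
}
```

In a healthy process stream_locale_is_usable() simply returns true. Calling it from a small main() on this system should succeed just like the standalone UUID test; it would only be informative if it could be executed inside the FAHClient process.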
If I inspect frame #5, I can see that the GPU UUID is successfully detected; the crash only occurs when ROCm tries to convert the uint64_t variable into a string:
Code: |
(gdb) frame 5
#5 rocr::AMD::GpuAgent::GetInfo (this=<optimized out>, attribute=<optimized out>, value=<optimized out>)
at /usr/src/debug/dev-libs/rocr-runtime-5.4.3-r1/ROCR-Runtime-rocm-5.4.3/src/core/runtime/amd_gpu_agent.cpp:1052
1052 ss << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
(gdb) p uuid_value
$1 = [REDACTED]
|
This doesn't make any sense. std::setfill() is part of libstdc++, and given correct input there's no reason for it to crash. I don't see anything incorrect here. I extracted the code into a standalone test program, and it runs correctly without crashing.
Code: |
user@gentoo /tmp $ cat uuid.cpp
#include <cstdio>
#include <sstream>
#include <iomanip>
int main(void)
{
    std::uint64_t uuid_value = 42;
    std::stringstream ss;
    ss << "GPU-" << std::setfill('0') << std::setw(sizeof(uint64_t) * 2) << std::hex
       << uuid_value;
    auto ss_str = ss.str();
    printf("%s\n", ss_str.c_str());
    return 0;
}
user@gentoo /tmp $ g++ uuid.cpp -o uuid -O3 -Wall -Wextra
user@gentoo /tmp $ ./uuid
GPU-000000000000002a
user@gentoo /tmp $
|
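For further narrowing, the same formatting can be wrapped in a function next to an iostream-free equivalent. This is only a sketch: uuid_via_stringstream reuses the manipulators from amd_gpu_agent.cpp:1052, while uuid_via_snprintf is a hypothetical alternative that is not part of ROCr and never touches the stream locale machinery.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// Same manipulators as the ROCr code, wrapped in a function.
std::string uuid_via_stringstream(std::uint64_t uuid_value) {
    std::stringstream ss;
    ss << "GPU-" << std::setfill('0') << std::setw(sizeof(std::uint64_t) * 2)
       << std::hex << uuid_value;
    return ss.str();
}

// Hypothetical iostream-free equivalent (an assumption for comparison,
// not something ROCr does): fixed-width, zero-padded hex via snprintf.
std::string uuid_via_snprintf(std::uint64_t uuid_value) {
    // sizeof("GPU-") counts the prefix plus the terminating NUL;
    // a 64-bit value needs at most 16 hex digits.
    char buf[sizeof("GPU-") + sizeof(std::uint64_t) * 2];
    std::snprintf(buf, sizeof(buf), "GPU-%016" PRIx64, uuid_value);
    return buf;
}
```

Both functions return GPU-000000000000002a for the value 42, matching the output of the test program. If the snprintf variant survived inside the crashing process while the stringstream variant did not, that would point at iostream or locale state rather than at the UUID value.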
Although there should be no reason for it to crash, for further troubleshooting I disabled this code path in ROCr anyway using the following patch:
Code: |
diff -upr ROCR-Runtime-rocm-5.4.3/src/core/runtime/amd_gpu_agent.cpp ROCR-Runtime-rocm-5.4.3.hack/src/core/runtime/amd_gpu_agent.cpp
--- src/core/runtime/amd_gpu_agent.cpp 2023-07-15 15:05:17.892993234 -0000
+++ src/core/runtime/amd_gpu_agent.cpp 2023-07-15 15:11:20.319434118 -0000
@@ -1041,7 +1041,7 @@ hsa_status_t GpuAgent::GetInfo(hsa_agent
// Either device does not support UUID e.g. a Gfx8 device,
// or runtime is using an older thunk library that does not
// support UUID's
- if (uuid_value == 0) {
+ if (true) {
char uuid_tmp[] = "GPU-XX";
snprintf((char*)value, sizeof(uuid_tmp), "%s", uuid_tmp);
break;
|
Unfortunately, when I re-run FAHClient, it now crashes at a different place.
Code: |
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff6cd1821 in std::ctype<char>::_M_widen_init() const () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6
#2 0x00007ffff6d34508 in std::basic_ostream<char, std::char_traits<char> >& std::basic_ostream<char, std::char_traits<char> >::_M_insert<unsigned long>(unsigned long) () from /usr/lib/gcc/x86_64-pc-linux-gnu/12/libstdc++.so.6
#3 0x00007ffff73ee43b in std::basic_ostream<char, std::char_traits<char> >::operator<< (__n=<optimized out>, this=<optimized out>)
at /usr/lib/gcc/x86_64-pc-linux-gnu/12/include/g++-v12/ostream:181
#4 roc::Device::populateOCLDeviceConstants (this=this@entry=0x1004240)
at /usr/src/debug/dev-libs/rocm-opencl-runtime-5.4.3-r1/ROCclr-rocm-5.4.3/device/rocm/rocdevice.cpp:1356
|
The offending code rocm/rocdevice.cpp:1356 also makes use of std::stringstream...
Code: |
1355 std::stringstream ss;
1356 ss << AMD_BUILD_STRING " (HSA" << major << "." << minor << "," << (settings().useLightning_ ? "LC" : "HSAIL");
1357 ss << ")";
|
So basically, the problem here is that when running Folding@Home, libstdc++ crashes immediately whenever std::stringstream is used by ROCm/ROCr/HSA.
The GCC C++ standard library is not going to just crash, especially in its heavily used I/O subsystem. I'm running out of ideas for identifying the root cause...
My best guess is that it's either
1. A library conflict. Given that FAHClient is distributed as a proprietary binary program, perhaps there's an ABI incompatibility due to mixing different library/compiler versions between the upstream binary build and Gentoo. I already recompiled ROCm/HSA using GCC 12 instead of GCC 13 (I saw a mix of GCC 12/GCC 13 usage in libstdc++) but without success.
Or
2. Memory corruption. Perhaps a bug in FAHClient or ROCm already corrupted some critical data structure in memory earlier, so later uses of std::stringstream mysteriously fail.
Has any expert out there seen anything similar?
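One way to probe the first guess is to ask the dynamic linker which loaded object actually provides a libstdc++ symbol. The following is a minimal sketch using dlsym and dladdr from glibc; providing_object is a hypothetical helper name, and the mangled name in the usage note is my reading of std::ctype<char>::_M_widen_init() const from the backtrace.

```cpp
#include <dlfcn.h>  // dlsym, dladdr, Dl_info (GNU extensions; g++ on Linux
                    // defines _GNU_SOURCE by default, which exposes them)
#include <string>

// Returns the path of the loaded object that provides `mangled_name`
// under the default symbol search order, or "" if it cannot be resolved.
std::string providing_object(const char* mangled_name) {
    void* addr = dlsym(RTLD_DEFAULT, mangled_name);
    if (addr == nullptr) {
        return "";
    }
    Dl_info info{};
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return "";
    }
    return info.dli_fname;
}
```

Calling providing_object("_ZNKSt5ctypeIcE13_M_widen_initEv") in an ordinary dynamically linked program should name the system libstdc++.so.6. To say anything about FAHClient, the check would have to run inside that process, for example from a small preloaded shared object (an assumption about how one might do it, not something I have tried). On older glibc versions the program may need to be linked with -ldl.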
My emerge --info is:
Code: |
Portage 3.0.49 (python 3.11.4-final-0, default/linux/amd64/17.1/systemd, gcc-12, glibc-2.37-r3, 6.4.3-gentoo-dist x86_64)
=================================================================
System uname: Linux-6.4.3-gentoo-dist-x86_64-AMD_Ryzen_5_3500X_6-Core_Processor-with-glibc2.37
KiB Mem: 16289740 total, 14417284 free
KiB Swap: 0 total, 0 free
Timestamp of repository gentoo: Fri, 14 Jul 2023 23:30:01 +0000
Head commit of repository gentoo: 02547a7f1b78ba334e7ce79d2363ca13c241fea7
Timestamp of repository gentoo-zh: Fri, 14 Jul 2023 09:46:47 +0000
Head commit of repository gentoo-zh: 2f39f6fcedf69e1e54f0fdb9ed69223afd91448d
sh bash 5.2_p15-r6
ld GNU ld (Gentoo 2.39 p6) 2.39.0
app-misc/pax-utils: 1.3.7::gentoo
app-shells/bash: 5.2_p15-r6::gentoo
dev-lang/perl: 5.38.0::gentoo
dev-lang/python: 2.7.18_p16-r1::gentoo, 3.11.4::gentoo, 3.12.0_beta4::gentoo
dev-lang/rust: 1.70.0::gentoo
dev-util/cmake: 3.26.4-r1::gentoo
dev-util/meson: 1.1.1::gentoo
sys-apps/baselayout: 2.14::gentoo
sys-apps/sandbox: 2.36::gentoo
sys-apps/systemd: 253.6::gentoo
sys-devel/autoconf: 2.71-r6::gentoo
sys-devel/automake: 1.16.5-r1::gentoo
sys-devel/binutils: 2.39-r5::gentoo, 2.40-r5::gentoo
sys-devel/binutils-config: 5.5::gentoo
sys-devel/clang: 15.0.7-r3::gentoo, 16.0.6::gentoo
sys-devel/gcc: 12.2.1_p20230428-r1::gentoo
sys-devel/gcc-config: 2.11::gentoo
sys-devel/libtool: 2.4.7-r1::gentoo
sys-devel/lld: 15.0.7::gentoo, 16.0.6::gentoo
sys-devel/llvm: 15.0.7-r3::gentoo, 16.0.6::gentoo
sys-devel/make: 4.4.1-r1::gentoo
sys-kernel/linux-headers: 6.4::gentoo (virtual/os-headers)
sys-libs/glibc: 2.37-r3::gentoo
Repositories:
gentoo
location: /var/db/repos/gentoo
sync-type: rsync
sync-uri: rsync://rsync.gentoo.org/gentoo-portage
priority: -1000
volatile: False
sync-rsync-verify-metamanifest: yes
sync-rsync-verify-jobs: 1
sync-rsync-verify-max-age: 24
sync-rsync-extra-opts:
gentoo-zh
location: /var/db/repos/gentoo-zh
sync-type: git
sync-uri: https://github.com/gentoo-mirror/gentoo-zh.git
masters: gentoo
volatile: False
local
location: /var/db/repos/local
masters: gentoo
volatile: False
ACCEPT_KEYWORDS="amd64 ~amd64"
ACCEPT_LICENSE="@FREE"
CBUILD="x86_64-pc-linux-gnu"
CFLAGS="-O2 -pipe"
CHOST="x86_64-pc-linux-gnu"
CONFIG_PROTECT="/etc /usr/share/gnupg/qualified.txt"
CONFIG_PROTECT_MASK="/etc/ca-certificates.conf /etc/env.d /etc/fonts/fonts.conf /etc/gconf /etc/gentoo-release /etc/revdep-rebuild /etc/sandbox.d /etc/terminfo"
CXXFLAGS="-O2 -pipe"
DISTDIR="/var/cache/distfiles"
ENV_UNSET="CARGO_HOME DBUS_SESSION_BUS_ADDRESS DISPLAY GDK_PIXBUF_MODULE_FILE GOBIN GOPATH PERL5LIB PERL5OPT PERLPREFIX PERL_CORE PERL_MB_OPT PERL_MM_OPT XAUTHORITY XDG_CACHE_HOME XDG_CONFIG_HOME XDG_DATA_HOME XDG_RUNTIME_DIR XDG_STATE_HOME"
FCFLAGS="-O2 -pipe"
FEATURES="assume-digests binpkg-docompress binpkg-dostrip binpkg-logs binpkg-multi-instance buildpkg-live config-protect-if-modified distlocks ebuild-locks fixlafiles ipc-sandbox merge-sync multilib-strict network-sandbox news parallel-fetch pid-sandbox preserve-libs protect-owned qa-unresolved-soname-deps sandbox sfperms strict unknown-features-warn unmerge-logs unmerge-orphans userfetch userpriv usersandbox usersync xattr"
FFLAGS="-O2 -pipe"
GENTOO_MIRRORS="http://distfiles.gentoo.org"
LANG="C.UTF8"
LDFLAGS="-Wl,-O1 -Wl,--as-needed"
LEX="flex"
MAKEOPTS="-j6"
PKGDIR="/var/cache/binpkgs"
PORTAGE_CONFIGROOT="/"
PORTAGE_RSYNC_OPTS="--recursive --links --safe-links --perms --times --omit-dir-times --compress --force --whole-file --delete --stats --human-readable --timeout=180 --exclude=/distfiles --exclude=/local --exclude=/packages --exclude=/.git"
PORTAGE_TMPDIR="/var/tmp"
SHELL="/bin/bash"
USE="acl amd64 bzip2 cli crypt dri fortran gdbm iconv ipv6 libtirpc multilib ncurses nls nptl openmp pam pcre readline seccomp split-usr ssl systemd test-rust udev unicode xattr zlib" ABI_X86="64" ADA_TARGET="gnat_2021" AMDGPU_TARGETS="gfx803 gfx906" APACHE2_MODULES="authn_core authz_core socache_shmcb unixd actions alias auth_basic authn_alias authn_anon authn_dbm authn_default authn_file authz_dbm authz_default authz_groupfile authz_host authz_owner authz_user autoindex cache cgi cgid dav dav_fs dav_lock deflate dir disk_cache env expires ext_filter file_cache filter headers include info log_config logio mem_cache mime mime_magic negotiation rewrite setenvif speling status unique_id userdir usertrack vhost_alias" CALLIGRA_FEATURES="karbon sheets words" COLLECTD_PLUGINS="df interface irq load memory rrdtool swap syslog" CPU_FLAGS_X86="mmx mmxext sse sse2" ELIBC="glibc" GPSD_PROTOCOLS="ashtech aivdm earthmate evermore fv18 garmin garmintxt gpsclock greis isync itrax mtk3301 nmea ntrip navcom oceanserver oldstyle oncore rtcm104v2 rtcm104v3 sirf skytraq superstar2 timing tsip tripmate tnt ublox ubx" INPUT_DEVICES="libinput" KERNEL="linux" LCD_DEVICES="bayrad cfontz cfontz633 glk hd44780 lb216 lcdm001 mtxorb ncurses text" LIBREOFFICE_EXTENSIONS="presenter-console presenter-minimizer" LUA_SINGLE_TARGET="lua5-1" LUA_TARGETS="lua5-1" OFFICE_IMPLEMENTATION="libreoffice" PHP_TARGETS="php8-1" POSTGRES_TARGETS="postgres12 postgres13" PYTHON_SINGLE_TARGET="python3_11" PYTHON_TARGETS="python3_11" RUBY_TARGETS="ruby31" VIDEO_CARDS="amdgpu fbdev intel nouveau radeon radeonsi vesa dummy v4l" XTABLES_ADDONS="quota2 psd pknock lscan length2 ipv4options ipset ipp2p iface geoip fuzzy condition tee tarpit sysrq proto steal rawnat logmark ipmark dhcpmac delude chaos account"
Unset: ADDR2LINE, AR, ARFLAGS, AS, ASFLAGS, CC, CCLD, CONFIG_SHELL, CPP, CPPFLAGS, CTARGET, CXX, CXXFILT, ELFEDIT, EMERGE_DEFAULT_OPTS, EXTRA_ECONF, F77FLAGS, FC, GCOV, GPROF, INSTALL_MASK, LC_ALL, LD, LFLAGS, LIBTOOL, LINGUAS, MAKE, MAKEFLAGS, NM, OBJCOPY, OBJDUMP, PORTAGE_BINHOST, PORTAGE_BUNZIP2_COMMAND, PORTAGE_COMPRESS, PORTAGE_COMPRESS_FLAGS, PORTAGE_RSYNC_EXTRA_OPTS, RANLIB, READELF, RUSTFLAGS, SIZE, STRINGS, STRIP, YACC, YFLAGS
|
|
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3509
|
Posted: Wed Feb 14, 2024 12:57 am Post subject: |
|
|
Check what AMD says can be expected to work on your GPU. It appears to be a moving target. _________________ .sigs waste space and bandwidth |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22602
|
Posted: Wed Feb 14, 2024 1:34 am Post subject: |
|
|
Whether or not OP's GPU is supported, I agree with the original premise (from back in July) that it is not reasonable for libstdc++ to just crash here. As OP also notes, a test program does not crash, so it seems like an incompatibility. If memory corruption is at fault, running under Valgrind should report something. |
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3509
|
Posted: Wed Feb 14, 2024 2:47 pm Post subject: |
|
|
I agree, Hu. However, if the GPU is not supported, it's entirely possible that it's corrupting memory in its failed attempt to operate. Clearly OP is way beyond me in what he's doing. Part of my reaction here was outrage upon just finding out what the state of ROCm and hardware support looks like. _________________ .sigs waste space and bandwidth |
|
|
|
Ralphred l33t
Joined: 31 Dec 2013 Posts: 648
|
Posted: Wed Feb 14, 2024 3:31 pm Post subject: |
|
|
The GPU is supported from his current version through to the latest release (6.0.2).
I might build a chart of gfx[xxxx] targets and the versions at which support for each starts and ends; checking all the release notes for compatibility is a chore. |
|
|
|
depontius Advocate
Joined: 05 May 2004 Posts: 3509
|
Posted: Wed Feb 14, 2024 4:04 pm Post subject: |
|
|
I would really like to see such a chart. I've also found some patches for adding Navi14 support, but haven't done anything about it yet.
I'm also wondering if we should be keeping back-level ROCm stuff in portage simply because it seems so hardware-dependent. Better packaging from AMD would be nice. I was looking at CUDA a bit, and there seems to be one package, but there are various levels of CUDA API support based on the hardware. _________________ .sigs waste space and bandwidth |
|