DeIM Guru
Joined: 11 Apr 2006 Posts: 442
|
Posted: Sun Aug 25, 2024 6:20 am Post subject: SIGILL install from binary package Haswell -> SandyBridge |
|
|
I configured a binary host (Haswell) with a make.conf that targets the client (SandyBridge):
Code: | COMMON_FLAGS="-march=sandybridge -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpclmul -mcx16 -mfxsr -msahf -mxsave -mxsaveopt -mtune=sandybridge -fcf-protection -O2 -pipe -fomit-frame-pointer"
FLTO="-flto=thin"
CFLAGS="${FLTO} ${COMMON_FLAGS}"
CXXFLAGS="${FLTO} ${COMMON_FLAGS}"
FCFLAGS="${FLTO} ${COMMON_FLAGS}"
FFLAGS="${FLTO} ${COMMON_FLAGS}"
RUSTFLAGS="-C target-cpu=sandybridge -C opt-level=3 -C strip=symbols"
LDFLAGS="-Wl,-O2,--as-needed ${CFLAGS}" |
I built some packages, installed them on the target, and I'm getting SIGILL:
Code: | [ 3100.464593] traps: dbus-uuidgen[12877] trap invalid opcode ip:7f9e5a783c64 sp:7ffdacd3efa8 error:0 in libdbus-1.so.3.38.0[2fc64,7f9e5a771000+44000]
[ 4226.250983] traps: emerge[15824] trap invalid opcode ip:7f1e4f641e94 sp:7ffe16eb0338 error:0 in libb2.so.1.0.4[2e94,7f1e4f640000+8000]
[ 4256.975920] traps: eix[15841] trap invalid opcode ip:556c666de9c4 sp:7ffc2a6e7868 error:0 in eix[699c4,556c666c5000+10f000]
[ 9538.886841] traps: eix-update[16030] trap invalid opcode ip:5639882869c4 sp:7ffe2b487d58 error:0 in eix[699c4,56398826d000+10f000]
[ 9558.325394] traps: env-update[16041] trap invalid opcode ip:7f6c0a04de94 sp:7ffeb8a9c858 error:0 in libb2.so.1.0.4[2e94,7f6c0a04c000+8000] |
libb2 was among the installed packages.
Code: | # env-update
Illegal instruction (SIGILL) (core dumped)
# eix mold
Illegal instruction (SIGILL) (core dumped) |
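The kernel lines above already name the object each fault landed in. A minimal sketch, assuming the `traps: ... trap invalid opcode ... in NAME[...]` layout shown in this post (the helper name `faulting_objects` is made up for illustration), that reduces such a log to the distinct objects involved:

```shell
# Hypothetical helper: list the distinct objects named in "trap invalid opcode" lines.
# Reads kernel log text on stdin; assumes the "... in NAME[...]" layout shown above.
faulting_objects() {
    sed -n 's/.*trap invalid opcode.* in \([^[]*\)\[.*/\1/p' | sort -u
}

# Usage: dmesg | faulting_objects
```

Fed the five lines above, this should leave three names (eix, libb2.so.1.0.4 and libdbus-1.so.3.38.0), which are the packages to look at first.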
Last edited by DeIM on Fri Aug 30, 2024 7:11 am; edited 4 times in total |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22608
|
Posted: Sun Aug 25, 2024 1:48 pm Post subject: |
|
|
Although an incorrect -march is a common cause of SIGILL, there are other ways to get SIGILL. What specific instruction is invalid in the failing binary? |
|
DeIM Guru
Joined: 11 Apr 2006 Posts: 442
|
Posted: Mon Aug 26, 2024 10:45 am Post subject: |
|
|
Thanks for the reply. Is there a tutorial or a set of instructions on how to find out which instruction is failing?
(I've never done this before, and ddg doesn't give me any good search results.) |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22608
|
Posted: Tue Aug 27, 2024 11:56 am Post subject: |
|
|
Yes, use gdb, then disassemble and bt when the fault happens, and post the results here. You may not need debug symbols for this, but if you have them, it might make this easier, particularly if we need to determine which file has the bad instruction.
I don't think strace will give you enough detail here, but you could try that if you want. It's harmless to try and fail. |
|
DeIM Guru
Joined: 11 Apr 2006 Posts: 442
|
Posted: Wed Aug 28, 2024 6:47 am Post subject: |
|
|
Yesterday I got a clue about what could be the source of the problem.
1) I extracted a stage 3 and copied libb2.so and python from it, removing cffi, to get emerge working
2) I installed gdb
3) Today I installed app-misc/resolve-march-native to check my hunch; according to it, the target CPU is a subset of sandybridge that does not support AVX
4) I tried to see what I would get anyway:
Code: | coredumpctl debug COREDUMP_EXE=/usr/bin/eix |
Code: | # coredumpctl debug COREDUMP_EXE=/usr/bin/eix
PID: 16030 (eix-update)
UID: 0 (root)
GID: 0 (root)
Signal: 4 (ILL)
Timestamp: Sun 2024-08-25 00:26:24 CEST (3 days ago)
Command Line: eix-update
Executable: /usr/bin/eix
Control Group: /user.slice/user-0.slice/session-1.scope
Unit: session-1.scope
Slice: user-0.slice
Session: 1
Owner UID: 0 (root)
Boot ID: 9ecc77553f794139a6d98aea7d903f48
Machine ID: c860d2e4419a4e40bd40f24f640f80b5
Hostname: rogue
Storage: /var/lib/systemd/coredump/core.eix-update.0.9ecc77553f794139a6d98aea7d903f48.16030.1724538384000000.zst (present)
Size on Disk: 95.2K
Message: Process 16030 (eix-update) of user 0 dumped core.
GNU gdb (Gentoo 14.2 vanilla) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/eix...
(No debugging symbols found in /usr/bin/eix)
warning: core file may not match specified executable file.
[New LWP 16030]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `eix-update'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x00005639882869c4 in ?? ()
(gdb) |
The disassemble command offers me 200 lines to disassemble, but I guess the problem is in the libthread_db library, and I don't know how to switch to it in gdb. |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22608
|
Posted: Wed Aug 28, 2024 12:16 pm Post subject: |
|
|
Normally, the faulting instruction should be at $pc, so x/4i $pc might be sufficient. |
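For anyone following along, the same check can be run non-interactively. A sketch, assuming the core is first exported with `coredumpctl dump` and that /tmp/eix.core is a scratch path chosen only for illustration (the function name `faulting_insns` is hypothetical too):

```shell
# Hypothetical helper: print the four instructions at the faulting program counter.
# $1 = executable, $2 = uncompressed core file
faulting_insns() {
    gdb -batch -ex 'x/4i $pc' "$1" "$2"
}

# Usage (paths are examples):
#   coredumpctl dump /usr/bin/eix -o /tmp/eix.core
#   faulting_insns /usr/bin/eix /tmp/eix.core
```

The first line of output, marked with `=>`, is the instruction the CPU refused to execute.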
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9819 Location: almost Mile High in the USA
|
Posted: Wed Aug 28, 2024 1:53 pm Post subject: |
|
|
haswell supports avx2, sandybridge is avx (1) only...
my sandybridge: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts
my haswell: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm arat pln pts
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
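The two flag lists above are easier to compare mechanically. A small sketch in plain POSIX shell (the function name is made up; both arguments are assumed to be space-separated flag strings as printed in /proc/cpuinfo):

```shell
# Hypothetical helper: print flags present in the first list but missing from the second.
# $1 and $2 are space-separated flag strings, e.g. copied from /proc/cpuinfo.
flags_only_in_first() {
    for flag in $1; do
        case " $2 " in
            *" $flag "*) ;;                 # also present in the second list
            *) printf '%s\n' "$flag" ;;     # only in the first list
        esac
    done
}

# Usage: flags_only_in_first "$haswell_flags" "$sandybridge_flags"
```

Called with the haswell list first and the sandybridge list second, the output should include avx2, fma, bmi1 and bmi2, among others.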
|
DeIM Guru
Joined: 11 Apr 2006 Posts: 442
|
Posted: Wed Aug 28, 2024 2:09 pm Post subject: |
|
|
Code: | (gdb) x/4i $pc
=> 0x5639882869c4: vxorps %xmm0,%xmm0,%xmm0
0x5639882869c8: vmovups %xmm0,0xf8130(%rip) # 0x56398837eb00
0x5639882869d0: movq $0x0,0xf8135(%rip) # 0x56398837eb10
0x5639882869db: lea 0xf811e(%rip),%rsi # 0x56398837eb00 |
Code: | # resolve-march-native
-march=sandybridge -mno-avx --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048 |
Code: | fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm epb xsaveopt dtherm arat pln pts |
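The flag list above has no avx entry, which matches the -mno-avx printed by resolve-march-native. A sketch of the same check as a script, assuming an x86-style `flags` line in /proc/cpuinfo (the helper name `cpu_has_flag` is hypothetical):

```shell
# Hypothetical helper: succeed if flag $1 appears on a "flags" line read from stdin.
cpu_has_flag() {
    awk -v want="$1" '
        # Fields 1 and 2 are "flags" and ":", so the flag names start at field 3.
        /^flags/ { for (i = 3; i <= NF; i++) if ($i == want) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage: cpu_has_flag avx < /proc/cpuinfo && echo "AVX available" || echo "no AVX"
```

Running this on the target before pointing it at a binary host would show whether the host's -march implies something the target lacks.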
|
|
bstaletic Guru
Joined: 05 Apr 2014 Posts: 357
|
Posted: Wed Aug 28, 2024 2:20 pm Post subject: |
|
|
DeIM wrote: | Code: |
(gdb) x/4i $pc
=> 0x5639882869c4: vxorps %xmm0,%xmm0,%xmm0
0x5639882869c8: vmovups %xmm0,0xf8130(%rip) # 0x56398837eb00 |
|
Those two instructions are AVX instructions, and your CPU does not seem to support AVX.
What does emerge --info eix say? I suspect there's more to your CFLAGS than you've shared. |
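To see how widespread such instructions are in a binary, a rough heuristic is to look for mnemonics that start with `v` and touch an xmm/ymm/zmm register. This is only a sketch on AT&T-syntax disassembly text, not a real decoder, and the function name is invented for illustration:

```shell
# Hypothetical helper: print disassembly lines that look VEX/EVEX-encoded,
# i.e. a mnemonic starting with "v" that operates on an xmm/ymm/zmm register.
# Heuristic on AT&T-syntax text read from stdin; it does not decode opcodes.
avx_like_insns() {
    grep -E '(^|[[:space:]])v[a-z0-9]+[[:space:]]+[^#]*%[xyz]mm[0-9]'
}

# Usage: objdump -d --no-show-raw-insn /usr/bin/eix | avx_like_insns | head
```

Legacy SSE forms such as xorps are deliberately not matched, since those run fine on a CPU without AVX.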
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22608
|
Posted: Wed Aug 28, 2024 2:33 pm Post subject: |
|
|
It could also be that OP has not rebuilt all requisite dependent packages with the more compatible CFLAGS. For example, if the compiler is pulling in some previously built static library or freestanding .o file, and that older object was built with the avx2 CFLAGS, then the resulting programs could be broken despite the CFLAGS being correct now. Identifying the file and function that contains the avx2 usage might help direct the investigation. |
|
bstaletic Guru
Joined: 05 Apr 2014 Posts: 357
|
Posted: Wed Aug 28, 2024 3:08 pm Post subject: |
|
|
I'd also like to see cat /proc/cpuinfo. I feel like DeIM's CPU is older than SandyBridge. Evidence:
- The illegal instruction is AVX, not AVX2.
- -march=sandybridge implies AVX and was in CFLAGS.
- app-misc/resolve-march-native also said -mno-avx
Assuming portage still works, perhaps emerge -e @world would be in order. |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9819 Location: almost Mile High in the USA
|
Posted: Wed Aug 28, 2024 5:58 pm Post subject: |
|
|
According to Wikipedia, Sandy Bridge Celerons and Pentiums do not have AVX (1), so this looks like the root of the problem.
It looks like either -march=westmere is needed (that architecture does not have AVX), or AVX has to be shut off explicitly in CFLAGS (-mno-avx).
ARK seems to imply that even Ivy Bridge Celerons don't have AVX.
So basically only the E3/E5/E7 and i3/i5/i7 Sandy/Ivy Bridge parts have AVX, and none of the others. |
|
DeIM Guru
Joined: 11 Apr 2006 Posts: 442
|
Posted: Fri Aug 30, 2024 7:10 am Post subject: |
|
|
Correct, solved.
It's a Celeron G530: SandyBridge, but with no AVX at all.
Thanks for the investigation. |
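For later readers, a sketch of what the adjusted flags might look like. It assumes the COMMON_FLAGS from the first post plus the -mno-avx reported by resolve-march-native; the thread does not show the make.conf DeIM finally used.

```shell
# Assumption: the original flags from the first post, with AVX explicitly disabled.
# -mno-avx turns off the AVX that -march=sandybridge would otherwise enable.
COMMON_FLAGS="-march=sandybridge -mno-avx -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpclmul -mcx16 -mfxsr -msahf -mxsave -mxsaveopt -mtune=sandybridge -fcf-protection -O2 -pipe -fomit-frame-pointer"
```

Packages already built with AVX enabled would still need to be rebuilt on the binary host, as suggested earlier in the thread.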
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9819 Location: almost Mile High in the USA
|
Posted: Fri Aug 30, 2024 11:40 pm Post subject: |
|
|
Sometimes I wonder how much of a speedup one gets from using vxorps %xmm0,%xmm0,%xmm0 to clear %xmm0, or even from using %xmm0 and other AVX/SSE registers instead of a regular register. Is the spill/fill cost so noticeable that we should use AVX instructions?
I'm still using base x86_64 instructions on my sandybridge, so that my yorkfield, prescott, silvermont, llano, regor, bloomfield, ivybridge, haswell, and skylake can all use the same binaries... |
|