Gentoo Forums
[SOLVED] SIGILL install from binary Haswell -> SandyBridge

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Sun Aug 25, 2024 6:20 am    Post subject: SIGILL install from binary package Haswell -> SandyBridge

I configured a binary host (Haswell) with a make.conf targeting the install machine (SandyBridge):
Code:
COMMON_FLAGS="-march=sandybridge -mmmx -mpopcnt -msse -msse2 -msse3 -mssse3 -msse4.1 -msse4.2 -mpclmul -mcx16 -mfxsr -msahf -mxsave -mxsaveopt -mtune=sandybridge -fcf-protection -O2 -pipe -fomit-frame-pointer"
FLTO="-flto=thin"

CFLAGS="${FLTO} ${COMMON_FLAGS}"
CXXFLAGS="${FLTO} ${COMMON_FLAGS}"
FCFLAGS="${FLTO} ${COMMON_FLAGS}"
FFLAGS="${FLTO} ${COMMON_FLAGS}"

RUSTFLAGS="-C target-cpu=sandybridge -C opt-level=3 -C strip=symbols"

LDFLAGS="-Wl,-O2,--as-needed ${CFLAGS}"


I built some packages, installed them on the target, and now I'm getting SIGILL:
Code:
[ 3100.464593] traps: dbus-uuidgen[12877] trap invalid opcode ip:7f9e5a783c64 sp:7ffdacd3efa8 error:0 in libdbus-1.so.3.38.0[2fc64,7f9e5a771000+44000]
[ 4226.250983] traps: emerge[15824] trap invalid opcode ip:7f1e4f641e94 sp:7ffe16eb0338 error:0 in libb2.so.1.0.4[2e94,7f1e4f640000+8000]
[ 4256.975920] traps: eix[15841] trap invalid opcode ip:556c666de9c4 sp:7ffc2a6e7868 error:0 in eix[699c4,556c666c5000+10f000]
[ 9538.886841] traps: eix-update[16030] trap invalid opcode ip:5639882869c4 sp:7ffe2b487d58 error:0 in eix[699c4,56398826d000+10f000]
[ 9558.325394] traps: env-update[16041] trap invalid opcode ip:7f6c0a04de94 sp:7ffeb8a9c858 error:0 in libb2.so.1.0.4[2e94,7f6c0a04c000+8000]
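
The trap lines above already name the object that contained the bad opcode. A minimal sketch for pulling the unique object names out of such kernel log lines follows; `faulting_objects` is a hypothetical helper name, and the pattern assumes the `trap invalid opcode ... in NAME[...]` layout shown here:

```shell
# Hypothetical helper: read kernel log text on stdin and print the unique
# object names taken from "trap invalid opcode" lines.
faulting_objects() {
    sed -n 's/.*trap invalid opcode.* in \([^[]*\)\[.*/\1/p' | LC_ALL=C sort -u
}

# Possible usage (reading the kernel log may need root):
#   dmesg | faulting_objects
```

Each name can then be mapped back to its package (for example with `qfile` from app-portage/portage-utils) to decide what needs rebuilding.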


libb2 was among the installed packages.

Code:
# env-update
Illegal instruction (SIGILL) (core dumped)
# eix mold
Illegal instruction (SIGILL) (core dumped)


Last edited by DeIM on Fri Aug 30, 2024 7:11 am; edited 4 times in total

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22619

Posted: Sun Aug 25, 2024 1:48 pm

Although an incorrect -march is a common cause of SIGILL, there are other ways to get SIGILL. What specific instruction is invalid in the failing binary?

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Mon Aug 26, 2024 10:45 am

Thanks for the reply. Is there a tutorial or a set of instructions on how to find out which instruction is failing?
(I've never done this before, and ddg doesn't give me any good search results.)

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Tue Aug 27, 2024 11:06 am

https://wiki.gentoo.org/wiki/Debugging

So I should use gdb?
Could strace print which instruction is failing?

Thanks in advance :-)

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Tue Aug 27, 2024 11:22 am

https://wiki.gentoo.org/wiki/GDB#Enabling_core_dumps
OK, I'm going to try this (since it's a systemd-based installation of Gentoo):
https://www.man7.org/linux/man-pages/man1/coredumpctl.1.html

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22619

Posted: Tue Aug 27, 2024 11:56 am

Yes, use gdb, then disassemble and bt when the fault happens, and post the results here. You may not need debug symbols for this, but if you have them, it might make this easier, particularly if we need to determine which file has the bad instruction.

I don't think strace will give you enough detail here, but you could try that if you want. It's harmless to try and fail.

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Wed Aug 28, 2024 6:47 am

Yesterday I got a clue about what could be the source of the problem.

1) I extracted a stage 3 and copied libb2.so and python from it, removing cffi, to get emerge working again
2) I installed gdb
3) today I installed app-misc/resolve-march-native to check my hunch, and it reports that the target CPU supports only a subset of sandybridge: no AVX

4) I tried to see what I get anyway:
Code:
coredumpctl debug COREDUMP_EXE=/usr/bin/eix

Code:
# coredumpctl debug COREDUMP_EXE=/usr/bin/eix
           PID: 16030 (eix-update)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 4 (ILL)
     Timestamp: Sun 2024-08-25 00:26:24 CEST (3 days ago)
  Command Line: eix-update
    Executable: /usr/bin/eix
 Control Group: /user.slice/user-0.slice/session-1.scope
          Unit: session-1.scope
         Slice: user-0.slice
       Session: 1
     Owner UID: 0 (root)
       Boot ID: 9ecc77553f794139a6d98aea7d903f48
    Machine ID: c860d2e4419a4e40bd40f24f640f80b5
      Hostname: rogue
       Storage: /var/lib/systemd/coredump/core.eix-update.0.9ecc77553f794139a6d98aea7d903f48.16030.1724538384000000.zst (present)
  Size on Disk: 95.2K
       Message: Process 16030 (eix-update) of user 0 dumped core.

GNU gdb (Gentoo 14.2 vanilla) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/eix...
(No debugging symbols found in /usr/bin/eix)

warning: core file may not match specified executable file.
[New LWP 16030]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib64/libthread_db.so.1".
Core was generated by `eix-update'.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x00005639882869c4 in ?? ()
(gdb)


The disassemble command wants to print about 200 lines, but I guess the problem is in the libthread_db library, and I don't know how to switch to it in gdb.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22619

Posted: Wed Aug 28, 2024 12:16 pm

Normally, the faulting instruction should be at $pc, so x/4i $pc might be sufficient.
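
The same inspection can be run non-interactively. This is a minimal sketch, assuming the core has first been written to an ordinary file; `show_fault_insn` and the file name `core.eix` are hypothetical names chosen for illustration:

```shell
# Hypothetical helper: print the instructions at the faulting $pc and a
# backtrace from a core file, without an interactive gdb session.
#   $1 = path to the executable, $2 = path to the uncompressed core file
show_fault_insn() {
    gdb -batch -ex 'x/4i $pc' -ex 'bt' "$1" "$2"
}

# Possible usage with systemd-coredump:
#   coredumpctl --output=core.eix dump COREDUMP_EXE=/usr/bin/eix
#   show_fault_insn /usr/bin/eix core.eix
```

Inside an interactive `coredumpctl debug` session, typing `x/4i $pc` at the `(gdb)` prompt gives the same first part.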

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9823
Location: almost Mile High in the USA

Posted: Wed Aug 28, 2024 1:53 pm

haswell supports avx2, sandybridge is avx (1) only...

my sandybridge: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts

my haswell: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm arat pln pts
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Wed Aug 28, 2024 2:09 pm

Code:
(gdb) x/4i $pc
=> 0x5639882869c4:      vxorps %xmm0,%xmm0,%xmm0
   0x5639882869c8:      vmovups %xmm0,0xf8130(%rip)        # 0x56398837eb00
   0x5639882869d0:      movq   $0x0,0xf8135(%rip)        # 0x56398837eb10
   0x5639882869db:      lea    0xf811e(%rip),%rsi        # 0x56398837eb00


Code:
# resolve-march-native
-march=sandybridge -mno-avx --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=2048


Code:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer xsave lahf_lm epb xsaveopt dtherm arat pln pts
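
Flag lists like these are easy to misread by eye (`avx` versus `avx2`, for instance). A minimal sketch of a whole-word check and a list comparison follows; `has_cpu_flag` and `missing_flags` are hypothetical helper names, not existing tools:

```shell
# Hypothetical helper: succeed if flag $1 appears as a whole word in the
# space-separated flag list $2 (so "avx" does not match "avx2").
has_cpu_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: print each flag from list $1 that is absent from list $2,
# e.g. build-host flags that the target machine lacks.
missing_flags() {
    for flag in $1; do
        has_cpu_flag "$flag" "$2" || printf '%s\n' "$flag"
    done
}

# Possible usage on the target machine:
#   flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
#   has_cpu_flag avx "$flags" && echo "AVX present" || echo "no AVX"
```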

bstaletic
Guru

Joined: 05 Apr 2014
Posts: 363

Posted: Wed Aug 28, 2024 2:20 pm

DeIM wrote:
Code:
   
(gdb) x/4i $pc
=> 0x5639882869c4:      vxorps %xmm0,%xmm0,%xmm0
   0x5639882869c8:      vmovups %xmm0,0xf8130(%rip)        # 0x56398837eb00

Those two instructions are AVX instructions and your CPU does not seem to have those.

What does emerge --info eix say? I'm suspecting that there's more to your CFLAGS than you've shared.

Hu
Administrator

Joined: 06 Mar 2007
Posts: 22619

Posted: Wed Aug 28, 2024 2:33 pm

It could also be that OP has not rebuilt all requisite dependent packages with the more compatible CFLAGS. For example, if the compiler is pulling in some previously built static library or freestanding .o file, and that older object was built with the avx2 CFLAGS, then the resulting programs could be broken despite the CFLAGS being correct now. Identifying the file and function that contains the avx2 usage might help direct the investigation.
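
One rough way to narrow down which files contain such instructions is to scan their disassembly. The sketch below is a heuristic only, assuming AT&T-syntax output as produced by `objdump -d` or gdb; `count_avx_insns` is a hypothetical helper name:

```shell
# Hypothetical helper: count disassembly lines on stdin that look like
# VEX-encoded (AVX-style) instructions: a mnemonic starting with "v" that
# takes an xmm/ymm/zmm operand. It misses operand-less instructions such
# as vzeroupper, so treat the count as a hint, not a verdict.
count_avx_insns() {
    grep -cE '[[:space:]]v[a-z0-9]+[[:space:]]+.*%[xyz]mm' || true
}

# Possible usage:
#   objdump -d /usr/bin/eix | count_avx_insns
```

A non-zero count alone is not proof of a fault, since some libraries select AVX code paths at run time only after checking the CPU.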

bstaletic
Guru

Joined: 05 Apr 2014
Posts: 363

Posted: Wed Aug 28, 2024 3:08 pm

I'd also like to see cat /proc/cpuinfo. I feel like DeIM's CPU is older than SandyBridge. Evidence:

  • The illegal instruction is AVX, not AVX2.
  • -march=sandybridge implies AVX and was in CFLAGS.
  • app-misc/resolve-march-native also said -mno-avx


Assuming portage still works, perhaps emerge -e @world would be in order.
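
The claim that -march=sandybridge implies AVX can be checked against the compiler itself by dumping its predefined macros. A minimal sketch, assuming a GCC-compatible driver on an x86 machine; `flags_enable_avx` is a hypothetical helper name:

```shell
# Hypothetical helper: succeed if the compiler defines __AVX__ for the
# given flags, i.e. it would be allowed to emit AVX instructions.
# Uses $CC if set, otherwise gcc. If the compiler rejects the flags, the
# helper also fails, so check for compiler errors before trusting a "no".
flags_enable_avx() {
    "${CC:-gcc}" "$@" -dM -E -x c /dev/null | grep -qw '__AVX__'
}

# Possible usage:
#   flags_enable_avx -march=sandybridge && echo "AVX enabled"
#   flags_enable_avx -march=sandybridge -mno-avx || echo "AVX disabled"
```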

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9823
Location: almost Mile High in the USA

Posted: Wed Aug 28, 2024 5:58 pm

According to Wikipedia, it seems that Sandybridge Celerons and Pentiums do not have AVX (1)... so it looks like this is the root of the problem.
Looks like you need either -march=westmere (since that architecture does not have avx) or something that explicitly shuts off avx in cflags (-mno-avx, as in the resolve-march-native output)...

ARK seems to imply even Ivybridge celerons don't have avx.

So basically only E3/E5/E7 i3/i5/i7 Sandy/Ivybridge have AVX, and none of the others.

DeIM
Guru

Joined: 11 Apr 2006
Posts: 442

Posted: Fri Aug 30, 2024 7:10 am

Correct, solved.
It's a Celeron G530: SandyBridge, but with no AVX at all.
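
The thread does not record the final flags, so the following make.conf fragment is only a sketch of one plausible fix on the binary host: it takes the -march/-mno-avx pair from the resolve-march-native output quoted earlier and keeps -O2 -pipe from the first post, leaving everything else out for brevity.

```shell
# /etc/portage/make.conf fragment (sketch, not the poster's confirmed config).
# -march=sandybridge -mno-avx is the resolve-march-native output for the target;
# -O2 -pipe are carried over from the original COMMON_FLAGS.
COMMON_FLAGS="-march=sandybridge -mno-avx -O2 -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
```

Packages already built with the old flags still contain AVX code, so they would need to be rebuilt before being installed on the target.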

Thanks for investigations :-)

eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9823
Location: almost Mile High in the USA

Posted: Fri Aug 30, 2024 11:40 pm

Sometimes I wonder, how much of a speedup does one get using vxorps %xmm0,%xmm0,%xmm0 to clear %xmm0 ... or even use %xmm0 and other avx/sse registers instead of using a regular register... is the spill fill that noticeable that we should use AVX instructions?

I'm still using base x86_64 instructions on my sandybridge, so that my yorkfield, prescott, silvermont, llano, regor, bloomfield, ivybridge, haswell, and skylake can all use the same binaries...

All times are GMT