kgdrenefort Apprentice
Joined: 19 Sep 2023 Posts: 216 Location: Somewhere in the 77
Posted: Wed Apr 17, 2024 3:16 pm Post subject: Binhost: Illegal instruction (LUA) (SandyBridge VS Znver1)
Hello,
From this topic, I realized and confirmed that something was very wrong with the configuration between my binhost machine (an nspawn container within my main desktop) and my client (an old laptop).
The two machines do not use the same CPU, so some settings were needed to build packages that suit the client's CPU.
This apparently resulted in an improper build of Lua, which made the Awesome window manager fail to start even though X itself was working.
The solution was to rebuild Lua on the client and make sure the correct slot was selected. Awesome now works when I use the startx command.
Of course, I need to prevent this from happening again, so, with your help, I have to find out what I did wrong.
Some information about the protagonists:
Client:
- Manufacturer and model: HP Elitebook 8560w
- CPU: Intel(R) Core(TM) i7-2820QM CPU @ 2.30GHz
- Output of /proc/cpuinfo: https://bpa.st/RN4A
- Family: SandyBridge
- Flags (from GCC documentation):
Code: | Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE and PCLMUL instruction set support. |
- Current state of /etc/portage/make.conf:
(Please note that for now I have disabled fetching binary packages; I un-comment these lines for testing purposes only, as the problem is not fixed yet!)
Code: | # These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
COMMON_FLAGS="-march=native -O2 -pipe"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
MAKEOPTS="-j2 -l2"
USE="X grub acpi bash-completion branding cups curl colord dbus git gui hddtemp lm-sensors man ncurses networkmanager pcmcia pcre posix scanner spell systemd udev udisks unicode upower usb vim-syntax x264 -bluetooth -geoip -geolocation -gnome -gnome-keyring -gtk -gtk-doc -handbook -kde -plasma -qt5 -qt6 -semantic-desktop -telemetry -tk -wayland -webkit -wifi"
VIDEO_CARDS="nouveau"
L10N="en fr"
# NOTE: This stage was built with the bindist Use flag enabled
# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C.utf8
GENTOO_MIRRORS="https://mirrors.ircam.fr/pub/gentoo-distfiles/ \
https://gentoo.mirrors.ovh.net/gentoo-distfiles/ \
https://mirrors.soeasyto.com/distfiles.gentoo.org/"
ACCEPT_LICENSE="-* @FREE @BINARY-REDISTRIBUTABLE"
### Binary package settings (client) ###
#EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --getbinpkg"
#FEATURES="getbinpkg"
#EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg-exclude 'sys-kernel/gentoo-sources virtual/* www-servers/lighttpd'"
#PORTAGE_BINHOST="http://192.168.1.103:81/packages"
|
- Current state of /etc/portage/package.use/00cpu-flags:
Code: | */* CPU_FLAGS_X86: aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3 |
- Output of cpuid2cpuflags (this looks redundant since it matches the file above, but I include it in case I forgot something):
Code: | CPU_FLAGS_X86: aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3 |
- Output of resolve-march-native:
Code: | -march=sandybridge -maes --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=8192 |
Binhost:
- Manufacturer and model: Self-built computer (I chose each hardware part myself), running Gentoo inside an nspawn container.
- CPU: AMD Ryzen 5 2600 Six-Core Processor
- Output of /proc/cpuinfo: https://bpa.st/QS2Q
- Family: znver1
- Flags (from GCC documentation):
Code: | AMD Family 17h core based CPUs with x86-64 instruction set support. (This supersets BMI, BMI2, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX, SHA, CLZERO, AES, PCLMUL, CX16, MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, and 64-bit instruction set extensions.)
|
- Current state of /etc/portage/make.conf:
Code: | # These settings were set by the catalyst build script that automatically
# built this stage.
# Please consult /usr/share/portage/config/make.conf.example for a more
# detailed example.
COMMON_FLAGS="-march=x86-64-v2 -O2 -pipe -mavx -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt"
CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
MAKEOPTS="-j8 -l8"
USE="X grub acpi bash-completion branding cups curl colord dbus git gui hddtemp lm-sensors man ncurses networkmanager pcmcia pcre posix scanner spell systemd udev udisks unicode upower usb vim-syntax x264 -bluetooth -geoip -geolocation -gnome -gnome-keyring -gtk -gtk-doc -handbook -kde -plasma -qt5 -qt6 -semantic-desktop -telemetry -tk -wayland -webkit -wifi"
VIDEO_CARDS="nouveau"
L10N="en fr"
# NOTE: This stage was built with the bindist Use flag enabled
# This sets the language of build output to English.
# Please keep this setting intact when reporting bugs.
LC_MESSAGES=C.utf8
GENTOO_MIRRORS="https://mirrors.ircam.fr/pub/gentoo-distfiles/ \
https://gentoo.mirrors.ovh.net/gentoo-distfiles/ \
https://mirrors.soeasyto.com/distfiles.gentoo.org/"
ACCEPT_LICENSE="-* @FREE @BINARY-REDISTRIBUTABLE"
### Binary package setting (host) ###
BINPKG_FORMAT="gpkg"
FEATURES="buildpkg"
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg-exclude 'sys-kernel/gentoo-sources virtual/* www-servers/lighttpd'"
|
- Current state of /etc/portage/package.use/00cpu-flags:
Code: | */* CPU_FLAGS_X86: aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3 |
- Output of cpuid2cpuflags:
Code: | CPU_FLAGS_X86: aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 |
- Output of resolve-march-native:
Code: | -march=znver1 --param=l1-cache-line-size=64 --param=l1-cache-size=32 --param=l2-cache-size=512 |
---
As far as I know, I set up this binhost the same way as I did for another laptop (served by the same desktop, but from a different nspawn container): to use only what is known to be supported by the Sandy Bridge family. From the very beginning of this nspawn container's life, it has used the parameter set for Sandy Bridge, not Znver1.
Since I'm new to the process, I'm being what I consider "lazy": I package everything the binhost installs (in case I forget something; this way feels safer to me). The goal is to use the binhost's binaries for 100% of the client's packages, so I let the client's make.conf force emerge to request them, and before emerging I check that they are all tagged as binary. That was the case until the Lua problem arose.
Since then, some packages have been installed and rebuilt locally with the client's own settings; -march=native cannot be wrong there, I guess.
As written above, the binhost's make.conf uses these settings:
Code: | COMMON_FLAGS="-march=x86-64-v2 -O2 -pipe -mavx -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt" |
To me, these looked fine given what the GCC documentation lists for sandybridge:
Code: | Intel Sandy Bridge CPU with 64-bit extensions, MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, POPCNT, CX16, SAHF, FXSR, AVX, XSAVE and PCLMUL instruction set support. |
These flags (-mavx, -mavx256-split-unaligned-store, -mpclmul, -mxsave, -mxsaveopt) all appear either in the list above or in the output of a trick someone gave me on #gentoo, which I reported in this topic:
Code: | satanBinhost ~ # diff -u0 <(flags x86-64-v2) <(flags znver1) | cat
--- /dev/fd/63 2024-04-17 17:02:47.127994844 +0200
+++ /dev/fd/62 2024-04-17 17:02:47.131328152 +0200
@@ -12 +12 @@
- -mabm [disabled]
+ -mabm [enabled]
@@ -15,2 +15,2 @@
- -madx [disabled]
- -maes [disabled]
+ -madx [enabled]
+ -maes [enabled]
@@ -29 +29 @@
- -march= x86-64-v2
+ -march= znver1
@@ -31,2 +31,2 @@
- -mavx [disabled]
- -mavx2 [disabled]
+ -mavx [enabled]
+ -mavx2 [enabled]
@@ -34 +34 @@
- -mavx256-split-unaligned-store [disabled]
+ -mavx256-split-unaligned-store [enabled]
@@ -58,2 +58,2 @@
- -mbmi [disabled]
- -mbmi2 [disabled]
+ -mbmi [enabled]
+ -mbmi2 [enabled]
@@ -65 +65 @@
- -mclflushopt [disabled]
+ -mclflushopt [enabled]
@@ -67 +67 @@
- -mclzero [disabled]
+ -mclzero [enabled]
@@ -78 +78 @@
- -mf16c [disabled]
+ -mf16c [enabled]
@@ -83 +83 @@
- -mfma [disabled]
+ -mfma [enabled]
@@ -89 +89 @@
- -mfsgsbase [disabled]
+ -mfsgsbase [enabled]
@@ -118 +118 @@
- -mlzcnt [disabled]
+ -mlzcnt [enabled]
@@ -124 +124 @@
- -mmovbe [disabled]
+ -mmovbe [enabled]
@@ -127 +127 @@
- -mmove-max= 128
+ -mmove-max= 256
@@ -132 +132 @@
- -mmwaitx [disabled]
+ -mmwaitx [enabled]
@@ -145 +145 @@
- -mpclmul [disabled]
+ -mpclmul [enabled]
@@ -151 +151 @@
- -mprefer-vector-width= none
+ -mprefer-vector-width= 128
@@ -155 +155 @@
- -mprfchw [disabled]
+ -mprfchw [enabled]
@@ -160,2 +160,2 @@
- -mrdrnd [disabled]
- -mrdseed [disabled]
+ -mrdrnd [enabled]
+ -mrdseed [enabled]
@@ -175 +175 @@
- -msha [disabled]
+ -msha [enabled]
@@ -186 +186 @@
- -msse4a [disabled]
+ -msse4a [enabled]
@@ -196 +196 @@
- -mstore-max= 128
+ -mstore-max= 256
@@ -204 +204 @@
- -mtune= generic
+ -mtune= znver1
@@ -218,4 +218,4 @@
- -mxsave [disabled]
- -mxsavec [disabled]
- -mxsaveopt [disabled]
- -mxsaves [disabled]
+ -mxsave [enabled]
+ -mxsavec [enabled]
+ -mxsaveopt [enabled]
+ -mxsaves [enabled]
|
Code: | satanBinhost ~ # diff -u0 <(flags x86-64-v2) <(flags sandybridge) | cat
--- /dev/fd/63 2024-04-17 17:02:14.401578937 +0200
+++ /dev/fd/62 2024-04-17 17:02:14.404912244 +0200
@@ -29 +29 @@
- -march= x86-64-v2
+ -march= sandybridge
@@ -31 +31 @@
- -mavx [disabled]
+ -mavx [enabled]
@@ -33,2 +33,2 @@
- -mavx256-split-unaligned-load [disabled]
- -mavx256-split-unaligned-store [disabled]
+ -mavx256-split-unaligned-load [enabled]
+ -mavx256-split-unaligned-store [enabled]
@@ -145 +145 @@
- -mpclmul [disabled]
+ -mpclmul [enabled]
@@ -204 +204 @@
- -mtune= generic
+ -mtune= sandybridge
@@ -218 +218 @@
- -mxsave [disabled]
+ -mxsave [enabled]
@@ -220 +220 @@
- -mxsaveopt [disabled]
+ -mxsaveopt [enabled] |
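A possible extra check, sketched under the assumption that GCC is available on the binhost: instead of comparing two `-march` values as in the diffs above, compare the exact binhost `COMMON_FLAGS` against `-march=sandybridge`. The function below (`enabled_only_in`, a hypothetical helper, not the `flags` trick from #gentoo) reads two saved `gcc -Q --help=target` outputs and prints every option the second one enables but the first does not. An empty result would suggest, though not prove, that the binhost flags request nothing beyond what GCC enables for Sandy Bridge.

```shell
# Sketch: list the -m options that a candidate flag set enables but a
# reference -march does not. Both inputs are saved `gcc -Q --help=target`
# outputs, whose lines look like "  -mavx    [enabled]".
enabled_only_in() {
  ref_file=$1   # e.g. output for -march=sandybridge (the client)
  cand_file=$2  # e.g. output for the binhost's COMMON_FLAGS
  awk '
    # First file: remember every option the reference enables.
    FILENAME == ARGV[1] { if ($2 == "[enabled]") ref[$1] = 1; next }
    # Second file: print options enabled here but not in the reference.
    $2 == "[enabled]" && !($1 in ref) { print $1 }
  ' "$ref_file" "$cand_file"
}

# Example usage (requires GCC; file names are arbitrary):
#   gcc -Q --help=target -march=sandybridge > ref.txt
#   gcc -Q --help=target -march=x86-64-v2 -mavx -mavx256-split-unaligned-store \
#       -mpclmul -mxsave -mxsaveopt > binhost.txt
#   enabled_only_in ref.txt binhost.txt
```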
If you could double-check this, I would greatly appreciate it, because I think the problem lies right here: as far as I understand, if Lua hit an illegal instruction, some wrong flag on the binhost must have made it incompatible with the Sandy Bridge CPU.
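The same reasoning can be checked from the client side with the sketch below. It assumes a POSIX shell and a hand-written, partial mapping from GCC `-m` options to the flag names used in `/proc/cpuinfo` (for example, `-mpclmul` shows up there as `pclmulqdq`). `missing_isa` and `cpuinfo_name` are hypothetical helpers; options they do not know, including the `x86-64-v2` baseline itself, are simply skipped, so an empty result only covers the explicitly listed options.

```shell
# Sketch: report which ISA extensions requested by explicit -m options are
# missing from a CPU's /proc/cpuinfo "flags" list. The table below is a
# partial, hand-written mapping (an assumption, not exhaustive).
cpuinfo_name() {
  case "$1" in
    -maes) echo aes ;;
    -mavx) echo avx ;;
    -mavx2) echo avx2 ;;
    -mbmi) echo bmi1 ;;
    -mbmi2) echo bmi2 ;;
    -mf16c) echo f16c ;;
    -mfma) echo fma ;;
    -mmovbe) echo movbe ;;
    -mpclmul) echo pclmulqdq ;;
    -mxsave) echo xsave ;;
    -mxsaveopt) echo xsaveopt ;;
    # Tuning options (e.g. -mavx256-split-unaligned-store) and unmapped ISAs.
    *) echo "" ;;
  esac
}

missing_isa() {
  cpu_flags=" $1 "   # pad so every flag is surrounded by spaces
  shift
  missing=""
  for opt in "$@"; do
    name=$(cpuinfo_name "$opt")
    [ -n "$name" ] || continue          # skip options we cannot map
    case "$cpu_flags" in
      *" $name "*) ;;                   # advertised by this CPU
      *) missing="$missing $name" ;;    # not advertised: SIGILL if executed
    esac
  done
  echo "${missing# }"
}

# Example usage on the client:
#   missing_isa "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)" \
#       -mavx -mavx256-split-unaligned-store -mpclmul -mxsave -mxsaveopt
```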
I guess the fault is mine, but something is grinding my gears:
How could so many packages be built and installed on the client while Lua fails? I may be missing some necessary knowledge, because this is really not obvious to me. I installed the client with binary packages from the binhost from the very beginning, so I expected that, had I misconfigured the binhost's build process with incorrect flags, I would not even have been able to finish the Gentoo installation from these binaries.
I was wrong!
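To see which flags the installed packages actually carry, one can rely on the standard Portage layout: Portage records the CFLAGS each installed package was built with in `/var/db/pkg/<category>/<package>/CFLAGS`, and for a package installed from the binhost this should be the binhost's flags. The helper below (`list_pkg_cflags`, hypothetical) is a minimal sketch that lists them, so the flags recorded for the installed Lua can be compared with those of packages that work.

```shell
# Sketch: print "category/package-version: CFLAGS" for every installed
# package, read from the CFLAGS file Portage keeps in its package database.
list_pkg_cflags() {
  vdb=${1:-/var/db/pkg}            # optional alternative root, e.g. for testing
  for f in "$vdb"/*/*/CFLAGS; do
    [ -f "$f" ] || continue        # the glob matched nothing
    pkg=${f#"$vdb"/}               # strip the database prefix
    pkg=${pkg%/CFLAGS}             # strip the file name
    printf '%s: %s\n' "$pkg" "$(cat "$f")"
  done
}

# Example usage:
#   list_pkg_cflags | grep '^dev-lang/lua'      # flags recorded for Lua
#   list_pkg_cflags | grep -v 'march=native'    # packages not built locally
```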
If you need more information, please ask. I tried hard to make all of the above as clear to read as I could, but it is a large amount of output and it started to confuse me.
Regards,
GASPARD DE RENEFORT Kévin _________________ wiki/User:Kgdrenefort/captain_logs My system info
G. does not have problems, only learning opportunities. - NeddyS.
If your installation isn't valuable to you, feel free to continue to ignore the instructions. - figue. |
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9538 Location: beyond the rim
Posted: Wed Apr 17, 2024 4:21 pm
Unfortunately, without more information this will turn into a guessing game. While SIGILL is most likely caused by a binary containing an instruction not supported by the current CPU, it can also have other causes (e.g. https://github.com/luau-lang/luau/issues/446).
So unless you can determine what exactly caused the problem within LUA (which opcode, or at least which function), I don't think there is much you can do, as from a quick glance your flags seem to be fine. And of course there is always the tiny chance of hardware failure, compiler or kernel bugs, and so on. If you still have the failing binary, you could try something like https://github.com/baryluk/elf-opcode-stats to check which opcodes it uses and compare that to the working binary.
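In the spirit of that suggestion, here is a much simpler heuristic than elf-opcode-stats (a different technique, not that tool): scan the `objdump -d` disassembly of the failing binary or library for mnemonics from extensions that Sandy Bridge lacks (FMA, AVX2, BMI1/BMI2, F16C, MOVBE). The helper name and the mnemonic lists are assumptions of this sketch, and a non-zero count is only a lead, since code guarded by runtime CPU detection is counted too.

```shell
# Sketch: count instructions from ISA extensions that Sandy Bridge lacks in
# `objdump -d` output read from stdin. Heuristic only: AVX2 integer ops on
# %ymm registers (e.g. vpaddd) share their mnemonic with older forms and are
# NOT counted, while code behind runtime CPU dispatch IS counted.
suspect_insns() {
  awk -F '\t' '
    {
      # The last tab-separated field holds "mnemonic operands".
      if (split($NF, w, " ") == 0) next
      m = w[1]
      if (m ~ /^vfn?m(add|sub)/)
        c["fma"]++
      else if (m ~ /^(vpbroadcast[bwdq]|vbroadcasti128|vperm2i128|vinserti128|vextracti128|vperm(d|q|ps|pd)$|vpblendd|vpmaskmov[dq]|vps(llv|rlv|rav)[dq]|vp?gather)/)
        c["avx2"]++
      else if (m ~ /^(andn|bextr|blsi|blsmsk|blsr|bzhi|mulx|pdep|pext|rorx|sarx|shlx|shrx)[lq]?$/)
        c["bmi"]++
      else if (m ~ /^vcvt(ph2ps|ps2ph)/)
        c["f16c"]++
      else if (m ~ /^movbe[wlq]?$/)
        c["movbe"]++
    }
    END {
      # Print every category in a fixed order, zero when nothing was found.
      n = split("fma avx2 bmi f16c movbe", order, " ")
      for (i = 1; i <= n; i++) print order[i], c[order[i]] + 0
    }
  '
}

# Example usage (the library path is an assumption; adjust to your Lua slot):
#   objdump -d --no-show-raw-insn /usr/lib64/liblua5.4.so | suspect_insns
```

Running it on both the failing and the working build and comparing the counts mirrors the "compare to the working binary" idea above.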
kgdrenefort Apprentice
Joined: 19 Sep 2023 Posts: 216 Location: Somewhere in the 77
Posted: Thu Apr 18, 2024 9:40 am
Genone wrote: | Unfortunately without more information this will turn into a guessing game. While SIGILL is most likely caused by a binary containing an instruction not supported by the current CPU, it can also have other causes (e.g. https://github.com/luau-lang/luau/issues/446 )
So unless you can determine what exactly caused the problem within LUA (which opcode or at least which function) I don't think there is much you can do as from a quick glance your flags seem to be fine. And of course there is always the tiny chance of hardware failure, compiler or kernel bugs, and so on. If you still have the failing binary you could try sth. like https://github.com/baryluk/elf-opcode-stats to check which opcodes are used by it and compare that to the working binary. |
Hello, and thanks for this information.
A friend of mine who does sysadmin work and development suggested that I try a package that seems to use Lua heavily: Minetest. Maybe it will bring more information, maybe not; it's free to try anyway…
After some discussion, I came up with a possible dirty workaround:
If the problem is limited to dev-lang/lua (which is not 100% certain at the moment; it is just the first package to have caused trouble), I could have the client compile only this package instead of using a binary, in case the binhost build really is the cause behind all this.
As I explained to my friend, it is weird that only this single package hits an illegal instruction, which makes me think my settings may not really be the root of the issue.
I'll dig further into this problem, though, because it is interesting and, if it's a bug somewhere, it would be neat to find and report it.
I'll get back to this topic after trying elf-opcode-stats from GitHub and looking further into the bug report linked in your reply.
Thanks, as usual.
Regards,
GASPARD DE RENEFORT Kévin
kgdrenefort Apprentice
Joined: 19 Sep 2023 Posts: 216 Location: Somewhere in the 77
Posted: Fri Apr 19, 2024 11:28 am
Hello,
I did a quick test: Minetest runs great from the binhost's binary.
I'm really starting to think that either I simply did something wrong that needs no fix beyond changing the Lua slot, or I had bad luck! After all, only Lua was a problem; everything else is doing great so far.
I'll try rebuilding Lua on my binhost, pushing it to the client and switching the slot to the binary package I made, then see how it goes.
If it goes well, I'll simply assume I did something wrong at some point that I probably won't be able to reproduce, until it happens again later.
If it does not, I'll keep searching for the root of this problem and, as a workaround, force Lua to be compiled on the client instead of retrieving a broken build.
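The client-side workaround could look like this `make.conf` sketch, which reuses the commented-out binhost settings from the client configuration above and only adds `dev-lang/lua` to the existing `--usepkg-exclude` list. According to emerge(1), `--usepkg-exclude` makes emerge ignore matching binary packages; that this also covers packages fetched with `--getbinpkg` is an assumption worth confirming before relying on it.

```shell
# Sketch for the client's /etc/portage/make.conf (not a tested config):
# keep fetching binaries from the binhost, but exclude dev-lang/lua so it is
# always built locally with this machine's -march=native.
FEATURES="getbinpkg"
PORTAGE_BINHOST="http://192.168.1.103:81/packages"
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --getbinpkg"
# dev-lang/lua added in front of the original exclusion list.
EMERGE_DEFAULT_OPTS="${EMERGE_DEFAULT_OPTS} --usepkg-exclude 'dev-lang/lua sys-kernel/gentoo-sources virtual/* www-servers/lighttpd'"
```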
Regards,
GASPARD DE RENEFORT Kévin