dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Sat Mar 29, 2025 8:30 am Post subject: amd64 binary distributions optimized for 2003-era processor? |
|
|
I was under the impression that the x86 extensions mechanism allowed a binary to fall back to 'substitute instructions' in case certain instructions were not found on the running processor. However, this does not seem to be true. This essentially implies that any binary distribution which advertises support for the 2003 K8 processor contains instructions limited to the K8, unless the programmer codes for gcc's multiversioning feature.
To see how heavily gcc uses these 'new' instructions --
Code: | /usr/x86_64-mypl-linux-gnu/usr/bin/cat --help
Usage: /usr/x86_64-mypl-linux-gnu/usr/bin/cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.
With no FILE, or when FILE is -, read standard input.
-A, --show-all equivalent to -vET
-b, --number-nonblank number nonempty output lines, overrides -n
-e equivalent to -vE
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs display TAB characters as ^I
-u (ignored)
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version output version information and exit
Examples:
/usr/x86_64-mypl-linux-gnu/usr/bin/cat f - g Output f's contents, then standard input, then g's contents.
/usr/x86_64-mypl-linux-gnu/usr/bin/cat Copy standard input to standard output.
Illegal instruction (core dumped) |
I can't even chroot into this install. It almost feels like an ARM machine.
So is it true that 99.9% of prebuilt binary applications (even the kernel) are NOT using a processor's new instructions, in order to maintain compatibility? And so if you buy a new processor, its performance gains in 99.9% of cases (when using prebuilt binaries) are limited to how fast the legacy instructions are executed?
For the same reason, I was wondering why this benchmark works with the same binaries with avx512 disabled. _________________ My blog |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 9974 Location: almost Mile High in the USA
|
Posted: Sun Mar 30, 2025 1:04 am Post subject: |
|
|
I don't think it was ever the case that CPUs would emulate new instructions, though all modern CPUs trap on invalid instructions. However, these traps are extremely expensive, and the kernel may or may not be able to handle the translation. So yes, you will need to get a properly compiled kernel and software appropriate for the CPU. And yeah, that's why clock speed has always been king, since a lot of software writers target making sure as many people as possible can run the software.
I do have a few base amd64 CPUs (AMD K8; Intel P4). Because of this I tend to build all my binaries base amd64 just so I can shift binaries between machines at a whim.
The "x86_64_v3" fiasco lately assumes AVX, which means a fairly late-model CPU is necessary. I think a lot of distributions target v3 now (which will no longer run on first, second, or even third rev CPUs), so building the binaries yourself may be your only option.
IIRC:
x86_64 base: all 64-bit CPUs, starting with K8 and P4.
x86_64_v2: adds SSE4.2, SSSE3, POPCNT, CMPXCHG16B (roughly Nehalem onward).
x86_64_v3: adds AVX, AVX2, BMI1/2, FMA, MOVBE (roughly Haswell onward).
x86_64_v4: adds AVX-512F/BW/CD/DQ/VL.
Fat binaries that carry multiple code streams are the best way to optimize for old and new CPUs without SIGILLs, but gcc does not produce them by default... And AVX tends not to really affect how fast /bin/cat runs... though there are routines like strcpy and memset which can benefit a bit from AVX, I doubt most people would notice. One time I noticed gcc decided to use AVX to clear a register instead of xoring it with itself (or loading an immediate 0 into the register). The AVX form does take fewer cycles than the other two, so it is faster, but it doesn't happen very often, and I find it annoying because it breaks compatibility with my old CPUs. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Sun Mar 30, 2025 8:05 am Post subject: |
|
|
Is there any way I can build a 'fat binary'? I'm trying to cross-compile, but because of toolchain bugs (they run the cross-compiled code), many packages are failing. There are no x64 emulators which support avx512 (the source of my problem).
I wonder how those Arch guys would react to this. They build for the x64 baseline. Yeah, their packages might be the latest, but the instructions are 2003-era... _________________ My blog |
Zucca Moderator


Joined: 14 Jun 2007 Posts: 4001 Location: Rasi, Finland
|
Posted: Sun Mar 30, 2025 12:44 pm Post subject: |
|
|
There was (is?) the FatELF project. Don't ask me how to use it or incorporate it into portage build/packaging processes. _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
Hu Administrator

Joined: 06 Mar 2007 Posts: 23276
|
Posted: Sun Mar 30, 2025 1:38 pm Post subject: |
|
|
Cross-compiling normally means that the compiler is producing output for a foreign architecture, which therefore cannot be run locally, no matter how modern the build CPU is. I don't think a fat binary would help you, because if the offending package were even slightly well behaved, it would be respecting your existing CFLAGS that tell it not to use modern instructions - assuming you did set your flags properly. A package that is so poorly behaved that it ignores your CFLAGS will likely also ignore the CFLAGS used to tell it to produce a fat binary.
Could you provide a little more detail on what is happening here? On what generation CPU are you trying to run code? Where did you get the code that is not working, and how was it built? |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Sun Mar 30, 2025 2:51 pm Post subject: |
|
|
Zucca wrote: | There was (is?) FatELF project. Don't ask me how to use or incorporate it into portage build/packaging processes. |
This is about changing the ELF format (and therefore requires patching the kernel). If ELF could be extended for this purpose by Linux, it would be extremely attractive in these mixed x64/ARM days. _________________ My blog |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Sun Mar 30, 2025 3:08 pm Post subject: |
|
|
Hu wrote: | Cross-compiling normally means that the compiler is producing output for a foreign architecture, which therefore cannot be run locally, no matter how modern the build CPU is. I don't think a fat binary would help you, because if the offending package were even slightly well behaved, it would be respecting your existing CFLAGS that tell it not to use modern instructions - assuming you did set your flags properly. A package that is so poorly behaved that it ignores your CFLAGS will likely also ignore the CFLAGS used to tell it to produce a fat binary.
Could you provide a little more detail on what is happening here? On what generation CPU are you trying to run code? Where did you get the code that is not working, and how was it built? |
I have a zen1 machine and am producing binaries for alderlake/raptorlake and icelake using crossdev. Now certain packages (like chromium, x11-libs/gtk+:3, www-client/firefox, x11-libs/gdk-pixbuf etc...) have a broken toolchain in the sense that, at one stage of the build process, they execute freshly compiled binaries built with -march=alderlake/raptorlake/icelake on the build host, which is a zen1 machine. Therefore, the compilation process fails with errors like --
Code: | traps: protoc[43437] trap invalid opcode ip:562f545d6b90 sp:7ffd03ae4660 error:0 in protoc[2e8b90,562f5438c000+351000]
traps: ocloc-24.35.1[2971303] trap invalid opcode ip:7f4be20a9720 sp:7ffd921180e0 error:0 in libocloc.so |
etc...
This is mostly occurring with icelake because of the avx512 BS that intel did. Of course, disabling those instructions in CFLAGS resolves the issue, but I'm trying to avoid doing that.
Also, there is no emulator which supports avx512 (like qemu-x86_64), so I have no choice other than reporting bugs and adding -mno-avx512f to CFLAGS for the time being. _________________ My blog |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55101 Location: 56N 3W
|
Posted: Sun Mar 30, 2025 3:19 pm Post subject: |
|
|
dE_logics,
You can make the kernel do what you want with an Illegal Instruction exception.
The last time that I recall it doing any more than kill the offending process was in the days of the 386 and 486SX, both of which lacked hardware floating point.
The kernel could be built with floating point emulation, so that when a floating point instruction was trapped, instead of the process being killed, the kernel would execute the instruction in software.
It's possible to patch the kernel to do the same with any instruction, but it's probably faster to avoid such instructions than to emulate them.
Intel have Intel® Software Development Emulator (Intel® SDE) _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Mon Mar 31, 2025 6:36 am Post subject: |
|
|
NeddySeagoon wrote: | dE_logics,
You can make the kernel do what you want with an Illegal Instruction exception.
The last time that I recall it doing any more than kill the offending process was in the days of the 386 and 486SX, both of which lacked hardware floating point.
The kernel could be built with floating point emulation, so that when a floating point instruction was trapped, instead of the process being killed, the kernel would execute the instruction in software.
It's possible to patch the kernel to do the same with any instruction, but it's probably faster to avoid such instructions than to emulate them.
Intel have Intel® Software Development Emulator (Intel® SDE) |
So the kernel could execute Intel's SDE on that binary in case it trapped an Illegal Instruction? Is there any framework like this? _________________ My blog |
NeddySeagoon Administrator


Joined: 05 Jul 2003 Posts: 55101 Location: 56N 3W
|
Posted: Mon Mar 31, 2025 7:25 am Post subject: |
|
|
dE_logics,
That Intel SDE is a user space program that is used on top of the kernel.
It will emulate Intel instructions missing from the real hardware, for programs run under its control.
It's unlikely to emulate AMD instruction extensions :)
It's not a kernel patch, or kernel option, which is what I think you would like, so that it just worked. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Mon Mar 31, 2025 8:17 am Post subject: |
|
|
No, actually SDE failed, big time. You may like to see this post. _________________ My blog |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 9974 Location: almost Mile High in the USA
|
Posted: Mon Mar 31, 2025 9:32 am Post subject: |
|
|
Back up a sec. I'm still not sure what you're doing here... trying to install Linux on a K8 or P4 that only supports x86_64_v1?
Did Gentoo already require v3 on stage3 or something? _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Mon Mar 31, 2025 10:26 am Post subject: |
|
|
In brief, I'm trying to cross compile x86-64-v4 binaries on x86-64-v3.
stage3 must be in baseline. _________________ My blog |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 9974 Location: almost Mile High in the USA
|
Posted: Mon Mar 31, 2025 2:41 pm Post subject: |
|
|
You can use the x86-64-v3 machine as a distcc helper, so the v4 machine does the rest of the work and the v3 machine doesn't have to run any v4 binaries. Yes, if the v4 machine is a MHz/core/RAM-limited laptop then it would be a pain, but I think most modern CPUs with a mere 8GiB should be fine.
Except if it's an Atom ... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Mon Mar 31, 2025 3:47 pm Post subject: |
|
|
I'm just trying to avoid frying the laptop like happened last time. I ran Gentoo on it from 2009 until, I think, 2010, when it died. Although I believe other CPU-intensive tasks were also to blame.
So I'd rather drop avx512 instructions.
What happens with distcc is that compiling is done remotely, but everything else (including linking) is done locally? _________________ My blog |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 9974 Location: almost Mile High in the USA
|
Posted: Mon Mar 31, 2025 5:08 pm Post subject: |
|
|
I've run my laptops full tilt to build Gentoo frequently and it's been fine, though i5's are the most I've ever had. It's a matter of making sure the fan remains clear of dust and blockages, IMHO. I've also run my Atom laptop 5 days straight doing Gentoo upgrades, so it compiled almost 24 hours/day for 5 days. Pretty much the only breaks it gets are when portage crashes out and I have to fix something and restart.
Except for things that cannot be distributed, distcc will allow the compilation to be run on other machines. Preprocessing and linking are still done on the local machine. Yes, unfortunately LTO is linking, so it runs locally too. The Atoms are so slow that sometimes they can't keep up with the preprocessing and my helpers starve for stuff to do. The single-core Atom laptop frequently does this, unfortunately. The quad-core Atom server sometimes does it too, but not as pronounced as the single core. My Core2 Quads (which are at least 2x the speed of my quad-core Atom) don't exhibit this issue and can keep my helpers busy for the most part. The dual-core i5's are also able to keep helpers busy...
Then again I think distcc significantly helps:
- webkit-gtk
- qtwebengine
- firefox/thunderbird
- chromium
- llvm / clang
- nodejs
- vtk
There are some more that I can't remember off the top of my head at the moment. It's good seeing all the helpers churning away...
The other packages tend to build fast enough that the benefit from distcc isn't as noticeable. Well, except packages like rust and gcc that don't distribute, as they depend on themselves. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Tue Apr 01, 2025 6:07 am Post subject: |
|
|
So what I can do is use distcc when a package cannot be cross-compiled. It'll be a backup option.
Thanks for suggesting this.
As for the story of my burnt-out laptop, it was an Athlon X2. The CPU didn't have problems, but the mobo did. _________________ My blog |
eccerr0r Watchman

Joined: 01 Jul 2004 Posts: 9974 Location: almost Mile High in the USA
|
Posted: Tue Apr 01, 2025 10:54 pm Post subject: |
|
|
Well, self-host... the machine with the largest instruction set still needs to run the build, but it can offload compilation work to the other machines. If so desired, you can have no compilation done on that machine at all. The helpers send back object files tailored to what your CFLAGS dictate, and they never need to run the code they generated.
I do have to say one caveat of Chromium, Firefox, and Thunderbird distcc: they have a bit of rust in them and that can't be distributed. However there's a lot of C++ that can.
Nodejs, QTWebengine, and webkit-gtk all hammer the distcc helpers. However, I'm kind of surprised qtwebengine doesn't have rust in it yet? _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Wed Apr 02, 2025 8:10 am Post subject: |
|
|
Yup, rust is the future. It even ended up in the kernel.
sccache is like distcc for rust. _________________ My blog |
dE_logics Advocate


Joined: 02 Jan 2009 Posts: 2335 Location: $TERM
|
Posted: Thu Apr 03, 2025 4:19 am Post subject: |
|
|
In the mean time this cross boss script works for many packages. _________________ My blog |