Gentoo Forums
amd64 binary distributions optimized for 2003-era processor?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Sat Mar 29, 2025 8:30 am    Post subject: amd64 binary distributions optimized for 2003-era processor?

I was under the impression that x86 extensions let a binary fall back to 'substitute instructions' when certain instructions were not available on the running processor. However, this does not seem to be true. This essentially implies that any binary OS that advertises support for the 2003 K8 processor contains only instructions available on the K8, unless the programmer codes for gcc's multiversioning feature.

To see how much gcc uses these 'new' instructions --
Code:
/usr/x86_64-mypl-linux-gnu/usr/bin/cat --help
Usage: /usr/x86_64-mypl-linux-gnu/usr/bin/cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

  -A, --show-all           equivalent to -vET
  -b, --number-nonblank    number nonempty output lines, overrides -n
  -e                       equivalent to -vE
  -E, --show-ends          display $ at end of each line
  -n, --number             number all output lines
  -s, --squeeze-blank      suppress repeated empty output lines
  -t                       equivalent to -vT
  -T, --show-tabs          display TAB characters as ^I
  -u                       (ignored)
  -v, --show-nonprinting   use ^ and M- notation, except for LFD and TAB
      --help        display this help and exit
      --version     output version information and exit

Examples:
  /usr/x86_64-mypl-linux-gnu/usr/bin/cat f - g  Output f's contents, then standard input, then g's contents.
  /usr/x86_64-mypl-linux-gnu/usr/bin/cat        Copy standard input to standard output.
Illegal instruction (core dumped)


I can't even chroot into this install. It almost feels like an ARM machine.

So is it true that 99.9% of prebuilt binary applications (even the kernel) are NOT using a processor's new instructions, in order to maintain compatibility? And so if you buy a new processor, its performance gains in 99.9% of cases (when using prebuilt binaries) are limited to how fast legacy instructions are executed?

For the same reason I was wondering why this benchmark works with the same binaries with avx512 disabled.
_________________
My blog
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9974
Location: almost Mile High in the USA

Posted: Sun Mar 30, 2025 1:04 am

I don't think it was ever the case that CPUs would emulate new instructions, though all modern CPUs trap on invalid instructions. However, these traps are extremely expensive, and the kernel may or may not be able to handle the translation. So yes, you will need a properly compiled kernel and software appropriate for the CPU. And yeah, that's why clock speed has always been king: a lot of software writers target the lowest common denominator so that as many people as possible can run the software.

I do have a few base amd64 CPUs (AMD K8; Intel P4). Because of this I tend to build all my binaries base amd64 just so I can shift binaries between machines at a whim.

The "x86_64_v3" fiasco lately assumes AVX, which means a fairly late-model CPU is necessary. I think a lot of distributions target v3 now, which will no longer run on first and even second or third rev CPUs, and building the binaries yourself may be your only option.

IIRC:
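x86_64 base: All 64-bit CPUs starting with K8 and P4 (SSE2 and below).
x86_64_v2: up to SSE4.2 (also SSSE3, POPCNT)
x86_64_v3: AVX and AVX2 (also BMI1/BMI2, FMA)
x86_64_v4: AVX512 (the F, BW, CD, DQ and VL subsets)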
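
To make the level list above concrete, here is a minimal sketch that maps the `flags` line of /proc/cpuinfo onto a level. The names `LEVELS`, `x86_64_level` and `host_flags` are made up for illustration. The flag groups are my reading of the x86-64 psABI levels, written with the kernel's flag spellings (`pni` for SSE3, `abm` for LZCNT), and `xsave` is used as a stand-in for OSXSAVE, so treat the lists as an assumption and check them against the psABI before relying on them.

```python
# Flag names as they appear in the "flags" line of /proc/cpuinfo on Linux.
# Assumption: grouping follows the x86-64 psABI microarchitecture levels.
LEVELS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level (0-4) for which this and all lower levels are satisfied."""
    have = set(flags)
    level = 0
    for number, required in LEVELS:
        if not required <= have:
            break  # a missing flag here rules out every higher level too
        level = number
    return level


def host_flags(path="/proc/cpuinfo"):
    """Return the flag list of the first CPU, or [] if it cannot be read."""
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    return line.split(":", 1)[1].split()
    except OSError:
        pass
    return []


if __name__ == "__main__":
    # 0 means the flags could not be read or this is not an x86-64 machine.
    print("x86-64 level:", x86_64_level(host_flags()))
```

A build host that reports level 3 here cannot run anything compiled for v4, which is the situation that produces SIGILL.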

Fat binaries that have multiple code streams are the best way to optimize for old and new CPUs without SIGILLs, but gcc does not do this on its own (function multiversioning has to be requested per function)... And AVX tends to not really affect how fast /bin/cat runs ... though there are segments like strcpy and memset which can benefit a bit from AVX, but I doubt that most people would notice. One time I noticed gcc decided to use AVX to clear a register instead of xoring it with itself (or loading an immediate 0 into the register). AVX does take fewer cycles than the other two so it is faster, but it doesn't happen very often and I think it's annoying because it breaks compatibility with my old CPUs.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Sun Mar 30, 2025 8:05 am

Is there any way I can build a 'fat binary'? I'm trying to cross compile, but because of toolchain bugs (they run the cross-compiled code), many packages are failing. There are no x64 emulators which support avx512 (the source of my problem).

I wonder how those Arch guys would react to this. They build for x64 baseline. Yeah, their packages might be latest but the instructions are 2003 era...
_________________
My blog
Zucca
Moderator

Joined: 14 Jun 2007
Posts: 4001
Location: Rasi, Finland

Posted: Sun Mar 30, 2025 12:44 pm

There was (is?) the FatELF project. Don't ask me how to use it or incorporate it into portage build/packaging processes.
_________________
..: Zucca :..

My gentoo installs:
init=/sbin/openrc-init
-systemd -logind -elogind seatd

Quote:
I am NaN! I am a man!
Hu
Administrator

Joined: 06 Mar 2007
Posts: 23276

Posted: Sun Mar 30, 2025 1:38 pm

Cross-compiling normally means that the compiler is producing output that is for a foreign architecture and therefore cannot be run locally, no matter how modern the build CPU is. I don't think a fat binary would help you, because if the offending package were even slightly well behaved, it would be respecting your existing CFLAGS that tell it not to use modern instructions - assuming you did set your flags properly. A package that is so poorly behaved that it ignores your CFLAGS will likely also ignore the CFLAGS used to tell it to produce a fat binary.

Could you provide a little more detail on what is happening here? On what generation CPU are you trying to run code? Where did you get the code that is not working, and how was it built?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Sun Mar 30, 2025 2:51 pm

Zucca wrote:
There was (is?) the FatELF project. Don't ask me how to use it or incorporate it into portage build/packaging processes.


This is about changing the ELF format (and therefore requires patching the kernel). If ELF could be modified for this purpose by Linux, this would be extremely attractive in these mixed x64-arm days.
_________________
My blog
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Sun Mar 30, 2025 3:08 pm

Hu wrote:
Cross-compiling normally means that the compiler is producing output that is for a foreign architecture and therefore cannot be run locally, no matter how modern the build CPU is. I don't think a fat binary would help you, because if the offending package were even slightly well behaved, it would be respecting your existing CFLAGS that tell it not to use modern instructions - assuming you did set your flags properly. A package that is so poorly behaved that it ignores your CFLAGS will likely also ignore the CFLAGS used to tell it to produce a fat binary.

Could you provide a little more detail on what is happening here? On what generation CPU are you trying to run code? Where did you get the code that is not working, and how was it built?


I have a zen1 machine and am producing binaries for alder/raptorlake and icelake using crossdev. Certain packages (like chromium, x11-libs/gtk+:3, www-client/firefox, x11-libs/gdk-pixbuf etc...) have a broken toolchain, in the sense that at some stage of the build process they execute freshly compiled binaries built with -march=alderlake/raptorlake/icelake on the build host, which is a zen1 machine. Therefore, the compilation process fails with errors like --
Code:
traps: protoc[43437] trap invalid opcode ip:562f545d6b90 sp:7ffd03ae4660 error:0 in protoc[2e8b90,562f5438c000+351000]
traps: ocloc-24.35.1[2971303] trap invalid opcode ip:7f4be20a9720 sp:7ffd921180e0 error:0 in libocloc.so

etc...

This is mostly occurring with icelake because of the avx512 BS that intel did. Of course disabling those instructions in CFLAGS resolves the issue, but I'm trying to avoid doing that.

Also, there is no emulator that supports avx512 (qemu-x86_64, for one, does not), so I have no choice other than reporting bugs and adding -mno-avx512f to CFLAGS for the time being.
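
When a long build dies this way, it helps to pull the failing program out of the kernel log mechanically. The sketch below parses lines shaped like the two `traps:` messages above. `TRAP_RE` and `parse_trap` are hypothetical names, and reading the last bracketed pair as "mapping base + mapping size" is my assumption about the log format, not something confirmed in this thread.

```python
import re

# Matches kernel log lines of the form shown in the post:
#   traps: NAME[PID] trap invalid opcode ip:HEX sp:HEX error:N in IMAGE[...]
# The trailing "[...]" part is optional because one of the lines lacks it.
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] trap invalid opcode "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:\d+"
    r"(?: in (?P<image>[^\[\s]+)(?:\[(?P<mapping>[^\]]*)\])?)?"
)


def parse_trap(line):
    """Return a dict describing an invalid-opcode trap, or None if no match."""
    match = TRAP_RE.search(line)
    if match is None:
        return None
    info = {
        "comm": match["comm"],
        "pid": int(match["pid"]),
        "ip": int(match["ip"], 16),
        "image": match["image"],
        "offset": None,
    }
    mapping = match["mapping"]
    if mapping and "+" in mapping:
        # Assumption: the last comma-separated field is "base+size" of the mapping.
        base = int(mapping.split(",")[-1].split("+")[0], 16)
        info["offset"] = info["ip"] - base  # faulting address relative to the mapping
    return info


if __name__ == "__main__":
    line = ("traps: protoc[43437] trap invalid opcode ip:562f545d6b90 "
            "sp:7ffd03ae4660 error:0 in protoc[2e8b90,562f5438c000+351000]")
    print(parse_trap(line))
```

Feeding it the output of `dmesg` line by line gives a quick list of which helper binaries the build tried to execute on the host.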
_________________
My blog
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 55101
Location: 56N 3W

Posted: Sun Mar 30, 2025 3:19 pm

dE_logics,

You can make the kernel do what you want with an Illegal Instruction exception.

The last time that I recall it did any more than kill the offending process was in the days of the 386 and 486SX, both of which lacked hardware floating point.
The kernel could be built with floating point emulation, so that when a floating point instruction was trapped, instead of the process being killed, the kernel would execute the instruction in software.

It's possible to patch the kernel to do the same with any instructions, but it's probably faster to avoid them than to emulate them.
Intel have the Intel® Software Development Emulator (Intel® SDE).
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Mon Mar 31, 2025 6:36 am

NeddySeagoon wrote:
dE_logics,

You can make the kernel do what you want with an Illegal Instruction exception.

The last time that I recall it did any more than kill the offending process was in the days of the 386 and 486SX, both of which lacked hardware floating point.
The kernel could be built with floating point emulation, so that when a floating point instruction was trapped, instead of the process being killed, the kernel would execute the instruction in software.

It's possible to patch the kernel to do the same with any instructions, but it's probably faster to avoid them than to emulate them.
Intel have the Intel® Software Development Emulator (Intel® SDE).


So the kernel could run the binary under Intel's SDE whenever it trapped an Illegal Instruction? Is there any framework like this?
_________________
My blog
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 55101
Location: 56N 3W

Posted: Mon Mar 31, 2025 7:25 am

dE_logics,

That Intel SDE is a user space program that is used on top of the kernel.
It will emulate Intel instructions missing from the real hardware, for programs run under its control.
It's unlikely to emulate AMD instruction extensions :)

It's not a kernel patch, or kernel option, which is what I think you would like, so that it just worked.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Mon Mar 31, 2025 8:17 am

No, actually SDE failed major. You may like to see this post.
_________________
My blog
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9974
Location: almost Mile High in the USA

Posted: Mon Mar 31, 2025 9:32 am

Back up a sec. I still am not sure what you're doing here...trying to install Linux on a K8 or P4 that only supports x86_64_v1?
Did Gentoo already require v3 on stage3 or something?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Mon Mar 31, 2025 10:26 am

In brief, I'm trying to cross compile x86-64-v4 binaries on x86-64-v3.

stage3 must be in baseline.
_________________
My blog
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9974
Location: almost Mile High in the USA

Posted: Mon Mar 31, 2025 2:41 pm

You can use the x86-64-v3 machine as a distcc host, so the v4 machine can do the remainder of the work and the v3 machine doesn't have to run any v4 binaries? Yes, if the v4 machine is a MHz/core/RAM limited laptop then it would be a pain, but I think most modern CPUs with a mere 8GiB should be fine.

Except if it's an Atom ...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Mon Mar 31, 2025 3:47 pm

I'm just trying to avoid frying the laptop like what happened last time. I ran Gentoo on it from 2009 until, I think, 2010, when it died. Although I believe other CPU intensive tasks were also to blame.

So I'd rather drop avx512 instructions.

So with distcc, the compiling is done remotely, but everything else (including linking) is done locally?
_________________
My blog
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9974
Location: almost Mile High in the USA

Posted: Mon Mar 31, 2025 5:08 pm

I've run my laptops full tilt to build Gentoo frequently and it's been fine, though i5's are the most I've ever had. It's a matter of making sure the fan remains clear of dust and blockages, IMHO. I've also run my Atom laptop 5 days straight doing Gentoo upgrades, so it compiled almost 24 hours/day for 5 days. Pretty much the only breaks it gets are when portage crashes out and I have to fix something and restart.

Except for things that cannot be distributed, distcc will allow the compilation to be run on other machines. Preprocessing and linking are still done on the local machine. Yes, unfortunately LTO is linking, so it runs locally too. The Atoms are so slow that sometimes they can't keep up with the preprocessing and my helpers starve for stuff to do. The single core Atom laptop frequently does this, unfortunately. The quad core Atom server sometimes does it too, but not as pronounced as the single core. My Core2 Quads (which are at least 2x the speed of my quad core Atom) don't exhibit this issue and can keep my helpers busy for the most part. The dual core i5's are also able to keep helpers busy...

Then again I think distcc significantly helps:

- webkit-gtk
- qtwebengine
- firefox/thunderbird
- chromium
- llvm / clang
- nodejs
- vtk

There are some more that I can't remember off the top of my head at the moment. It's good seeing all the helpers churning away...

The other packages tend to build fast enough that the benefit from distcc isn't as noticeable. Well, except packages like rust and gcc, which don't distribute as they depend on themselves.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Tue Apr 01, 2025 6:07 am

So what I can do is use distcc when a package cannot be cross compiled. It'll be a backup option.

Thanks for suggesting this.

As for the story of my burnt-out laptop, it was an Athlon x2. The CPU didn't have problems, but the mobo did.
_________________
My blog
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9974
Location: almost Mile High in the USA

Posted: Tue Apr 01, 2025 10:54 pm

Well, self host... the machine with the largest instruction set still needs to drive the build, but it can offload compilation work to the other machines. If so desired, you can have no compilation done on that machine at all. The helpers will send back object files that are tailored to what your CFLAGS dictate, and they never need to run the code they generated.

I do have to mention one caveat about using distcc for Chromium, Firefox, and Thunderbird: they have a bit of rust in them and that can't be distributed. However, there's a lot of C++ that can.

Nodejs, QTWebengine, and webkit-gtk all hammer the distcc helpers. However, I'm kind of surprised qtwebengine doesn't have rust in it yet?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Wed Apr 02, 2025 8:10 am

Yup, rust is the future. It even ended up in the kernel.

sccache is like distcc for rust.
_________________
My blog
dE_logics
Advocate

Joined: 02 Jan 2009
Posts: 2335
Location: $TERM

Posted: Thu Apr 03, 2025 4:19 am

In the meantime, this cross boss script works for many packages.
_________________
My blog