lars_the_bear Guru
Joined: 05 Jun 2024 Posts: 522
Posted: Sat Jun 29, 2024 7:16 am Post subject: How much difference does CPU optimization make in practice?
Hi folks
While I appreciate that building software with optimizations specific to the CPU offers theoretical advantages, I'm wondering how much benefit this has in practice. The users of mainstream binary distributions don't have this feature and, on the whole, they don't seem too concerned about it.
I've complained elsewhere about how long compiling takes on the kind of hardware I have (of course, I'm not the first person to do so); I'm hoping to be able to compile once for a bunch of machines that are similar, but not identical.
Does anybody have a feel for what is lost by not compiling for the specific CPU? Or what kinds of application are more affected? FWIW I'm not interested in games, but I do run 3D design stuff, and a load of photo/video editing.
BR, Lars.
carcajou Apprentice
Joined: 10 Jun 2008 Posts: 248
Posted: Sat Jun 29, 2024 8:13 am
Personally, I have not run any benchmarks, but for the last 3 updates I have mostly used the official binaries and did not notice any difference. I do not think that, in the era of fast RAM and SSDs, these differences are very noticeable.
If differences do exist, I am positive you would see them in multimedia processing applications, CPU-intensive long-running tasks and similar. I am not sure the performance gain is actually worth the compilation time.
Goverp Advocate
Joined: 07 Mar 2007 Posts: 2179
Posted: Sat Jun 29, 2024 9:29 am
IMHO, in general the answer will be "not much", but for specific cases it will be "a lot". For example, there's a package (I forget which) that can exploit hardware crypto functions if they are available, with that support selected by a USE flag. If the hardware has them and you're running a web server, it's likely to have a noticeable effect. Similar things will be true of the latest vector instructions if your workload includes particular classes of heavy computation (and that includes AI workloads). And soon there will be further significant advantages to exploiting AI functions in hardware.
_________________
Greybeard
CaptainBlood Advocate
Joined: 24 Jan 2010 Posts: 3864
Posted: Sat Jun 29, 2024 10:30 am
Binary size is likely to shrink in most cases, which implies lower RAM usage.
When short on RAM, this may avoid slowdowns caused by swapping.
These are corner cases though.
Choice of compiler may benefit performance in some cases:
Phoronix gcc 14 vs llvm 18
Unfortunately most package devs don't publish benchmarks in this regard.
Thks 4 ur attention, interest & support.
_________________
USE="-* ..." in /etc/portage/make.conf here, i.e. a countermeasure to portage implicit braces, belt & diaper paradigm
LT: "I've been doing a passable imitation of the Fontana di Trevi, except my medium is mucus. Sooo much mucus. "
kurdy n00b
Joined: 29 Dec 2023 Posts: 6 Location: Switzerland
Posted: Sat Jun 29, 2024 11:24 am
Hello,
I ran some tests on the same machine with almost the same KDE setup, first with Manjaro and then with Gentoo.
I used UnixBench and Blender, and also ran tests with compression tools, video conversion with ffmpeg, a Rust compile and openssl. All show a gain of between 2 and 12%, but with large variability.
UnixBench shows two tests with a huge gain:
Pipe-based Context Switching
This test measures the number of times two processes can exchange an increasing integer through a pipe. The pipe-based context switching test is more like a real-world application. The test program spawns a child process with which it carries on a bi-directional pipe conversation.
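To make the description concrete, here is a minimal Python sketch of the same idea: a parent and a child process bounce an increasing integer back and forth over two pipes. It is not UnixBench's own code (which is written in C); the function name `pipe_ping_pong` and the round count are assumptions chosen for illustration, and the timing includes Python interpreter overhead, so only comparisons on the same machine mean anything.

```python
import os
import struct
import time


def pipe_ping_pong(rounds=10_000):
    """Bounce an increasing integer between parent and child over two pipes.

    Returns (final_value, elapsed_seconds). Unix-only, because it uses os.fork().
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: read each integer, add one, send it back.
        os.close(to_child_w)
        os.close(to_parent_r)
        try:
            for _ in range(rounds):
                (value,) = struct.unpack("q", os.read(to_child_r, 8))
                os.write(to_parent_w, struct.pack("q", value + 1))
        finally:
            os._exit(0)  # never fall through into the parent's code

    # Parent: keep only its own ends of the two pipes.
    os.close(to_child_r)
    os.close(to_parent_w)
    value = 0
    start = time.perf_counter()
    for _ in range(rounds):
        os.write(to_child_w, struct.pack("q", value))
        (value,) = struct.unpack("q", os.read(to_parent_r, 8))
    elapsed = time.perf_counter() - start

    os.close(to_child_w)
    os.close(to_parent_r)
    os.waitpid(pid, 0)
    return value, elapsed


if __name__ == "__main__":
    rounds = 10_000
    final_value, seconds = pipe_ping_pong(rounds)
    print(f"{rounds} round trips in {seconds:.3f} s, final value {final_value}")
```

Each round trip forces the kernel to switch between the two processes, so the elapsed time is dominated by pipe and scheduling costs rather than by arithmetic.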
System Call Overhead
This estimates the cost of entering and leaving the operating system kernel, i.e., the overhead for performing a system call. It consists of a simple program repeatedly calling the getpid (which returns the process id of the calling process) system call. The time to execute such calls is used to estimate the cost of entering and exiting the kernel.
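The same kind of measurement can be sketched in a few lines of Python. This is only an illustration of the technique, not the UnixBench program: the function name `seconds_per_getpid` and the iteration count are assumptions, the loop overhead of the interpreter is included in the figure, and whether every call really enters the kernel depends on the C library (some older ones cached the PID).

```python
import os
import time


def seconds_per_getpid(iterations=200_000):
    """Return the average wall-clock time of one os.getpid() call."""
    start = time.perf_counter()
    for _ in range(iterations):
        os.getpid()
    return (time.perf_counter() - start) / iterations


if __name__ == "__main__":
    cost = seconds_per_getpid()
    print(f"about {cost * 1e9:.0f} ns per getpid() call, interpreter overhead included")
```

Running it on two installations of the same machine gives a rough feel for differences in kernel entry and exit cost, which is largely a matter of kernel configuration and not of userland compiler flags.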
These results corroborate my impression in use: Gentoo seems more responsive overall. The 5 to 10 percent gain on tasks such as compiling, video conversion and compression doesn't change my life, but today I'm willing to pay the price of a little Gentoo complexity for it. Note that I haven't migrated my work machine, because there I don't want to wait or spend time on this complexity.
Below I share some UnixBench results as images. The Blender results are broadly the same.
View results Graph 1
View results Graph 2
Regards
Last edited by kurdy on Sun Jun 30, 2024 8:10 am; edited 1 time in total
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sat Jun 29, 2024 3:57 pm
lars_the_bear,
Multimedia and crypto will see improvements if your CPU supports the required instructions.
The original Intel MMX extensions were aimed at multimedia, as was 3DNow! from AMD.
LibreOffice will spend more time waiting for your keystrokes. :)
In short, generics are fine for most things. Where it matters, you may want to build custom packages.
Binary distro users are happy as they can't tell the difference. :)
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
lars_the_bear Guru
Joined: 05 Jun 2024 Posts: 522
Posted: Sat Jun 29, 2024 6:19 pm
Hi
Thanks to all who replied. The impression I get is that CPU-specific optimization is only going to make a noticeable difference for very specific applications.
But I'm still confused...
If I don't specify anything for '-march=...', I think I get the generic x86-64 by default. The compiler won't (IIUC) generate code that uses any of the later extensions, like AVX. But I did what the Gentoo installation documentation said to do, which was to use `cpuid2cpuflags` to generate `package.use/00cpu-flags`. These flags include `avx`, `mmx`, and a bunch of others.
Is it a problem if these USE flags are not in agreement with the gcc -march setting?
BR, Lars.
Anon-E-moose Watchman
Joined: 23 May 2008 Posts: 6148 Location: Dallas area
Posted: Sat Jun 29, 2024 8:02 pm
CPU optimizations were really helpful back in the Pentium* days; less so now, at least for modern systems.
For old systems it can still play a part, but typically old systems go from general purpose to single purpose, ie file server, music server, etc
and there's not much need for CPU optimizations in these cases.
_________________
UM780, 6.1 zen kernel, gcc 13, profile 17.0 (custom bare multilib), openrc, wayland
lars_the_bear Guru
Joined: 05 Jun 2024 Posts: 522
Posted: Sun Jun 30, 2024 7:37 am
Anon-E-moose wrote: | ...but typically old systems go from general purpose to single purpose, ie file server, music server, etc |
Fair enough; but I'm specifically trying to see whether my 2012-2015 Lenovo laptops will serve for daily, general-purpose use.
These were top-of-the-range systems back in their day, and they still have a lot (in my view) to recommend them. They all have user-replaceable batteries, multiple SSDs, tool-free dismantling, desktop docking stations, hot-swappable storage, and a bunch of other things that don't seem to exist today, except at ruinous prices.
My old laptops do run things like FreeCAD and Darktable with just-about-acceptable performance. Only just, though. From the above, I don't get the impression that fiddling with CPU optimization will make a radical difference.
BR, Lars.
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sun Jun 30, 2024 10:22 am
lars_the_bear,
-march and cpuid2cpuflags (package.use/00cpu-flags) do two different things.
As you say, generic x86-64 is what you get by default when -march is not given. It includes the few instruction set extensions (such as MMX, SSE and SSE2) that all amd64 CPUs have.
CFLAGS give compilers permission to use the instruction set extensions listed. They are not compelled to.
When you use the wrong -march, sometimes the binary works, sometimes you get an illegal instruction exception. It all depends on what the compiler actually did.
cpuid2cpuflags generates the value of a USE_EXPAND variable, CPU_FLAGS_X86, which allows build systems to take advantage of optional optimised code segments.
These flags are not used by compilers.
I have `CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"`
Try `qgrep cpu_flags_x86_avx2` to see everywhere in the repo that cpu_flags_x86_avx2 is used.
Such a USE flag is an instruction (not a permission) to build systems.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
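For the "compile once for several similar machines" case raised at the start of the thread, one conservative approach is to hand the build systems only those CPU flags that every target machine reports. The following is a minimal Python sketch of that idea, assuming each list was collected by running cpuid2cpuflags on the machine in question. The helper name `common_cpu_flags` is hypothetical, the first list is the one quoted in the post above, and the second list is an invented example of an older CPU.

```python
def common_cpu_flags(*flag_lists):
    """Return the sorted flags present in every space-separated flag list."""
    if not flag_lists:
        return []
    sets = [set(flags.split()) for flags in flag_lists]
    return sorted(set.intersection(*sets))


# Flag list quoted in the post above.
machine_a = ("aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha "
             "sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3")
# Hypothetical older machine that lacks AVX2, FMA3, F16C, RDRAND, SHA and SSE4a.
machine_b = "aes avx mmx mmxext pclmul popcnt sse sse2 sse3 sse4_1 sse4_2 ssse3"

shared = common_cpu_flags(machine_a, machine_b)
print('CPU_FLAGS_X86="' + " ".join(shared) + '"')
```

The same reasoning applies to -march: whatever value is chosen has to be one that the oldest machine in the group can execute, otherwise the illegal instruction exception described above becomes possible.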
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5117 Location: Bavaria
Posted: Sun Jun 30, 2024 10:30 am
NeddySeagoon wrote: | Try `qgrep cpu_flags_x86_avx2` to see everywhere in the repo that cpu_flags_x86_avx2 is used. |
If you want to see which of your installed packages use e.g. avx2, you can search with:
_________________
https://wiki.gentoo.org/wiki/User:Pietinger
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54578 Location: 56N 3W
Posted: Sun Jun 30, 2024 11:55 am
pietinger,
Thank you.
eix is huge. That's a new corner of it for me.
_________________
Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Goverp Advocate
Joined: 07 Mar 2007 Posts: 2179
Posted: Sun Jun 30, 2024 1:40 pm
pietinger wrote: | NeddySeagoon wrote: | Try `qgrep cpu_flags_x86_avx2` to see everywhere in the repo that cpu_flags_x86_avx2 is used. |
If you want to see which of your installed packages use e.g. avx2, you can search with:
|
or IIUC:
`equery hasuse cpu_flags_x86_avx2` for installed packages,
`equery hasuse -p cpu_flags_x86_avx2` for all packages in Portage
_________________
Greybeard