Gentoo Forums
How much difference does CPU optimization make in practice?
lars_the_bear
Guru


Joined: 05 Jun 2024
Posts: 522

Posted: Sat Jun 29, 2024 7:16 am    Post subject: How much difference does CPU optimization make in practice?

Hi folks

While I appreciate that building software with optimizations specific to the CPU offers theoretical advantages, I'm wondering how much benefit this has in practice. The users of mainstream binary distributions don't have this feature and, on the whole, they don't seem too concerned about it.

I've complained elsewhere how long compiling takes on the kind of hardware I have (of course, I'm not the first person to do so); I'm hoping to be able to compile once for a bunch of machines that are similar, but not identical.

Does anybody have a feel for what is lost by not compiling for the specific CPU? Or what kinds of application are more affected? FWIW I'm not interested in games, but I do run 3D design stuff, and a load of photo/video editing.

BR, Lars.
carcajou
Apprentice


Joined: 10 Jun 2008
Posts: 248

Posted: Sat Jun 29, 2024 8:13 am

Personally, I have not run any benchmarks, but for the last three updates I used mostly official binaries and did not notice any difference. I do not think that, in the era of fast RAM and SSDs, these differences are very noticeable.

If differences do exist, I expect they would show up in multimedia processing, long-running CPU-intensive tasks, and similar workloads. I am not sure the performance gain is worth the compilation time.
Goverp
Advocate


Joined: 07 Mar 2007
Posts: 2179

Posted: Sat Jun 29, 2024 9:29 am

IMHO, in general the answer will be "not much", but for specific cases it will be "a lot". For example, there's a package (I forget which) that can exploit hardware crypto functions, if available, that's selected by a USE flag. If it's available and you're running a web server, it's likely to have a noticeable effect. Similar things will be true about the latest vector instructions if your workload includes some particular classes of heavy computation (and that includes AI workloads). And soon there will be further significant advantages to exploiting AI functions in hardware.
_________________
Greybeard
CaptainBlood
Advocate


Joined: 24 Jan 2010
Posts: 3864

Posted: Sat Jun 29, 2024 10:30 am

Binary size is likely to shrink in most cases, which implies lower RAM usage.
When short on RAM, this may save you from slowdowns due to swap usage.
These are corner cases though.

Choice of compiler may benefit performance in some cases:
Phoronix gcc 14 vs llvm 18

Unfortunately most package devs don't publish benchmarks in this regard.

Thanks for your attention, interest & support.
_________________
USE="-* ..." in /etc/portage/make.conf here, i.e. a countermeasure to portage implicit braces, belt & diaper paradigm
LT: "I've been doing a passable imitation of the Fontana di Trevi, except my medium is mucus. Sooo much mucus. "
kurdy
n00b


Joined: 29 Dec 2023
Posts: 6
Location: Switzerland

Posted: Sat Jun 29, 2024 11:24 am

Hello,

I ran some tests on the same machine with almost the same KDE setup, first under Manjaro and then under Gentoo.

I used UnixBench and Blender, and also ran tests with compression tools, ffmpeg video conversion, Rust compilation, and openssl. All show a gain of 2 to 12%, but with large variability.

UnixBench shows two tests with a huge gain:

Pipe-based Context Switching
This test measures the number of times two processes can exchange an increasing integer through a pipe. The pipe-based context switching test is more like a real-world application. The test program spawns a child process with which it carries on a bi-directional pipe conversation.

System Call Overhead
This estimates the cost of entering and leaving the operating system kernel, i.e., the overhead for performing a system call. It consists of a simple program repeatedly calling the getpid (which returns the process id of the calling process) system call. The time to execute such calls is used to estimate the cost of entering and exiting the kernel.

This corroborates my impression from daily use that Gentoo is more responsive overall. The 5 to 10 percent gain on tasks such as compiling, video conversion, and compression doesn't change my life, but overall I'm now willing to pay the price of a little Gentoo complexity for it. Note that I haven't migrated my work machine, because there I don't want to wait or spend time on this complexity.
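For anyone repeating this kind of comparison, the percentage gain between two timed runs of the same task can be worked out as below. This is only a sketch: `pct_gain` is a made-up helper name, it assumes lower times are better (benchmark scores where higher is better need the ratio inverted), and the numbers in the usage note are illustrative, not measurements from this thread.

```shell
#!/bin/sh
# pct_gain BASE OPT: print the percentage of run time saved when a task
# that took BASE seconds on the generic build takes OPT seconds on the
# optimized build. (Hypothetical helper, for illustration only.)
pct_gain() {
    awk -v base="$1" -v opt="$2" \
        'BEGIN { printf "%.1f\n", (base - opt) / base * 100 }'
}
```

With made-up timings of 100 s and 92 s, `pct_gain 100 92` prints 8.0. Given the variability mentioned above, it is probably worth timing each case several times before reading much into a single figure.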


Below I share some UnixBench results as images. The Blender results are broadly the same.

View results Graph 1

View results Graph 2

Regards


Last edited by kurdy on Sun Jun 30, 2024 8:10 am; edited 1 time in total
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54578
Location: 56N 3W

Posted: Sat Jun 29, 2024 3:57 pm

lars_the_bear,

Multimedia and crypto will see improvements if your CPU supports the required instructions.
The original Intel MMX extensions were aimed at multimedia, as was 3DNow! from AMD.

libre-office will spend more time waiting for your keystrokes. :)

In short, generics are fine for most things. Where it matters, you may want to build custom packages.

Binary distro users are happy as they can't tell the difference. :)
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
lars_the_bear
Guru


Joined: 05 Jun 2024
Posts: 522

Posted: Sat Jun 29, 2024 6:19 pm

Hi

Thanks to all who replied. The impression I get is that CPU-specific optimization is only going to make a noticeable difference for very specific applications.

But I'm still confused...

If I don't specify anything for '-march=...', I think I get the generic x86-64 by default. The compiler won't (IIUC) generate code that uses any of the later extensions, like MMX and AVX. But I did what the Gentoo installation documentation said to do, which was to use `cpuid2cpuflags` to generate `package.use/00cpu-flags`. These flags contain `avx`, `mmx`, and a bunch of others.

Is it a problem if these USE flags are not in agreement with the gcc -march setting?

BR, Lars.
Anon-E-moose
Watchman


Joined: 23 May 2008
Posts: 6148
Location: Dallas area

Posted: Sat Jun 29, 2024 8:02 pm

CPU optimizations were really helpful back in the Pentium* days, less so now, at least on modern systems.

On old systems they can still play a part, but typically old systems go from general purpose to single purpose, i.e. file server, music server, etc., and there's not much need for CPU optimizations in these cases.
_________________
UM780, 6.1 zen kernel, gcc 13, profile 17.0 (custom bare multilib), openrc, wayland
lars_the_bear
Guru


Joined: 05 Jun 2024
Posts: 522

Posted: Sun Jun 30, 2024 7:37 am

Anon-E-moose wrote:
...but typically old systems go from general purpose to single purpose, ie file server, music server, etc


Fair enough; but I'm specifically trying to see whether my 2012-2015 Lenovo laptops will serve for daily, general-purpose use.

These were top-of-the-range systems back in their day, and they still have a lot (in my view) to recommend them. They all have user-replaceable batteries, multiple SSDs, tool-free dismantling, desktop docking stations, hot-swappable storage, and a bunch of other things that don't seem to exist today, except at ruinous prices.

My old laptops do run things like FreeCAD and Darktable with just-about-acceptable performance. Only just, though. From the above, I don't get the impression that fiddling with CPU optimization will make a radical difference.

BR, Lars.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54578
Location: 56N 3W

Posted: Sun Jun 30, 2024 10:22 am

lars_the_bear,

-march and cpuid2cpuflags (package.use/00cpu-flags) do two different things.

As you say, generic x86-64 is what you get by default when -march is not given. It enables a few instruction set extensions (for example MMX, SSE and SSE2) that all amd64 CPUs have.
CFLAGS give compilers permission to use the instruction set extensions listed. They are not compelled to.
When you use the wrong -march, sometimes the binary works, sometimes you get an illegal instruction exception. It all depends what the compiler actually did.
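To check which -march a particular set of flags resolves to, GCC can print its target options. The following is a minimal sketch, assuming the compiler is GCC (the `-Q --help=target` report is GCC-specific); `parse_march` and `show_march` are made-up helper names, not part of any Gentoo tool.

```shell
#!/bin/sh
# parse_march: read a `gcc -Q --help=target` report on stdin and print
# the value listed for -march= (hypothetical helper name).
parse_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# show_march [FLAGS...]: run gcc with the given flags and print the
# -march value it reports. With no arguments, this shows the default.
show_march() {
    gcc "$@" -Q --help=target 2>/dev/null | parse_march
}
```

Running `show_march` with no arguments shows the compiler's default, which on amd64 is typically the generic x86-64 target, while `show_march -march=native` shows what GCC detects for the machine it runs on. Comparing the second value across the machines in question is one way to see how far apart they are.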

cpuid2cpuflags sets a USE_EXPAND to allow build systems to take advantage of optional optimised code segments.
They are not used by compilers.

I have
Code:
CPU_FLAGS_X86="aes avx avx2 f16c fma3 mmx mmxext pclmul popcnt rdrand sha sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3"
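Because CPU_FLAGS_X86 is a USE_EXPAND variable, each value in it becomes a USE flag whose name is the lowercased variable name plus the value. A small sketch of that naming rule follows; `expand_cpu_flags` is a made-up name, and this only mimics the naming, it is not how Portage implements it.

```shell
#!/bin/sh
# expand_cpu_flags "FLAG FLAG ...": print the USE flag name that each
# CPU_FLAGS_X86 value corresponds to, one per line.
# (Hypothetical helper; it illustrates the naming convention only.)
expand_cpu_flags() {
    # $1 is deliberately unquoted so the shell splits it on whitespace.
    for flag in $1; do
        printf 'cpu_flags_x86_%s\n' "$flag"
    done
}
```

For example, `expand_cpu_flags "aes avx2"` prints cpu_flags_x86_aes and cpu_flags_x86_avx2, which are the names ebuilds test for.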

Try
Code:
qgrep cpu_flags_x86_avx2
to see everywhere in the repo that cpu_flags_x86_avx2 is used.
This is an instruction (not a permission) to build systems.
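Since these flags are an instruction rather than a permission, binaries meant to be shared between similar-but-not-identical machines should probably be built with only the flags that every machine supports. The sketch below computes that common set. It assumes each input file holds just the flag names for one machine, whitespace-separated (for example the value part of that machine's CPU_FLAGS_X86, without the variable name); `common_flags` and the file layout are assumptions for illustration.

```shell
#!/bin/sh
# common_flags FILE...: print, on one line, the flags that appear in
# every FILE. Each FILE lists one machine's flags, whitespace-separated.
# (Hypothetical helper name and file layout.)
common_flags() {
    n=$#
    for f in "$@"; do
        # One flag per line, de-duplicated within a single machine.
        tr -s ' \t' '\n' < "$f" | sed '/^$/d' | sort -u
    done |
        sort | uniq -c |
        # Keep only flags counted once per machine, i.e. present in all.
        awk -v n="$n" '$1 == n { print $2 }' |
        paste -sd ' ' -
}
```

The result could then be used as the CPU_FLAGS_X86 value on the build host. The same idea applies to -march: it would need to be a level that the oldest CPU in the group supports.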
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
pietinger
Moderator


Joined: 17 Oct 2006
Posts: 5117
Location: Bavaria

Posted: Sun Jun 30, 2024 10:30 am

NeddySeagoon wrote:
Try
Code:
qgrep cpu_flags_x86_avx2
to see everywhere in the repo that cpu_flags_x86_avx2 is used.

If you want to see which of your installed packages use e.g. avx2, you can search with:
Code:
eix -I -y -U avx2

_________________
https://wiki.gentoo.org/wiki/User:Pietinger
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54578
Location: 56N 3W

Posted: Sun Jun 30, 2024 11:55 am

pietinger,

Thank you.
eix is huge. That's a new corner of it for me.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Goverp
Advocate


Joined: 07 Mar 2007
Posts: 2179

Posted: Sun Jun 30, 2024 1:40 pm

pietinger wrote:
NeddySeagoon wrote:
Try
Code:
qgrep cpu_flags_x86_avx2
to see everywhere in the repo that cpu_flags_x86_avx2 is used.

If you want to see which of your installed packages use e.g. avx2, you can search with:
Code:
eix -I -y -U avx2

or IIUC:
Code:
equery hasuse cpu_flags_x86_avx2
for installed packages,
Code:
equery hasuse -p cpu_flags_x86_avx2
for all packages in portage
_________________
Greybeard
All times are GMT