Gentoo Forums
Rate my optimization choices for safety and overall sanity.
stonecraft
n00b
Joined: 22 Mar 2022
Posts: 19

Posted: Wed Apr 16, 2025 6:09 am    Post subject: Rate my optimization choices for safety and overall sanity.

I am planning to install Gentoo on a Threadripper 1950X with 128GB RAM, an AMD video card for main video, and an NVIDIA card for compute and extra screens, on which I will use KDE/Plasma 6. I also regularly edit iPhone pictures, which is why I selected heif as a global USE flag. How risky are the optimization flags I have chosen? Do they seem likely to help, or at least not harm, performance?

I don't have time to exhaustively test each one (or even non-exhaustively test them), so my plan is just to enable things that are likely to help and unlikely to hurt and modify my approach based on any obvious problems I encounter.

So here is the `make.conf` I am considering:

Code:

# Global USE flags
USE="nvidia wayland kde plasma qt6 pam pipewire ffmpeg gsettings gstreamer vulkan \
gles2 v4l readline zsh-completion heif openmp postscript python systemd hardened \
-gnome -gtk lto llvm xa opencl cuda"

# Optimized compiler flags for Threadripper 1950X
COMMON_FLAGS="-march=native -O2 -pipe -fno-semantic-interposition -fopenmp \
-flto=32 -fuse-linker-plugin -ftrivial-auto-var-init=zero \
-ftree-vectorize -fvect-cost-model=dynamic \
-floop-unroll-and-jam -floop-interchange -fgraphite-identity -floop-nest-optimize"

CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
RUSTFLAGS="${RUSTFLAGS} -C target-cpu=native -C tune-cpu"

# Linker flags
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--sort-common -Wl,--icf=all"

# Make jobs
MAKEOPTS="-j32"

# General build features
FEATURES="parallel-fetch parallel-install"

# Emerge options
EMERGE_DEFAULT_OPTS="--keep-going --verbose --with-bdeps=y"

# Hardware-specific configuration
VIDEO_CARDS="amdgpu nvidia"
INPUT_DEVICES="libinput"

# Optional: explicitly define portage repo location and build dir
PORTDIR="/var/db/repos/gentoo"
PORTAGE_TMPDIR="/var/tmp/portage"

sam_
Developer
Joined: 14 Aug 2020
Posts: 2275

Posted: Wed Apr 16, 2025 11:27 am

-Wl,--icf=all is unsafe (the other variant is -Wl,--icf=safe, after all!) as it breaks guarantees from the language standard. -C tune-cpu looks invalid, as I think it's supposed to take an argument (but why bother if you have -C target-cpu=XXX anyway in your case, unless you want different values there, but then why native)?

-ftrivial-auto-var-init=zero is not a performance/optimisation flag, but you're free to use it.

-ftree-vectorize is pointless now, it's on by default at -O2, just the cost model varies at -O2 vs -O3.

I wouldn't personally use the graphite flags right now as there's a bunch of untriaged issues that I haven't had time to reduce and it doesn't receive much attention upstream at the moment either.

stonecraft
n00b
Joined: 22 Mar 2022
Posts: 19

Posted: Wed Apr 16, 2025 12:20 pm

Thanks.

* Regarding -ftrivial-auto-var-init=zero, I picked it because of stuff I read suggesting that it can make programs more predictable and possibly "harden" them in some security-related way. I'm not sure exactly what that means, though; other things I read made me think it would only be useful if I were actually trying to debug something.

* I will drop -ftree-vectorize and graphite.

As for tune-cpu, [the documentation linked to in the handbook](https://doc.rust-lang.org/rustc/codegen-options/index.html) made me think it might do something additional (although it also sounds risky).

Quote:
This instructs rustc to schedule code specifically for a particular processor. This does not affect the compatibility (instruction sets or ABI), but should make your code slightly more efficient on the selected CPU.

The valid options are the same as those for target-cpu. The default is None, which LLVM translates as the target-cpu.

This is an unstable option. Use -Z tune-cpu=machine to specify a value.

Due to limitations in LLVM (12.0.0-git9218f92), this option is currently effective only for x86 targets.


Last edited by stonecraft on Wed Apr 16, 2025 8:38 pm; edited 1 time in total

Hu
Administrator
Joined: 06 Mar 2007
Posts: 23355

Posted: Wed Apr 16, 2025 2:05 pm

stonecraft wrote:
* Regarding `ftrivial-auto-var-init=zero`, I picked it because of stuff I read suggesting that it can make programs more predictable and possibly "harden" them in some security-related way. I'm not sure exactly what that means though, other things I read made me think it would only be useful if I were actually trying to debug something.

Local variables with no explicit initializer are normally left uninitialized. Reading an uninitialized value is undefined behavior, and in practice such a read returns whatever garbage was in that memory location before the variable's lifetime began. -ftrivial-auto-var-init=zero instructs the compiler to implicitly initialize the value to 0, so an uninitialized read will return that implicit 0 instead of the undefined garbage. Such an uninitialized read is still wrong, but now has predictable behavior.

stonecraft wrote:
As for `tune-cpu`, [the documentation linked to in the handbook](https://doc.rust-lang.org/rustc/codegen-options/index.html)

This forum uses BBCode, not Markdown.

To sam_'s point, even if -C tune-cpu were useful to you (and it seems like it is not), your usage is ill-formed, because you should have written -C tune-cpu=some-valid-CPU-type.

stonecraft wrote:
(although it also sounds risky)

No, it is not risky. It is merely pointless. In C, and presumably in Rust as well, target-cpu (gcc name: -march) sets a minimum assumed CPU type, which gives the compiler permission to use features that are new in that CPU generation, and which will simply crash if run on an older system. Again going to C, -march=corei7 enables instructions that are present on Intel i7 series CPUs, but not present on the earlier Intel Core 2 design. Such a program might (or might not) use those new instructions. If it does, and you try to run it on an old Intel Core 2 CPU, you will get an "Illegal instruction" exception when you hit an instruction that is new in the Intel i7 line. The same general comments (likely) apply to Rust, though it might use different spellings for the CPU families.

By contrast, tune-cpu (gcc name: -mtune) tells the compiler to expect the named CPU type, but not rely on it. For example, a program built with -march=core2 -mtune=corei7 must run correctly on the older Intel Core 2 line, but the compiler might arrange the code to work better on an Intel i7 than on an Intel Core 2, perhaps by preferring instructions that existed in Core 2, but got notably faster on i7, or by choosing alignment layouts that result in suboptimal (but not wrong) cache usage on a Core 2 and optimal cache usage on an i7.
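
The split between the two options can be written out as a small make.conf-style fragment. This is only a sketch: the variable names BASELINE, TUNING, and EXAMPLE_FLAGS are hypothetical and exist purely for illustration, and the flag values are the ones from the example above.

```shell
# Hypothetical variable names, for illustration only.
# -march sets the minimum CPU the binary may assume; it can emit
# instructions that fault on anything older.
BASELINE="-march=core2"
# -mtune only steers scheduling and layout choices; it adds no new instructions.
TUNING="-mtune=corei7"

EXAMPLE_FLAGS="${BASELINE} ${TUNING} -O2 -pipe"
echo "${EXAMPLE_FLAGS}"
```

With -march=native, GCC picks both values from the build host, which is why a separate tuning option adds nothing in that case. Running gcc -march=native -Q --help=target should show which values were selected.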

stonecraft
n00b
Joined: 22 Mar 2022
Posts: 19

Posted: Wed Apr 16, 2025 8:37 pm

Quote:
Local variables with no explicit initializer are normally left uninitialized. Reading an uninitialized value is undefined behavior, and in practice such a read returns whatever garbage was in that memory location before the variable's lifetime began. -ftrivial-auto-var-init=zero instructs the compiler to implicitly initialize the value to 0, so an uninitialized read will return that implicit 0 instead of the undefined garbage. Such an uninitialized read is still wrong, but now has predictable behavior.


I'm not sure that will ever be useful to me, but in general things being more predictable is good, so I suppose I will keep that one.



Quote:

By contrast, tune-cpu (gcc name: -mtune) tells the compiler to expect the named CPU type, but not rely on it. For example, a program built with -march=core2 -mtune=corei7 must run correctly on the older Intel Core 2 line, but the compiler might arrange the code to work better on an Intel i7 than on an Intel Core 2, perhaps by preferring instructions that existed in Core 2, but got notably faster on i7, or by choosing alignment layouts that result in suboptimal (but not wrong) cache usage on a Core 2 and optimal cache usage on an i7.


Thanks for the clarification. That is exactly what I needed to know, and it was not clear to me from the documentation.


Code:
This forum uses BBCode, not Markdown.


Sorry, force of habit. Now corrected.


Anyway, based on the kind advice given to me, this is my current plan for make.conf:

Code:

# Global USE flags
USE="nvidia wayland kde plasma qt6 pam pipewire ffmpeg gsettings gstreamer vulkan \
gles2 v4l readline zsh-completion heif openmp postscript python systemd hardened \
-gnome -gtk lto llvm xa opencl cuda"

# Optimized compiler flags for Threadripper 1950X
COMMON_FLAGS="-march=native -O2 -pipe -fno-semantic-interposition -fopenmp \
-flto=32 -fuse-linker-plugin -ftrivial-auto-var-init=zero \
-fvect-cost-model=dynamic -floop-unroll-and-jam \
-floop-interchange -floop-nest-optimize"

CFLAGS="${COMMON_FLAGS}"
CXXFLAGS="${COMMON_FLAGS}"
FCFLAGS="${COMMON_FLAGS}"
FFLAGS="${COMMON_FLAGS}"
RUSTFLAGS="${RUSTFLAGS} -C target-cpu=native"

# Linker flags
LDFLAGS="-Wl,-O1 -Wl,--as-needed -Wl,--sort-common"

# Make jobs
MAKEOPTS="-j32"

# General build features
FEATURES="parallel-fetch parallel-install"

# Emerge options
EMERGE_DEFAULT_OPTS="--keep-going --verbose --with-bdeps=y"

# Hardware-specific configuration
VIDEO_CARDS="amdgpu nvidia"
INPUT_DEVICES="libinput"

# Optional: explicitly define portage repo location and build dir
PORTDIR="/var/db/repos/gentoo"
PORTAGE_TMPDIR="/var/tmp/portage"

user
Apprentice
Joined: 08 Feb 2004
Posts: 231

Posted: Thu Apr 17, 2025 8:17 am

sam_ wrote:
-ftree-vectorize is pointless now, it's on by default at -O2, just the cost model varies at -O2 vs -O3.

Which gcc version?
Code:
# gcc -O2 -Q --help=common | grep ftree-vectorize
  -ftree-vectorize                      [disabled]

Josef.95
Advocate
Joined: 03 Sep 2007
Posts: 4739
Location: Germany

Posted: Thu Apr 17, 2025 9:45 am

I think setting PORTAGE_TMPDIR="/var/tmp/portage" is not ideal.
Portage appends "portage" to PORTAGE_TMPDIR, which defaults to /var/tmp,
so the build directory is already /var/tmp/portage by default :)

Example:
# PORTAGE_TMPDIR="/var/tmp/portage" ebuild `equery w nano` unpack
 * nano-8.4.tar.xz BLAKE2B SHA512 size ;-) ...  [ ok ]
>>> Unpacking source...
>>> Unpacking nano-8.4.tar.xz to /var/tmp/portage/portage/app-editors/nano-8.4/work
>>> Source unpacked in /var/tmp/portage/portage/app-editors/nano-8.4/work

So remove PORTAGE_TMPDIR="/var/tmp/portage" from make.conf; the sane default should be fine.
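
The doubled path in the output above can be reproduced with a few lines of shell. This is a sketch of the behavior only: build_root is a hypothetical helper, not a Portage function, and it assumes nothing beyond "portage" being appended to whatever PORTAGE_TMPDIR holds.

```shell
# build_root is a hypothetical helper, not part of Portage.
# It mimics the observed behavior: "portage" is appended to PORTAGE_TMPDIR.
build_root() {
    # Strip one trailing slash, if present, before appending.
    printf '%s/portage\n' "${1%/}"
}

build_root "/var/tmp"           # the default PORTAGE_TMPDIR
build_root "/var/tmp/portage"   # the value set in the make.conf above
```

The first call prints /var/tmp/portage and the second prints /var/tmp/portage/portage, matching the unpack path shown in the ebuild output.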

sam_
Developer
Joined: 14 Aug 2020
Posts: 2275

Posted: Thu Apr 17, 2025 12:52 pm

user wrote:
sam_ wrote:
-ftree-vectorize is pointless now, it's on by default at -O2, just the cost model varies at -O2 vs -O3.

Which gcc version?
Code:
# gcc -O2 -Q --help=common | grep ftree-vectorize
  -ftree-vectorize                      [disabled]


Since r12-4240-g2b8453c401b699, it's enabled by default (so >= GCC 12). I don't know why the --help output is misleading, I'll file a bug.

EDIT: Filed https://gcc.gnu.org/PR119851
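
Since the --help output cannot be trusted here, a version check is one way to decide whether the flag is redundant. The following is a minimal sketch: vectorizes_at_o2 is a hypothetical helper name, and the threshold of 12 comes from the commit reference above.

```shell
# Hypothetical helper: succeeds if the given GCC version string is 12 or
# newer, the range in which -O2 already enables the vectorizer.
vectorizes_at_o2() {
    major="${1%%.*}"          # "13.2.1" -> "13"; "13" stays "13"
    [ "${major}" -ge 12 ]
}

# With a real compiler, assuming gcc is on PATH:
#   vectorizes_at_o2 "$(gcc -dumpversion)" && echo "-ftree-vectorize is redundant at -O2"
vectorizes_at_o2 "13.2.1" && echo "13.2.1: redundant at -O2"
vectorizes_at_o2 "11.4.0" || echo "11.4.0: not enabled by -O2"
```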

sam_
Developer
Joined: 14 Aug 2020
Posts: 2275

Posted: Thu Apr 17, 2025 3:26 pm

I forgot to say: -fuse-linker-plugin is of no use here. It's already the default when GCC is built with modern Binutils. You'd also notice if it wasn't working as you'd either need to use -ffat-lto-objects (I think) or get far slower builds and sometimes link errors (as the non-plugin path is not supported well).

user
Apprentice
Joined: 08 Feb 2004
Posts: 231

Posted: Sat Apr 19, 2025 12:19 pm

sam_ wrote:
-ftree-vectorize is pointless now, it's on by default at -O2, just the cost model varies at -O2 vs -O3.

Since r12-4240-g2b8453c401b699, it's enabled by default (so >= GCC 12). I don't know why the --help output is misleading, I'll file a bug.

EDIT: Filed https://gcc.gnu.org/PR119851


Thanks sam for clarifying it. How about:
Code:
COMMON_FLAGS="-march=native"
COMMON_FLAGS="${COMMON_FLAGS} -O2 -pipe"

COMMON_FLAGS="${COMMON_FLAGS} -malign-data=cacheline"     # ‘compat’ default
COMMON_FLAGS="${COMMON_FLAGS} -mtls-dialect=gnu2"         # ‘gnu’ is the conservative default; ‘gnu2’ is more efficient

COMMON_FLAGS="${COMMON_FLAGS} -fno-plt"                   # to get more efficient asm, especially for 64-bit mode where even PIE code can efficiently reference the GOT directly

COMMON_FLAGS="${COMMON_FLAGS} -flto=jobserver"            # LTO-GCC: use GNU make’s job server or otherwise fall back to autodetection of the number of CPU threads present in your system
COMMON_FLAGS="${COMMON_FLAGS} -flto-partition=one"        # LTO: achieve maximum performance potential (default ‘balanced’)
COMMON_FLAGS="${COMMON_FLAGS} -fdevirtualize-at-ltrans"   # LTO: perform devirtualization across object file boundaries using LTO
COMMON_FLAGS="${COMMON_FLAGS} -Werror=odr -Werror=lto-type-mismatch -Werror=strict-aliasing" # LTO: indicate likely runtime problems with LTO

COMMON_FLAGS="${COMMON_FLAGS} -fdata-sections"            # LINKER: Keeps data in separate data sections, so they can be discarded if unused (for later -Wl,--gc-sections)
COMMON_FLAGS="${COMMON_FLAGS} -ffunction-sections"        # LINKER: Keeps functions in separate sections, so they can be discarded if unused (for later -Wl,--gc-sections)

Slippery Jim
Apprentice
Joined: 08 Jan 2005
Posts: 292

Posted: Mon Apr 21, 2025 12:56 pm

I'll comment on the emerge and make options, in case you want to optimize your build process too:

You have -j32 in MAKEOPTS. I would add -j32 to EMERGE_DEFAULT_OPTS as well, and I would add -l28.8 to both (90% of -j32), to limit the system load average, and leave a bit of processing power for interface responsiveness.
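
The 90% figure can be derived from the job count rather than typed by hand. A minimal sketch, assuming awk is available; JOBS and LOAD are hypothetical variable names used only for this illustration:

```shell
# JOBS and LOAD are illustrative names, not Portage variables.
JOBS=32
# 90% of the job count, to one decimal place (LC_ALL=C keeps the decimal point a dot).
LOAD="$(LC_ALL=C awk -v j="${JOBS}" 'BEGIN { printf "%.1f", j * 0.9 }')"

echo "MAKEOPTS=\"-j${JOBS} -l${LOAD}\""
```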

Hu
Administrator
Joined: 06 Mar 2007
Posts: 23355

Posted: Mon Apr 21, 2025 1:52 pm

I would recommend against setting both MAKEOPTS and EMERGE_DEFAULT_OPTS to use -j32, since this could spawn up to 32 packages each using up to 32 jobs within the package. Not all build systems understand or respect --load-average, and even those that do respect it may not be fully constrained by it, since load average ramps up as the started jobs begin executing, so you could end up with 32*32 = 1024 jobs running. Very few people have the hardware to tolerate running that many jobs at once.
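
The worst case is just the product of the two settings, which a short calculation makes concrete. This is a sketch: worst_case_jobs is a hypothetical helper, and it deliberately ignores any throttling from --load-average, which as noted above cannot be relied on.

```shell
# Hypothetical helper: upper bound on simultaneous compile jobs, given the
# number of parallel packages (emerge -j) and jobs per package (MAKEOPTS -j).
worst_case_jobs() {
    echo "$(( $1 * $2 ))"
}

echo "emerge -j32 with MAKEOPTS=-j32: up to $(worst_case_jobs 32 32) jobs"
echo "emerge -j1  with MAKEOPTS=-j32: up to $(worst_case_jobs 1 32) jobs"
```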

All times are GMT