jesnow l33t
Joined: 26 Apr 2006 Posts: 885
|
Posted: Sat Mar 12, 2022 5:03 pm Post subject: |
|
|
Currently these are the compiler flags in use for clang:
Code: |
bartali /etc/portage # cat env/no-nosinter
# CFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -flto=8 -fdevirtualize-at-ltrans -fgraphite-identity -floop-nest-optimize -funroll-loops -fipa-pta -ftracer -fno-plt -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
CFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -flto=8 -fdevirtualize-at-ltrans -fgraphite-identity -floop-nest-optimize -funroll-loops -ftracer -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
CXXFLAGS="${CFLAGS}"
|
I can narrow down the optimizations still more. I guess that's the next step. I'm also wondering why the abi_x86_32 USE flag is everywhere; that's hardly necessary anymore. I think cloveros was all about da gamez and many of them had to be 32 bits?
The cool thing about cloveros was that it showed how to do a lot of custom optimizations within a strictly gentoo system. He apparently went and turned everything on and tuned down from there, using package.env and env/* to get things to run. I really learned a lot from it, but obviously there is something I've missed. |
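One way to do that narrowing, sketched here as an assumption rather than anything taken from the thread: generate one CFLAGS candidate per suspect flag with only that flag removed, rebuild with each candidate, and see which removal makes the stall go away. The `leave_one_out` helper is hypothetical, and the four flags passed to it are simply the GCC-specific ones from the env file above.

```shell
# Hypothetical helper: print one candidate CFLAGS line per flag,
# each with exactly that flag removed from the full list.
leave_one_out() (
    for skip in "$@"; do
        line=
        for flag in "$@"; do
            # Skip the flag under test, keep every other one in order.
            [ "$flag" = "$skip" ] && continue
            line="${line:+$line }$flag"
        done
        printf 'without %s: CFLAGS="%s"\n' "$skip" "$line"
    done
)

# Suspect flags from the env file above; base flags are left out for brevity.
leave_one_out -fgraphite-identity -floop-nest-optimize -fipa-pta -ftracer
```

Each printed line could be pasted into a scratch file under /etc/portage/env/ and referenced from package.env for a single test build.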
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20511
|
Posted: Sat Mar 12, 2022 5:10 pm Post subject: |
|
|
jesnow wrote: | Short answer, no it's gentoo. There is no non-gentoo code on here that I'm aware of
I've been compiling from gentoo repositories from day 1, almost 2 years ago, including clang. "Use a binhosted distro as an easy gentoo installer" they said. And it's true, it was easy at first. And it was even then fully gentoo, just a ton of cute optimizations. I was going back and forth between binhost and gentoo repositories for a while, but went full gentoo with emerge -e world long ago. That worked, this is gentoo installation. Cloveros binhost hasn't even been available for a while. | Whether or not the binhost has been gone for a long time does not guarantee that you don't still have some non-Gentoo 'cute optimizations' causing problems. The install was not Gentoo to begin with, so what remains is whether or not emerge -e world replaced everything that wasn't Gentoo. Since you specify world and not system, that could be part of the problem.
Is any Gentoo user experiencing this problem? If not, it is difficult to eliminate what your install was as a potential part of the problem.
jesnow wrote: | This is for sure a stupid compiler optimization that will go away when I fix the compile options, but I am out of guesses what the offending one is. | What happens if you use default Gentoo settings? You may need to compile it twice to get a clean result. Build with the defaults, then use that compiler to build the final compiler. _________________ Quis separabit? Quo animo? |
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22846
|
Posted: Sat Mar 12, 2022 5:27 pm Post subject: |
|
|
The advice to add more memory is for people who are plagued by swapping due to insufficient memory relative to the number of programs they have open. They run 8+ jobs in parallel, run out of memory, start swapping, and all 8+ jobs run slowly as the kernel swaps in and out pieces of each of them. As I understand your problem report, you already knew that was not your failure mode. Your failure mode is that the system stops trying to run multiple jobs in parallel. We still don't have any indication why it decides to go to running one process at a time with no parallelism, though I have asked a few times for information that might lead us in that direction. |
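A minimal sketch of one way to collect that kind of information while a build is stalled, assuming a Linux `ps` from procps that supports the `etimes` and `rss` columns. The helper name `list_cc1plus` is made up for this example, and guessing the source file from the last argument with a C/C++ extension is an assumption about how cc1plus is invoked.

```shell
# Read `ps -eo pid,etimes,rss,args` output on stdin and list cc1plus
# processes as "PID ELAPSED_SECONDS RSS_KIB SOURCE_FILE", longest-running first.
# SOURCE_FILE is a guess: the last argument ending in a C/C++ extension.
list_cc1plus() {
    awk '
        $4 ~ /(^|\/)cc1plus$/ {
            src = "?"
            for (i = 5; i <= NF; i++)
                if ($i ~ /\.(c|cc|cpp|cxx|ii)$/) src = $i
            print $1, $2, $3, src
        }
    ' | sort -k2,2nr
}

# Only query the process table if ps is available; errors from a ps
# that lacks these columns are discarded and nothing is printed.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,etimes,rss,args 2>/dev/null | list_cc1plus
fi
```

A single cc1plus line with a large elapsed time and a steadily growing RSS would point at one translation unit rather than at the job scheduler.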
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 885
|
Posted: Sat Mar 12, 2022 5:36 pm Post subject: |
|
|
I noticed the following: (Sorry for the multiple posting, this is a new subtopic. )
Registry.cpp is where the compilation of the clang compiler stalls out single-threaded. And what stalls out? cc1plus, which is GCC's C++ compiler proper (it does the preprocessing too). So let's just go look at the code, maybe we'll get a clue:
https://clang.llvm.org/doxygen/Registry_8cpp_source.html
So this is very interesting: It defines a bunch of preprocessor macros, but the most important one is this one:
Code: |
void RegistryMaps::registerMatcher(
62 StringRef MatcherName, std::unique_ptr<MatcherDescriptor> Callback) {
63 assert(Constructors.find(MatcherName) == Constructors.end());
64 Constructors[MatcherName] = std::move(Callback);
65 }
#define REGISTER_MATCHER(name) \
registerMatcher(#name, internal::makeMatcherAutoMarshall( \
::clang::ast_matchers::name, #name));
|
This is the registerMatcher member function of RegistryMaps (I love how the line numbers come through; are there some gotos? Just joking). At a guess, it's building up a big damn string lookup table that maps each matcher name to a callback, so there is code to run when that name turns up in whatever it's processing. I'm not a compiler developer, but it kind of looks like it. Better than the equivalent if ... else if ... else if ... else or switch (case), for sure.
And this macro is called about 500 times, like this:
Code: |
132 REGISTER_MATCHER(addrLabelExpr);
133 REGISTER_MATCHER(alignOfExpr);
134 REGISTER_MATCHER(allOf);
135 REGISTER_MATCHER(anyOf);
136 REGISTER_MATCHER(anything);
137 REGISTER_MATCHER(argumentCountIs);
138 REGISTER_MATCHER(arraySubscriptExpr);
139 REGISTER_MATCHER(arrayType);
140 REGISTER_MATCHER(asString);
141 REGISTER_MATCHER(asmStmt);
142 REGISTER_MATCHER(atomicExpr);
143 REGISTER_MATCHER(atomicType);
144 REGISTER_MATCHER(attr);
145 REGISTER_MATCHER(autoType);
146 REGISTER_MATCHER(autoreleasePoolStmt)
147 REGISTER_MATCHER(binaryConditionalOperator);
148 REGISTER_MATCHER(binaryOperator);
149 REGISTER_MATCHER(binaryOperation);
...
587 REGISTER_MATCHER(valueDecl);
588 REGISTER_MATCHER(varDecl);
589 REGISTER_MATCHER(variableArrayType);
590 REGISTER_MATCHER(voidType);
591 REGISTER_MATCHER(whileStmt);
592 REGISTER_MATCHER(withInitializer);
|
So I think maybe what the compiler is doing is running the C preprocessor single-threaded on this file (makes sense), and this is just one massive file full of macros that have to be expanded before it can proceed to compile. And it takes a minute, or maybe 8 hours, to do that. Maybe there is a compiler flag that's causing the preprocessor to not fork lots of threads to do this? Or makes it take longer?
This file must be the absolute heart of the clang compiler, so it's pretty damn cool to have a peek inside the code of a compiler. Though I wish I wasn't doing it.
Cheers,
Jon. |
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 885
|
Posted: Sat Mar 12, 2022 5:45 pm Post subject: |
|
|
Hi Hu, you're quite right, I had indeed more or less eliminated insufficient memory as the problem, so I kind of knew the pile of extra memory wasn't likely to solve this particular issue. I had noticed that the machine swapped a little bit before settling down to single-thread mode, and so it's not awful to have way too much memory in a gentoo machine. At least I can eliminate that bottleneck. I *like* running gentoo even though it's super-unproductive for the rest of my life. My son the developer tells me it's a kind of pointless auto-gratification to spend so much time on compiling my operating system every weekend.
Hu wrote: | The advice to add more memory is for people who are plagued by swapping due to insufficient memory relative to the number of programs they have open. They run 8+ jobs in parallel, run out of memory, start swapping, and all 8+ jobs run slowly as the kernel swaps in and out pieces of each of them. As I understand your problem report, you already knew that was not your failure mode. Your failure mode is that the system stops trying to run multiple jobs in parallel. We still don't have any indication why it decides to go to running one process at a time with no parallelism, though I have asked a few times for information that might lead us in that direction. |
Last edited by jesnow on Sat Mar 12, 2022 6:35 pm; edited 1 time in total |
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 885
|
Posted: Sat Mar 12, 2022 6:10 pm Post subject: |
|
|
Ok found it. Marking this [solved], thanks to all who helped. People shouldn't hate on cloveros so much. It was always gentoo through and through, just kind of a really over-optimized one. Unlike lots of other gentoo derivatives that put a lot of their own code in, this was just gentoo with a complicated set of /etc/portage files, a binhost and a nicer install script than the standard one.
Anyway:
Cloveros had compiled clang with his "no-nosinter" compile flags:
Code: |
bartali /etc/portage # grep sys-devel/clang *
grep: backups: Is a directory
grep: env: Is a directory
grep: make.profile: Is a directory
package.env:sys-devel/clang clang no-nosinter no-lto
bartali /etc/portage # cat /etc/portage/env/no-nosinter
# CFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -flto=8 -fdevirtualize-at-ltrans -fgraphite-identity -floop-nest-optimize -funroll-loops -fipa-pta -ftracer -fno-plt -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
CFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -flto=8 -fdevirtualize-at-ltrans -fgraphite-identity -floop-nest-optimize -funroll-loops -ftracer -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
CXXFLAGS="${CFLAGS}"
|
Actually, now that I look at it, I bet the no-lto flags took precedence; there should not be two sets specified:
Code: |
bartali /etc/portage # cat /etc/portage/env/no-lto
CFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -fgraphite-identity -floop-nest-optimize -funroll-loops -fipa-pta -ftracer -fno-plt -fno-semantic-interposition -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
CXXFLAGS="${CFLAGS}"
|
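The precedence guess can be illustrated with a small simulation, assuming Portage applies the env files named on one package.env line from left to right, so that for a plain variable like CFLAGS the last file wins. The sketch only mimics that with the shell's `.` (source) command; Portage reads these files with its own make.conf-style parser, and both `effective_cflags` and the shortened flag strings are hypothetical.

```shell
# Approximation only: source each env file in the order given and
# report the CFLAGS value that survives (the last assignment wins).
effective_cflags() (
    # usage: effective_cflags ENV_DIR FILE...
    env_dir=$1
    shift
    unset CFLAGS
    for name in "$@"; do
        . "$env_dir/$name"
    done
    printf '%s\n' "${CFLAGS-}"
)

# Demo with two throwaway files standing in for env/no-nosinter and env/no-lto.
demo_dir=$(mktemp -d)
printf 'CFLAGS="-O2 -flto=8"\n' > "$demo_dir/no-nosinter"
printf 'CFLAGS="-O2 -fno-semantic-interposition"\n' > "$demo_dir/no-lto"
effective_cflags "$demo_dir" no-nosinter no-lto   # prints: -O2 -fno-semantic-interposition
rm -rf "$demo_dir"
```

Under that assumption, listing only one CFLAGS-setting env file per package avoids the ambiguity altogether.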
Anyway, exactly as pjp said, I created a more generic compile environment just for clang:
Code: |
bartali /etc/portage # grep sys-devel/clang *
grep: backups: Is a directory
grep: env: Is a directory
grep: make.profile: Is a directory
package.env:sys-devel/clang clang
bartali /etc/portage # cat /etc/portage/env/clang
CFLAGS="-Ofast -march=x86-64 -mtune=generic -pipe -flto=8 -funroll-loops -Wl,--hash-style=gnu"
CXXFLAGS="${CFLAGS}"
FEATURES="-ccache -distcc"
EMERGE_DEFAULT_OPTS=" --jobs 10 --load-average 20"
MAKEOPTS="-j15 -l12"
|
Lo and behold, it worked:
Code: |
bartali /home/jesnow # time emerge -1 clang
Calculating dependencies... done!
>>> Verifying ebuild manifests
>>> Emerging (1 of 1) sys-devel/clang-13.0.1::gentoo
>>> Installing (1 of 1) sys-devel/clang-13.0.1::gentoo
>>> Jobs: 1 of 1 complete Load avg: 7.3, 10.1, 10.1
* Messages for package sys-devel/clang-13.0.1:
* You can find additional utility scripts in:
* /usr/lib/llvm/13/share/clang
* Some of them are vim integration scripts (with instructions inside).
* The run-clang-tidy.py script requires the following additional package:
* dev-python/pyyaml
>>> Auto-cleaning packages...
>>> No outdated packages were found on your system.
* GNU info directory index is up-to-date.
real 37m40.667s
user 342m11.632s
sys 9m32.885s
|
I can now maybe turn a couple things back on if I care, but the maddening part is found. Interestingly, at no time did the memory usage exceed 11G (even with all cores running). If I find exactly which of those compiler optimizations caused the problem I will follow up. If anybody cares and has an idea which ones are useful to test please let me know.
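One way to read the `time` figures above: dividing user CPU time by wall-clock time gives a rough average number of busy cores, which is the quantity that collapsed to about one during the stall. A small sketch of that arithmetic follows; the helper names `to_seconds` and `avg_parallelism` are made up for this example, and sys time is ignored.

```shell
# Convert a `time`-style duration such as 37m40.667s to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Rough average number of busy cores: user CPU time / wall-clock time.
avg_parallelism() {
    # usage: avg_parallelism REAL USER
    awk -v r="$(to_seconds "$1")" -v u="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", u / r }'
}

# Figures from the successful gcc-built run above.
avg_parallelism 37m40.667s 342m11.632s   # prints: 9.1
```

A value near 1.0 over a long build would indicate the single-threaded behaviour this thread is about.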
Thanks once again to Hu, eccerr0r and pjp.
Update: I'm now completing the bootstrapping of clang with clang. Makes sense that it would work better that way.
Clang compiled with clang:
Code: |
real 26m22.252s
user 257m24.207s
sys 5m17.198s
|
Cheers,
Jon. |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9842 Location: almost Mile High in the USA
|
Posted: Sat Mar 12, 2022 8:05 pm Post subject: |
|
|
Well the point is, cloveros added something you didn't add explicitly, else you would have remembered, I would hope... or else you'd basically have Gentoo with another set of instructions, which you also should have remembered. So that environment file must have come from cloveros however it was distributed - which is not supported by Gentoo proper. There is no way I could have acquired a copy of those files from the Gentoo repositories or the installation wiki - hence my concern about this not being a supported configuration, even if the base system is still Gentoo.
In any case, glad you found your problem. Most likely it's solely due to turning on LTO, whether inadvertently or not. I don't think graphite or the other options kill performance that much, but it's worth testing anyway if you're still trying to squeeze performance out. But now your ryzen beats my 10+ year old core2 quad as it should :) _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20511
|
Posted: Sat Mar 12, 2022 10:18 pm Post subject: |
|
|
I'm glad you figured it out.
jesnow wrote: | People shouldn't hate on cloveros so much. It was always gentoo through and through, just kind of a really over optimized one. Unlike lots of other gentoo derivatives that put a lot of their own code in, This was just gentoo with a complicated set of /etc/portage files, a binhost and a nicer install script than the standard one. | Don't misunderstand clarifying points as hate. It wasn't Gentoo. I remember hearing about it, but don't remember much about it. If optimizations were its reason for existing, then that's probably why I didn't keep up with it. That, and new distros often seem to be more of a hobby than a meaningful project that can be relied upon -- I'd like to build one, but I probably wouldn't be able to maintain it. I'm not speaking for developers, but I believe "eager" optimizations aren't something that is supported. If you had come up with all of cloveros' optimizations on your own, there might have been a point where they told you to turn them off to see if the problem was repeatable.
jesnow wrote: | Cloveros had compiled clang with his "no-nosinter" compile flags: | I misread that as no-nosinister. I was disappointed to find out I misread it. :) _________________ Quis separabit? Quo animo? |
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 885
|
Posted: Sun Mar 13, 2022 12:34 am Post subject: |
|
|
OK, I didn't really mean "hate", but in fact everything cloveros did was "in bounds" for a gentoo distribution. And if I can just learn how to control it all, which I am doing, it will improve my gentoo skills. Which have improved as a result.
For example, with distcc turned on (but my /etc/distcc/hosts needs adjusting):
All of my distcc running on various boxes around the house are Calculate linux, which is much more vanilla. Only my server and my workstation are still compiled gentoo.
Code: |
real 20m7.478s
user 74m12.204s
sys 2m32.901s
|
and with ccache, of course:
Code: |
real 1m38.372s
user 3m22.226s
sys 1m5.839s
|
Thanks again everybody.
Cheers,
Jon. |
|
lefsha Veteran
Joined: 30 Aug 2004 Posts: 1235 Location: Burgas, Bulgaria
|
Posted: Thu Aug 17, 2023 8:12 pm Post subject: |
|
|
I don't know to what extent the issue has been solved, but I have exactly the same issue.
I see no solution for it, and I was not able to find one in this thread.
If anyone really believes there was a solution that I have missed, please point me to that text.
Thx.
So far the issue exists only with GCC 12 and 13. I didn't check every 13.x version, only one for each major version.
The issue doesn't exist with GCC 14, and it doesn't exist with ANY version of Clang.
Obviously no settings, no compiler flags, nothing of that kind can be the reason. Something is wrong inside GCC or in the way it is built for Gentoo. I haven't checked GCC builds from Debian, nor have I compiled GCC manually. I did so with Clang: both the ebuild version of Clang and a manually built one are free of the issue.
To recap, the issue is this: after starting 16 jobs (in my case with -j16), after a certain period of time, on reaching 98-99% of the compilation, the cc1plus process becomes single-threaded and steadily grows in memory consumption, sometimes reaching 40-60%.
It can take 1-2 days to compile clang with gcc. It can take 1-2 days to compile gcc with gcc.
Some other bulky software (vtk etc.) has the same issue. In all those cases the compilation is successful! It just takes terribly long and uses only 1 core of 16.
It's not the linking phase, so I cannot understand what a compiler can do for that long with text files.
Switching to another gcc version such as gcc-14, or moving to clang, solves the issue.
I cannot claim that this issue is really solved as the OP reported.
I have seen similar reports on reddit.
I will maybe try to compile gcc-12 manually and see the difference.
So far it is very annoying. I have never experienced anything like this before.
Unfortunately I cannot move to gcc-14 yet; a lot of software still depends on gcc-12. _________________ Lefsha |
|
stefan11111 l33t
Joined: 29 Jan 2023 Posts: 935 Location: Romania
|
Posted: Thu Aug 17, 2023 8:27 pm Post subject: Re: emerge -1 clang takes 14 hours? [SOLVED!!] |
|
|
jesnow wrote: |
Code: |
sys-devel/clang-13.0.0::gentoo was built with the following:
USE="static-analyzer xml -debug -default-compiler-rt -default-libcxx -default-lld -doc -llvm-libunwind -test" ABI_X86="32 (64) (-x32)" LLVM_TARGETS="NVPTX (X86) -AArch64 -AMDGPU (-ARC) -ARM -AVR -BPF (-CSKY) -Hexagon -Lanai (-M68k) -MSP430 -Mips -PowerPC -RISCV -Sparc -SystemZ (-VE) -WebAssembly -XCore" PYTHON_SINGLE_TARGET="python3_9 -python3_10 -python3_8"
CFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -fgraphite-identity -floop-nest-optimize -funroll-loops -fipa-pta -ftracer -fno-plt -fno-semantic-interposition -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
CXXFLAGS="-Ofast -march=x86-64 -mtune=generic -mfpmath=both -pipe -fgraphite-identity -floop-nest-optimize -funroll-loops -fipa-pta -ftracer -fno-plt -fno-semantic-interposition -malign-data=cacheline -mtls-dialect=gnu2 -Wl,--hash-style=gnu"
FEATURES="network-sandbox binpkg-logs protect-owned qa-unresolved-soname-deps sandbox news fixlafiles userfetch config-protect-if-modified merge-sync distcc userpriv assume-digests pid-sandbox sfperms unmerge-orphans usersandbox unknown-features-warn distlocks unmerge-logs xattr ipc-sandbox parallel-fetch binpkg-docompress strict usersync multilib-strict ebuild-locks binpkg-dostrip preserve-libs"
|
|
Since this is cloveros:
>-march=x86-64 -mtune=generic
>-Ofast -fgraphite-identity -funroll-loops
Why have all these optimizations and use a generic march and mtune?
Also, does -funroll-loops actually provide any benefit?
Any reason to not use -march=native? _________________ My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev" |
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3772 Location: Rasi, Finland
|
Posted: Thu Aug 17, 2023 8:36 pm Post subject: Re: emerge -1 clang takes 14 hours? [SOLVED!!] |
|
|
stefan11111 wrote: | Any reason to not use -march=native? | distcc fails to work with -march=native. _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
|
|
stefan11111 l33t
Joined: 29 Jan 2023 Posts: 935 Location: Romania
|
Posted: Thu Aug 17, 2023 8:40 pm Post subject: Re: emerge -1 clang takes 14 hours? [SOLVED!!] |
|
|
Zucca wrote: | stefan11111 wrote: | Any reason to not use -march=native? | distcc fails to work with -march=native. |
Didn't know that.
But is there any reason to do that when building locally?
I do have this on my raspi, but that's because the stage3 came with it:
Code: | COMMON_FLAGS="-O2 -pipe -march=armv6j -mfpu=vfp -mfloat-abi=hard" |
_________________ My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev" |
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3772 Location: Rasi, Finland
|
Posted: Sun Aug 20, 2023 7:45 am Post subject: Re: emerge -1 clang takes 14 hours? [SOLVED!!] |
|
|
stefan11111 wrote: | But is there any reason to do that when building locally? | Using -march=native is ok when building locally. Using -march=<your cpu arch> should yield the same results and works with distcc. _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
|
|
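To find the explicit name to use in place of native, GCC can report what -march=native resolves to on the local machine. A small sketch, assuming the usual layout of GCC's `-Q --help=target` output; `resolved_march` is a hypothetical helper, and the named architecture may not cover every per-CPU detail (cache sizes, individual feature toggles) that native detects.

```shell
# Extract the resolved -march value from `gcc -Q --help=target` output on stdin.
resolved_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Only query the compiler if one is installed; a gcc that does not
# understand these options simply produces no output here.
if command -v gcc >/dev/null 2>&1; then
    gcc -march=native -Q --help=target 2>/dev/null | resolved_march
fi
```

The printed name is what would go into -march= on the machine that hands jobs to distcc.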
sam_ Developer
Joined: 14 Aug 2020 Posts: 2027
|
Posted: Mon Aug 21, 2023 11:20 am Post subject: |
|
|
As "cool" as this derivative might be, they're using several flags known both to cause substantial slowdowns in compile time and to have dubious benefits for runtime performance. -fipa-pta is an example: the GCC developers specifically say not to bother using it unless a significant improvement in runtime has been measured. |
|
jesnow l33t
Joined: 26 Apr 2006 Posts: 885
|
Posted: Thu Dec 19, 2024 8:11 pm Post subject: |
|
|
Thanks, yes: My trying out (now-defunct) Clover on my (then) new machine was really the gift that kept on giving. Since starting this thread I got an AMD 5950X to go with all that memory, toned down the "optimizations" and all compiles smoothly. But in the meantime, we have slotted clang, and I have to compile it three times. And it's gotten bigger. I will start another thread about that. I have three clangs and four pythons, and it's supposed to be a "stable" system, with only a small number of active keywords.
Ironic.
sam_ wrote: | As "cool" as this derivative might be, they're using several flags known to both cause substantial slowdowns in compile-time and also have dubious benefits on runtime performance. -fipa-pta being an example of that where the GCC developers specifically say not to bother using it unless a measured significant improvement in runtime is measured. |
|
|
Zucca Moderator
Joined: 14 Jun 2007 Posts: 3772 Location: Rasi, Finland
|
Posted: Thu Dec 19, 2024 8:36 pm Post subject: |
|
|
sam_ wrote: | several flags known to both cause substantial slowdowns in compile-time and also have dubious benefits on runtime performance. -fipa-pta being an example of that where the GCC developers specifically say not to bother using it unless a measured significant improvement in runtime is measured. | Yeah.
Sure enough, after reading https://stackoverflow.com/questions/13066663/what-in-short-words-does-the-gcc-option-fipa-pta-do I understand having it enabled could really lengthen the compilation process. _________________ ..: Zucca :..
My gentoo installs: | init=/sbin/openrc-init
-systemd -logind -elogind seatd |
Quote: | I am NaN! I am a man! |
|
|