eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Sat Feb 25, 2023 7:22 am Post subject: sys-devel/gcc-12[+lto] no longer feasible in 8GB it seems? |
|
|
Well, I was building gcc-12 [+lto] on my 8GiB quad-core machine with -j5 and found that it went 12GB into swap. The poor machine was thrashing badly but remained responsive to ssh for the most part. I could have killed the build, but I let it go: 11.5 hours to finish.
gcc-11 was able to finish in a bit over 7 hours, but the swapping it endured while building gcc-12 was not healthy when load average shot up to 120 at a point.
I really wonder if I should "downgrade" my CPU so I can have more RAM in this machine. I have a spare board with a lethargic Atom quad-core CPU, but its 16GB RAM looks mighty tempting for keeping lto... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54759 Location: 56N 3W
|
Posted: Sat Feb 25, 2023 11:28 am Post subject: |
|
|
eccerr0r,
gcc-12 [+lto] only makes gcc itself with lto.
Does the extra build time of gcc justify the faster build times for other packages using that gcc?
Here, the build time goes from under an hour to almost three hours due to +lto. That's all in RAM too.
It builds unattended, or at least, keeps out of my way, so I'm not inclined to do anything about it. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
CaptainBlood Advocate
Joined: 24 Jan 2010 Posts: 3999
|
Posted: Sat Feb 25, 2023 12:13 pm Post subject: |
|
|
On skylake 3.7 turbo, no smt, MAKEOPTS=-j1, 4 Mb RAM, PORTAGE_TMPDIR="/var/no-tmpfs" Code: | Mon Jan 23 22:13:58 2023 >>> sys-devel/gcc-12.2.1_p20230121-r1
merge time: 9 hours, 48 minutes and 28 seconds. |
NeddySeagoon wrote: | eccerr0r,
gcc-12 [+lto] only makes gcc itself with lto. | +1, took me years 2 realize it.
Thks 4 ur attention, interest & support. _________________ USE="-* ..." in /etc/portage/make.conf here, i.e. a countermeasure to portage implicit braces, belt & diaper paradigm
LT: "I've been doing a passable imitation of the Fontana di Trevi, except my medium is mucus. Sooo much mucus. " |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Sat Feb 25, 2023 3:19 pm Post subject: |
|
|
Yeah if i decide to keep using +lto on gcc I'd probably need to drop down to -j2 or something, not sure if it's worth the experiment, and kind of annoying that I'd have to waste two cores for a large portion of the build time - it's just that link phase that it forks off 100 lto1-* processes running at the same time.
The machine is a core2 quad 2.8GHz. I have an atom 2.4GHz board with 16GB RAM collecting dust, wonder if I should downgrade the CPU just to get more RAM, but the 2.4GHz Atom is like half the speed of the core2 quad... alas with double the RAM it won't swapstorm on gcc12 like it did (swap is on mechanical hard drives).
I have to admit that despite causing a lot of problems with the load average at 120, it was still very responsive and I could login to kill it. Thank goodness for anonymous swap! (I had 8GB RAM and 16GB swap on RAID5).
BTW this core2 quad is my main always-on compute farm server that serves distccd, and it would be nice if its gcc were lto optimized; alas, I may have to discontinue that. I think the Atom, even with an lto-built gcc, would still be slower than this machine without one.
What would be better is if the makefiles for gcc would automatically get rid of -j when it's actually doing linking if doing LTO... that would be the best solution! _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
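The "drop to -j2 for gcc only" idea does not have to slow down every other package. What follows is a minimal sketch, assuming Portage's per-package environment mechanism (/etc/portage/env plus /etc/portage/package.env) and assuming package.env is a plain file rather than a directory. The file name gcc-lowmem.conf and the -j2 value are illustrative choices, not settings taken from any machine in this thread. The function takes a root directory so it can be tried in a scratch directory first.

```shell
#!/bin/sh
# Sketch: give sys-devel/gcc its own, lower MAKEOPTS via package.env.
# "gcc-lowmem.conf" and "-j2" are illustrative assumptions.
write_gcc_lowmem_env() {
    root="${1:-/}"    # pass a scratch directory to experiment without root
    mkdir -p "$root/etc/portage/env" || return 1
    # Per-package override: only builds mapped to this file see -j2.
    printf 'MAKEOPTS="-j2"\n' > "$root/etc/portage/env/gcc-lowmem.conf" || return 1
    # Map the package to the override file (assumes package.env is a file).
    printf 'sys-devel/gcc gcc-lowmem.conf\n' >> "$root/etc/portage/package.env"
}
```

Called as `write_gcc_lowmem_env /` (as root), this leaves the global MAKEOPTS in make.conf untouched for everything else. It limits make's own job count; whether that also tames the lto1 process storm during the link phase would need to be checked on the machine in question.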
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1433 Location: Richmond Hill, Canada
|
Posted: Sat Feb 25, 2023 3:42 pm Post subject: |
|
|
eccerr0r,
Just a suggestion.
Since your atom board is collecting dust and you are thinking of utilising it, would you be able to put it online in your home network as a second machine? If that is possible, one way to utilise its RAM is to use a Network Block Device (NBD) and ZRAM.
You can create a ZRAM device on the atom node and use NBD to share it with your primary node, then use it either as your /var/tmp/portage or as swap.
I have this setup and use the ZRAM/NBD device as /var/tmp/portage to reduce wear and tear on my SSD.
I am not 100% sure it helps in terms of performance, but I feel that in general my builds did run faster. My primary machine is an RPi 4B with 4GB RAM, and my home network is wired at 1Gb. |
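The two ends of that setup can be sketched roughly as follows. This is only an outline under stated assumptions: the 8G size and the /dev/nbd0 device node are made up for illustration, both functions need root, and the step that exports the device and attaches it on the build machine is deliberately left out because the nbd-server and nbd-client invocations differ between versions (check their man pages).

```shell
#!/bin/sh
# On the machine donating RAM (the Atom in this thread):
# create a compressed RAM disk and print its device name.
make_zram_disk() {
    size="${1:-8G}"                  # assumed size, adjust to the donor's RAM
    modprobe zram || return 1
    zramctl --find --size "$size"    # prints the allocated device, e.g. /dev/zram0
}

# On the build machine, AFTER the NBD client has attached the export:
# format the network block device as swap and enable it.
use_as_swap() {
    dev="${1:-/dev/nbd0}"            # assumed device node
    mkswap "$dev" && swapon --priority 10 "$dev"
}
```

For the /var/tmp/portage variant, a filesystem would be created on the attached device and mounted there instead of calling `use_as_swap`. How well swap over the network holds up under heavy memory pressure is something to test before relying on it.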
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Sat Feb 25, 2023 3:48 pm Post subject: |
|
|
BTW the core2 quad finished rust-1.66 in 1h 45m ... gcc [+lto] is way worse than rust!
Yeah I was thinking during that swap storm time to set up nbd swap, not on the atom but temporarily on my i7 which has even more RAM.
The main reason is that right now that core2 quad is probably the last remaining machine that has less than 10GB RAM. My PVR has 12GiB but is only dual core (2C4T). I was thinking about using the dual core as my server (and the atom as the PVR), but the atom's onboard video is borked, which suits the server role better since I don't need video much there.
Another benefit of the atom... the 16GB is ECC protected... like how real servers should be. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
ukky Tux's lil' helper
Joined: 26 Feb 2023 Posts: 109 Location: Montreal, Canada
|
Posted: Sun Feb 26, 2023 9:46 pm Post subject: |
|
|
eccerr0r,
A couple of months back I did a lot of tests compiling heavy packages when I tried to figure out how much RAM the system needs.
Also I was logging how much time each package took to compile, what was peak RAM usage and whether swap memory was used.
Variations were: single or dual CPU, number of memory channels, amount of memory.
Packages used in test: sys-devel/clang, www-client/firefox, app-office/libreoffice, and www-client/chromium.
The biggest challenge was to compile www-client/chromium.
With 1.33GiB per CPU thread system would hang when compiling Chromium. Other packages did compile.
With 1.67GiB and 2GiB per CPU thread system did compile Chromium, but there was swapping involved.
With 2.33GiB per CPU thread system did compile Chromium without using swap memory.
For a quad-core CPU with 8 GiB of RAM you might use fewer threads for make, i.e. MAKEOPTS="-j4" or less. Disabling hyperthreading is also an option. |
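The per-thread figures above can be turned into a rough job count. This is a sketch only, assuming the 2.33 GiB per job figure (measured above for Chromium, not for gcc with lto) as the default budget; the function name and its arguments are invented for illustration.

```shell
#!/bin/sh
# jobs_for_ram MEM_KIB CORES [GIB_PER_JOB]
# Prints how many make jobs fit in RAM at the given per-job budget,
# capped at the core count and never below 1.
jobs_for_ram() {
    awk -v kib="$1" -v cores="$2" -v per="${3:-2.33}" 'BEGIN {
        jobs = int(kib / 1048576 / per)   # KiB -> GiB, then whole jobs that fit
        if (jobs > cores) jobs = cores
        if (jobs < 1) jobs = 1
        print jobs
    }'
}

# Example with live values from the running system:
#   jobs_for_ram "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(nproc)"
```

With exactly 8 GiB and 4 cores this gives 3 jobs, which lines up with the "-j4 or less" advice. A different per-job budget can be passed as the third argument for packages with other memory appetites.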
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Mon Feb 27, 2023 12:04 am Post subject: |
|
|
In case you or anyone was wondering about my core2quad (4C4T) machine, clang-15.0.7-r1 built in a bit over 1 hour with -j12 -l6 or something like that. Note that clang (and llvm) will build over distcc so these are not nearly as much of a problem as gcc[+lto] so I do utilize distcc.
Though I don't use chromium or chrome, I do use qtwebengine. While observing the build I've found that many files take 2 to 3GB per thread, which is very similar to the observations of the chromium build. However, once again, qtwebengine can be distributed with distcc, and I do distribute it when I can. And yes, I've killed helper machines by submitting too many qtwebengine distcc jobs due to the immense amount of memory it eats, enough so that I have to manually edit my /etc/distcc/hosts when it's building qtwebengine to make sure it doesn't overflow certain machines, including the core2quad with 8GB.
I'm just saddened about having a 2C4T i3 machine with 12G and that 4C4T Atom with 16G, just wished the core2quad had more RAM as it's the main compute farm machine (the i3 is my pvr and it better not be skipping when recording, the atom due to its speed is a dust collector at the moment). _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Mon Feb 27, 2023 9:51 pm Post subject: |
|
|
So... I've been trying to update my VPS and gcc-12 fails to compile with 1GB RAM + 3GB SWAP, running single-threaded.
Code: | Out of memory: Killed process 351 (cc1plus) total-vm:1812092kB, anon-rss:266540kB, file-rss:0kB, shmem-rss:0kB, UID:250 pgtables:3480kB oom_score_adj:0 |
Wow. Didn't expect that. Ended up masking the slot.
It's a cheap VPS, and those come with limited disk space, I can't really afford to give it much more. I wonder if some future version will do better.
Should I consider moving to another toolchain? Hell, is it even possible to move to another toolchain? |
ukky Tux's lil' helper
Joined: 26 Feb 2023 Posts: 109 Location: Montreal, Canada
|
Posted: Mon Feb 27, 2023 10:14 pm Post subject: |
|
|
szatox,
One of the options would be to use crossdev on a more powerful system.
You can find details here: Embedded Handbook/General/Cross-compiling with Portage
Used this a few years back on Embedded Atom CPU with 2GiB of RAM.
I believe I just used scp to copy files from SYSROOT to a target hard drive. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Mon Feb 27, 2023 10:20 pm Post subject: |
|
|
szatox, is that 32- or 64-bit?
I'd be surprised you could do much of anything with 1GB RAM on 64-bit. 1GB RAM on 32-bit and no LTO should be able to complete the task however. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
stefan11111 l33t
Joined: 29 Jan 2023 Posts: 949 Location: Romania
|
Posted: Mon Feb 27, 2023 11:07 pm Post subject: |
|
|
eccerr0r,
Building gcc-12.2.1_p20230121-r1 with USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp" takes 3 hours and 18 minutes to compile with -j4 on a i5 7400. It should not take you so long to compile, unless you made a mistake or really need all of gcc. _________________ My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev" |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Mon Feb 27, 2023 11:53 pm Post subject: |
|
|
Code: | [ebuild NS ] sys-devel/gcc-12.2.1_p20230121-r1:12::gentoo [11.3.1_p20221209:11::gentoo] USE="(cxx) fortran nls nptl openmp (pie) sanitize ssp -ada (-cet) (-custom-cflags) -d -debug -default-stack-clash-protection% -default-znow% -doc (-fixed-point) -go -graphite -hardened (-ieee-long-double) -jit (-libssp) -lto (-multilib) -objc -objc++ -objc-gc (-pch) -pgo -systemtap -test -valgrind -vanilla -vtv -zstd" 0 KiB
|
Quote: | # qlop -c gcc
sys-devel/gcc: 4:01:01 average for 19 merges
sys-devel/gcc: 4s average for 7 unmerges
|
Single virtual core at ~1,7GHz. Not too bad.
Running amd64/nomultilib. Yes, it probably is suboptimal, but this machine mostly serves as an internet point of presence, so there was no point in sweating the small stuff.
Until now I've been able to build gcc by adding temporary swap for a total of 1.8G (on top of said 1GB RAM). I have a small slice of disk reserved for the snapshots I make as part of my backup solution, which can easily be repurposed as a temporary swap partition. This is what LVM was made for.
Quote: | One of the options would be to use crossdev on a more powerful system. | even easier, I could build it in chroot. But I'll only cross that bridge if I run out of other options. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Tue Feb 28, 2023 12:43 am Post subject: |
|
|
stefan11111 wrote: | Building gcc-12.2.1_p20230121-r1 with USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp" takes 3 hours and 18 minutes to compile with -j4 on a i5 7400. It should not take you so long to compile, unless you made a mistake or really need all of gcc. |
How much RAM do you have? _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
stefan11111 l33t
Joined: 29 Jan 2023 Posts: 949 Location: Romania
|
Posted: Tue Feb 28, 2023 9:16 am Post subject: |
|
|
eccerr0r wrote: | stefan11111 wrote: | Building gcc-12.2.1_p20230121-r1 with USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp" takes 3 hours and 18 minutes to compile with -j4 on a i5 7400. It should not take you so long to compile, unless you made a mistake or really need all of gcc. |
How much RAM do you have? |
I have 8GB ram _________________ My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev" |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Tue Feb 28, 2023 5:05 pm Post subject: |
|
|
There is a disconnect here, what do you mean by "all of gcc"? (and yes I do have +fortran for sci-libs/lapack....) and you should have seen that lto process storm even with -j4 ...
Next time I will try -j4 to see if it's any better but not going to bet the farm on it... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
stefan11111 l33t
Joined: 29 Jan 2023 Posts: 949 Location: Romania
|
Posted: Tue Feb 28, 2023 5:51 pm Post subject: |
|
|
eccerr0r wrote: | There is a disconnect here, what do you mean by "all of gcc"? (and yes I do have +fortran for sci-libs/lapack....) and you should have seen that lto process storm even with -j4 ...
Next time I will try -j4 to see if it's any better but not going to bet the farm on it... |
By all of gcc, I mean all the use flags enabled(like fortran). There may be something seriously wrong, as your emerge time is 3.5 times mine.
Code: | $ emerge -pqv gcc
[ebuild R ] sys-devel/gcc-12.2.1_p20230121-r1 USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp -ada (-cet) (-custom-cflags) -d -debug -default-stack-clash-protection -default-znow -doc (-fixed-point) -fortran -go -graphite -hardened (-ieee-long-double) -jit (-libssp) -nls -objc -objc++ -objc-gc (-pch) -systemtap -test -valgrind -vanilla -vtv -zstd" |
_________________ My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev" |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Tue Feb 28, 2023 6:25 pm Post subject: |
|
|
I've noticed my core2 quads being similar or even bested by my i3-4160 depending on the software, so you have a huge CPU and RAM speed bonus on your i5-7400.
Without lto I do notice that fortran takes at least half hour to an hour or so to build, then I suspect that fortran also needs to go through lto... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
sam_ Developer
Joined: 14 Aug 2020 Posts: 2100
|
Posted: Sun Mar 05, 2023 11:50 am Post subject: |
|
|
eccerr0r wrote: | Yeah if i decide to keep using +lto on gcc I'd probably need to drop down to -j2 or something, not sure if it's worth the experiment, and kind of annoying that I'd have to waste two cores for a large portion of the build time - it's just that link phase that it forks off 100 lto1-* processes running at the same time.
The machine is a core2 quad 2.8GHz. I have an atom 2.4GHz board with 16GB RAM collecting dust, wonder if I should downgrade the CPU just to get more RAM, but the 2.4GHz Atom is like half the speed of the core2 quad... alas with double the RAM it won't swapstorm on gcc12 like it did (swap is on mechanical hard drives).
I have to admit that despite causing a lot of problems with the load average at 120, it was still very responsive and I could login to kill it. Thank goodness for anonymous swap! (I had 8GB RAM and 16GB swap on RAID5).
BTW this core2 quad is my main always-on compute farm server that serves distccd, and it would be nice if its gcc were lto optimized; alas, I may have to discontinue that. I think the Atom, even with an lto-built gcc, would still be slower than this machine without one.
What would be better is if the makefiles for gcc would automatically get rid of -j when it's actually doing linking if doing LTO... that would be the best solution! |
Newer make (>= 4.4) may handle this correctly. |
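To see which side of that version line a machine is on, something like the sketch below may help. It assumes GNU make (whose first `--version` line reads "GNU Make X.Y[.Z]") and a `sort` that supports `-V`; both helper names are invented for illustration. It only reports the version and says nothing about whether 4.4 actually prevents the lto1 process storm.

```shell
#!/bin/sh
# True (exit 0) if version $1 is at least version $2; relies on sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Third word of the first line of `make --version`, e.g. "4.4.1" for GNU Make.
installed_make_version() {
    make --version | awk 'NR == 1 { print $3 }'
}

# Example:
#   version_ge "$(installed_make_version)" 4.4 && echo "make is 4.4 or newer"
```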
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Wed Mar 08, 2023 2:40 am Post subject: |
|
|
Hmm... I thought it always did that (in versions prior to make 4.3) because it could be competing with other processes; instead it seems like the heuristic is a worse predictor.
Still don't think it totally solves the issue, will have to experiment ... next update cycle... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
tld Veteran
Joined: 09 Dec 2003 Posts: 1850
|
Posted: Tue Mar 21, 2023 1:53 pm Post subject: |
|
|
eccerr0r wrote: | szatox, is that 32- or 64-bit?
I'd be surprised you could do much of anything with 1GB RAM on 64-bit. 1GB RAM on 32-bit and no LTO should be able to complete the task however. | My MythTV systems are both x86 and I managed to compile gcc-12.2.1_p20230121-r1 on both yesterday (with no LTO).
The frontend has 2 GB RAM and 1 GB swap and took about 12 hours. The backend only has 1.5 GB RAM and had only 1 GB of swap. At one point that came to a near standstill and was almost impossible to shell to because it was about out of RAM and swap, but I was able to add a 2 GB swap file while it was running, and it completed in about 13 hours.
EDIT: Just to note, that frontend and backend are 2.8 GHz and 2.66 GHz Dell P4s...so essentially old enough to drive...haha.
Tom |
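Adding a swap file to a build that is already running, as described above, can be sketched like this. It is a rough outline assuming a filesystem where a plain zero-filled file is acceptable as swap (some filesystems need extra steps); the function names are invented for illustration, and the second step needs root.

```shell
#!/bin/sh
# Step 1: allocate a zero-filled file readable only by its owner.
# alloc_swapfile PATH [SIZE_MIB]
alloc_swapfile() {
    path="$1"
    mib="${2:-2048}"    # 2048 MiB matches the 2 GB mentioned above
    dd if=/dev/zero of="$path" bs=1048576 count="$mib" 2>/dev/null || return 1
    chmod 600 "$path"
}

# Step 2 (root): format the file as swap and enable it;
# the build in progress can start using it right away.
enable_swapfile() {
    mkswap "$1" && swapon "$1"
}

# Afterwards: swapoff PATH && rm PATH
```

Because the file is written with dd rather than created sparse, there are no holes in it, which swap files generally require.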
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9875 Location: almost Mile High in the USA
|
Posted: Tue Mar 21, 2023 4:24 pm Post subject: |
|
|
Yeah, 2GB RAM or even 1GB RAM is okay for 32-bit x86, without LTO. I think my Atom 1.6GHz 32-bit with 2GB RAM was able to build gcc just fine (-lto) in 12 hours as well. Alas this machine isn't frequently used and it's okay for it to be out of service whether swapping or just compiling. I probably need to build gcc-12 on my Pentium-M 1.6GHz (1.5GB RAM) and expect it to take about 10 hours (-lto). I don't recall having memory pressure during gcc build however, it was purely gcc computational. Both these x86 machines I run with -j2 for gcc for the most part.
Incidentally I'm beginning to find my Pentium-M firefox performance is quite anemic, just the gui with no webpages loaded is pretty bad now... not sure if it's solely due to firefox bloat but also mesa-amber issues as well most likely.
Anyway, I'd just expect my core2quad with 8GB RAM to build with +lto and use all four cores during the build. The last successful +lto build without killing itself was 5 hours IIRC. As long as it doesn't swap storm it's good, but I'd want to build as fast as possible. Normally my core2quads can finish gcc (-lto) in about 2 hours, but on this machine a fast gcc would be nice as it's my main distccd server ... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |