Gentoo Forums
sys-devel/gcc-12[+lto] no longer feasible in 8GB it seems?
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Sat Feb 25, 2023 7:22 am    Post subject: sys-devel/gcc-12[+lto] no longer feasible in 8GB it seems?

Well, I was building gcc-12 [+lto] on my 8GiB quad core machine with -j5 and found that it went 12GB into swap ... poor machine was thrashing bad but remained responsive to ssh for the most part, could have killed it, but I let it go... 11.5 hours to finish.

gcc-11 was able to finish in a bit over 7 hours, but the swapping it endured while building gcc-12 was not healthy when load average shot up to 120 at a point.

I really wonder if I should "downgrade" my CPU so I can have more RAM in this machine. I have a spare board with a lethargic Atom quad-core CPU, but its 16GB of RAM looks mighty tempting for keeping lto...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
NeddySeagoon
Administrator

Joined: 05 Jul 2003
Posts: 54744
Location: 56N 3W

Posted: Sat Feb 25, 2023 11:28 am

eccerr0r,

gcc-12 [+lto] only makes gcc itself with lto.
Does the extra build time of gcc justify the faster build times for other packages using that gcc?

Here, the build time goes from under an hour to almost three hours due to +lto. That's all in RAM too.
It builds unattended, or at least, keeps out of my way, so I'm not inclined to do anything about it.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
CaptainBlood
Advocate

Joined: 24 Jan 2010
Posts: 3999

Posted: Sat Feb 25, 2023 12:13 pm

On skylake 3.7 turbo, no smt, MAKEOPTS=-j1, 4 Mb RAM, PORTAGE_TMPDIR="/var/no-tmpfs"
Code:
    Mon Jan 23 22:13:58 2023 >>> sys-devel/gcc-12.2.1_p20230121-r1
       merge time: 9 hours, 48 minutes and 28 seconds.
NeddySeagoon wrote:
eccerr0r,
gcc-12 [+lto] only makes gcc itself with lto.
+1, took me years to realize it.

Thanks for your attention, interest and support.
_________________
USE="-* ..." in /etc/portage/make.conf here, i.e. a countermeasure to portage implicit braces, belt & diaper paradigm
LT: "I've been doing a passable imitation of the Fontana di Trevi, except my medium is mucus. Sooo much mucus. "
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Sat Feb 25, 2023 3:19 pm

Yeah, if I decide to keep using +lto on gcc I'd probably need to drop down to -j2 or something. Not sure if it's worth the experiment, and it's kind of annoying that I'd have to waste two cores for a large portion of the build time; it's just the link phase that forks off 100 lto1-* processes running at the same time.

The machine is a core2 quad 2.8GHz. I have an atom 2.4GHz board with 16GB RAM collecting dust, wonder if I should downgrade the CPU just to get more RAM, but the 2.4GHz Atom is like half the speed of the core2 quad... alas with double the RAM it won't swapstorm on gcc12 like it did (swap is on mechanical hard drives).

I have to admit that despite causing a lot of problems with the load average at 120, it was still very responsive and I could login to kill it. Thank goodness for anonymous swap! (I had 8GB RAM and 16GB swap on RAID5).

BTW this core2 quad is my main always-on compute farm server that serves distccd, and it would be nice if its gcc were LTO-optimized; alas, I may have to discontinue that. I think even a non-LTO gcc on the core2 quad would still beat the atom running an LTO gcc.

What would be better is if the makefiles for gcc would automatically get rid of -j when it's actually doing linking if doing LTO... that would be the best solution!
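
Short of changing the makefiles, a per-package override at least keeps the low job count confined to gcc, so everything else still builds with the global MAKEOPTS. A minimal sketch, assuming Portage's usual /etc/portage/env and /etc/portage/package.env mechanism; the file name lowmem.conf and the CONFROOT scratch prefix are made up for illustration, and it assumes package.env is a plain file rather than a directory:

```shell
#!/bin/sh
# Sketch: confine a low job count to sys-devel/gcc via package.env.
# CONFROOT is a hypothetical scratch prefix so the example is harmless to run;
# export CONFROOT="" (as root) to target the real /etc/portage instead.
CONFROOT=${CONFROOT-$(mktemp -d)}
mkdir -p "$CONFROOT/etc/portage/env"

# Environment file: any name works, "lowmem.conf" is just an example.
cat > "$CONFROOT/etc/portage/env/lowmem.conf" <<'EOF'
MAKEOPTS="-j2"
EOF

# Map the package atom to that environment file (appends; assumes a plain file).
echo 'sys-devel/gcc lowmem.conf' >> "$CONFROOT/etc/portage/package.env"

echo "wrote files under: ${CONFROOT:-/}"
```

This limits the whole gcc build to two jobs, not just the LTO link phase, so the cores still sit idle for part of the compile.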
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
pingtoo
Veteran

Joined: 10 Sep 2021
Posts: 1423
Location: Richmond Hill, Canada

Posted: Sat Feb 25, 2023 3:42 pm

eccerr0r,

Just a suggestion.

Since your atom board is collecting dust and you are thinking of utilising it, would you be able to put it online in your home network as a second machine? If so, one way to utilise its RAM is to use a Network Block Device (NBD) and ZRAM.

You can create a ZRAM device on the atom node and use NBD to share it with your primary node, then use it either as your /var/tmp/portage or as swap.

I have this setup and use the ZRAM/NBD device as /var/tmp/portage to reduce wear and tear on my SSD.

I am not 100% sure it helps in terms of performance, but I feel that in general my builds did run faster. My primary machine is an RPI 4B with 4GB RAM, and my home network is wired at 1Gb.
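
Roughly, the swap variant of this could look like the following. It is a sketch under assumptions, not a tested recipe: the function names, the export name zramswap, the host name in the commented example and the sizes are all made up; both functions need root; and export_zram overwrites the nbd-server config file it is pointed at.

```shell
#!/bin/sh
# On the machine lending its RAM (the atom here).
export_zram() {
    size=$1                               # e.g. 12G
    conf=${2:-/etc/nbd-server/config}     # this file gets overwritten
    modprobe zram || return 1
    # zramctl prints the device it initialised, e.g. /dev/zram0
    dev=$(zramctl --find --size "$size") || return 1
    printf '[generic]\n[zramswap]\n    exportname = %s\n' "$dev" > "$conf"
    nbd-server -C "$conf"
}

# On the machine that is short of RAM (the build box).
attach_remote_swap() {
    host=$1
    modprobe nbd || return 1
    nbd-client -N zramswap "$host" /dev/nbd0 || return 1
    mkswap /dev/nbd0 && swapon -p 10 /dev/nbd0
}

# Example (hypothetical host name):
#   atom#  export_zram 12G
#   build# attach_remote_swap atom.lan
```

For use as /var/tmp/portage instead, the client side would make a filesystem on /dev/nbd0 and mount it rather than calling mkswap and swapon. Swapping over the network has its own failure modes, so the nbd-client man page (it documents a -swap option) is worth reading first.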
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Sat Feb 25, 2023 3:48 pm

BTW the core2 quad finished rust-1.66 in 1h 45m ... gcc [+lto] is way worse than rust!

Yeah I was thinking during that swap storm time to set up nbd swap, not on the atom but temporarily on my i7 which has even more RAM.

The main reason is that right now that core2 quad is probably the last remaining machine that has less than 10GB RAM. My PVR has 12GiB but only dual core (2C4T). I was thinking about using the dual core as my server (and the atom as the PVR) but the atom's onboard video is borked, which fits well with the server that I don't need video too much.

Another benefit of the atom... the 16GB is ECC protected... like how real servers should be.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
ukky
Tux's lil' helper

Joined: 26 Feb 2023
Posts: 109
Location: Montreal, Canada

Posted: Sun Feb 26, 2023 9:46 pm

eccerr0r,

A couple of months back I did a lot of tests compiling heavy packages when I tried to figure out how much RAM the system needs.
Also I was logging how much time each package took to compile, what was peak RAM usage and whether swap memory was used.
Variations were: single or dual CPU, number of memory channels, amount of memory.
Packages used in test: sys-devel/clang, www-client/firefox, app-office/libreoffice, and www-client/chromium.

The biggest challenge was to compile www-client/chromium.

With 1.33GiB per CPU thread the system would hang when compiling Chromium. The other packages did compile.
With 1.67GiB and 2GiB per CPU thread the system did compile Chromium, but there was swapping involved.
With 2.33GiB per CPU thread the system compiled Chromium without using swap.

With a quad-core CPU and 8 GiB of RAM you might use fewer make jobs, i.e. MAKEOPTS="-j4" or less. Disabling hyperthreading is also an option.
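
As a rough rule of thumb from the numbers above (about 2.33GiB per job to stay out of swap on Chromium), a job count can be derived from installed RAM. A minimal sketch; the function name jobs_for_ram and the 2386 MiB default (2.33GiB rounded up) are assumptions for illustration, and other packages will have different per-job needs:

```shell
#!/bin/sh
# jobs_for_ram MEM_MIB CPUS [MIB_PER_JOB]
# Prints a make job count: RAM divided by the per-job budget,
# capped at the CPU count and never below 1.
jobs_for_ram() {
    mem_mib=$1
    cpus=$2
    per_job=${3:-2386}            # ~2.33 GiB per job (assumed default)
    jobs=$(( mem_mib / per_job ))
    if [ "$jobs" -gt "$cpus" ]; then
        jobs=$cpus
    fi
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

jobs_for_ram 8192 4     # quad core with 8 GiB: prints 3
jobs_for_ram 16384 4    # quad core with 16 GiB: prints 4 (capped by cores)
```

On a live system the inputs could come from `awk '/^MemTotal:/ {print int($2/1024)}' /proc/meminfo` and `nproc`.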
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Mon Feb 27, 2023 12:04 am

In case you or anyone was wondering about my core2quad (4C4T) machine, clang-15.0.7-r1 built in a bit over 1 hour with -j12 -l6 or something like that. Note that clang (and llvm) will build over distcc so these are not nearly as much of a problem as gcc[+lto] so I do utilize distcc.

Though I don't use chromium or chrome, I do use qtwebengine. I've found while observing the build that many files take 2 to 3GB per thread, which is very similar to the observations of the chromium build. However, once again, qtwebengine can be distributed with distcc, and I do distribute it when I can. And yes, I've killed helper machines by submitting too many qtwebengine distcc jobs due to the immense amount of memory it eats, enough so that I have to manually edit my /etc/distcc/hosts when it's building qtwebengine to make sure it doesn't overflow certain machines, including the core2quad with 8GB.

I'm just saddened about having a 2C4T i3 machine with 12G and that 4C4T Atom with 16G, just wished the core2quad had more RAM as it's the main compute farm machine (the i3 is my pvr and it better not be skipping when recording, the atom due to its speed is a dust collector at the moment).
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
szatox
Advocate

Joined: 27 Aug 2013
Posts: 3477

Posted: Mon Feb 27, 2023 9:51 pm

So... I've been trying to update my VPS and gcc-12 fails to compile with 1GB RAM + 3GB SWAP, running single-threaded.
Code:
 Out of memory: Killed process 351 (cc1plus) total-vm:1812092kB, anon-rss:266540kB, file-rss:0kB, shmem-rss:0kB, UID:250 pgtables:3480kB oom_score_adj:0

Wow. Didn't expect that. Ended up masking the slot.

It's a cheap VPS, and those come with limited disk space, I can't really afford to give it much more. I wonder if some future version will do better.
Should I consider moving to another toolchain? Hell, is it even possible to move to another toolchain?
ukky
Tux's lil' helper

Joined: 26 Feb 2023
Posts: 109
Location: Montreal, Canada

Posted: Mon Feb 27, 2023 10:14 pm

szatox,

One of the options would be to use crossdev on a more powerful system.
You can find details here: Embedded Handbook/General/Cross-compiling with Portage
Used this a few years back on Embedded Atom CPU with 2GiB of RAM.
I believe I just used scp to copy files from SYSROOT to a target hard drive.
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Mon Feb 27, 2023 10:20 pm

szatox, is that 32- or 64-bit?
I'd be surprised you could do much of anything with 1GB RAM on 64-bit. 1GB RAM on 32-bit and no LTO should be able to complete the task however.
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
stefan11111
l33t

Joined: 29 Jan 2023
Posts: 949
Location: Romania

Posted: Mon Feb 27, 2023 11:07 pm

eccerr0r,

Building gcc-12.2.1_p20230121-r1 with USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp" takes 3 hours and 18 minutes to compile with -j4 on a i5 7400. It should not take you so long to compile, unless you made a mistake or really need all of gcc.
_________________
My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev"
szatox
Advocate

Joined: 27 Aug 2013
Posts: 3477

Posted: Mon Feb 27, 2023 11:53 pm

Code:
[ebuild  NS    ] sys-devel/gcc-12.2.1_p20230121-r1:12::gentoo [11.3.1_p20221209:11::gentoo] USE="(cxx) fortran nls nptl openmp (pie) sanitize ssp -ada (-cet) (-custom-cflags) -d -debug -default-stack-clash-protection% -default-znow% -doc (-fixed-point) -go -graphite -hardened (-ieee-long-double) -jit (-libssp) -lto (-multilib) -objc -objc++ -objc-gc (-pch) -pgo -systemtap -test -valgrind -vanilla -vtv -zstd" 0 KiB

Quote:
# qlop -c gcc
sys-devel/gcc: 4:01:01 average for 19 merges
sys-devel/gcc: 4s average for 7 unmerges

Single virtual core at ~1.7GHz. Not too bad.

Running amd64/nomultilib. Yes, it probably is suboptimal, but this machine mostly serves as an internet point of presence, so there was no point in sweating the small stuff.
Until now I've been able to build gcc by adding temporary swap for a total of 1.8G (on top of said 1GB RAM). I have a small slice of disk reserved for snapshots I make as a part of backup solution, which can be easily repurposed to temporary swap partition. This is what LVM was made for :lol:


Quote:
One of the options would be to use crossdev on a more powerful system.
even easier, I could build it in chroot. But I'll only cross that bridge if I run out of other options.
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Tue Feb 28, 2023 12:43 am

stefan11111 wrote:
Building gcc-12.2.1_p20230121-r1 with USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp" takes 3 hours and 18 minutes to compile with -j4 on a i5 7400. It should not take you so long to compile, unless you made a mistake or really need all of gcc.

How much RAM do you have?
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
stefan11111
l33t

Joined: 29 Jan 2023
Posts: 949
Location: Romania

Posted: Tue Feb 28, 2023 9:16 am

eccerr0r wrote:
stefan11111 wrote:
Building gcc-12.2.1_p20230121-r1 with USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp" takes 3 hours and 18 minutes to compile with -j4 on a i5 7400. It should not take you so long to compile, unless you made a mistake or really need all of gcc.

How much RAM do you have?

I have 8GB ram
_________________
My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev"
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Tue Feb 28, 2023 5:05 pm

There is a disconnect here: what do you mean by "all of gcc"? (And yes, I do have +fortran for sci-libs/lapack....) You should have seen that lto process storm even with -j4 ...
Next time I will try -j4 to see if it's any better but not going to bet the farm on it...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
stefan11111
l33t

Joined: 29 Jan 2023
Posts: 949
Location: Romania

Posted: Tue Feb 28, 2023 5:51 pm

eccerr0r wrote:
There is a disconnect here: what do you mean by "all of gcc"? (And yes, I do have +fortran for sci-libs/lapack....) You should have seen that lto process storm even with -j4 ...
Next time I will try -j4 to see if it's any better but not going to bet the farm on it...

By "all of gcc" I mean all the USE flags enabled (like fortran). There may be something seriously wrong, as your emerge time is 3.5 times mine.
Code:
$ emerge -pqv gcc
[ebuild   R   ] sys-devel/gcc-12.2.1_p20230121-r1  USE="(cxx) lto (multilib) nptl openmp pgo (pie) sanitize ssp -ada (-cet) (-custom-cflags) -d -debug -default-stack-clash-protection -default-znow -doc (-fixed-point) -fortran -go -graphite -hardened (-ieee-long-double) -jit (-libssp) -nls -objc -objc++ -objc-gc (-pch) -systemtap -test -valgrind -vanilla -vtv -zstd"

_________________
My overlay: https://github.com/stefan11111/stefan_overlay
INSTALL_MASK="/etc/systemd /lib/systemd /usr/lib/systemd /usr/lib/modules-load.d *udev* /usr/lib/tmpfiles.d *tmpfiles* /var/lib/dbus /usr/bin/gdbus /lib/udev"
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Tue Feb 28, 2023 6:25 pm

I've noticed my core2 quads being similar or even bested by my i3-4160 depending on the software, so you have a huge CPU and RAM speed bonus on your i5-7400.

Without lto I do notice that fortran takes at least half hour to an hour or so to build, then I suspect that fortran also needs to go through lto...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
sam_
Developer

Joined: 14 Aug 2020
Posts: 2098

Posted: Sun Mar 05, 2023 11:50 am

eccerr0r wrote:
Yeah, if I decide to keep using +lto on gcc I'd probably need to drop down to -j2 or something. Not sure if it's worth the experiment, and it's kind of annoying that I'd have to waste two cores for a large portion of the build time; it's just the link phase that forks off 100 lto1-* processes running at the same time.

The machine is a core2 quad 2.8GHz. I have an atom 2.4GHz board with 16GB RAM collecting dust, wonder if I should downgrade the CPU just to get more RAM, but the 2.4GHz Atom is like half the speed of the core2 quad... alas with double the RAM it won't swapstorm on gcc12 like it did (swap is on mechanical hard drives).

I have to admit that despite causing a lot of problems with the load average at 120, it was still very responsive and I could login to kill it. Thank goodness for anonymous swap! (I had 8GB RAM and 16GB swap on RAID5).

BTW this core2 quad is my main always-on compute farm server that serves distccd, and it would be nice if its gcc were LTO-optimized; alas, I may have to discontinue that. I think even a non-LTO gcc on the core2 quad would still beat the atom running an LTO gcc.

What would be better is if the makefiles for gcc would automatically get rid of -j when it's actually doing linking if doing LTO... that would be the best solution!


Newer make (>= 4.4) may handle this correctly.
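
To see which side of that line a given box is on, a quick check of the installed make. A sketch that assumes GNU make's usual "GNU Make X.Y" first line and GNU sort's -V option; version_ge is a made-up helper name:

```shell
#!/bin/sh
# version_ge A B: succeeds when version A >= version B (version-sort order).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

if command -v make >/dev/null 2>&1; then
    # First line of GNU make's banner looks like: "GNU Make 4.4.1"
    make_ver=$(make --version 2>/dev/null | awk 'NR==1 {print $3}')
    if version_ge "$make_ver" 4.4; then
        echo "make $make_ver: 4.4 or newer"
    else
        echo "make ${make_ver:-unknown}: older than 4.4"
    fi
fi
```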
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Wed Mar 08, 2023 2:40 am

Hmm... I thought it always did that (prior to make 4.3) because it could be competing with other processes; instead it seems like the heuristic is a worse predictor.
Still don't think it totally solves the issue, will have to experiment ... next update cycle...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
tld
Veteran

Joined: 09 Dec 2003
Posts: 1850

Posted: Tue Mar 21, 2023 1:53 pm

eccerr0r wrote:
szatox, is that 32- or 64-bit?
I'd be surprised you could do much of anything with 1GB RAM on 64-bit. 1GB RAM on 32-bit and no LTO should be able to complete the task however.
My MythTV systems are both x86 and I managed to compile gcc-12.2.1_p20230121-r1 on both yesterday (with no LTO).

The frontend has 2 GB RAM and 1 GB swap and took about 12 hours. The backend only has 1.5 GB RAM and had only 1 GB of swap. At one point that came to a near standstill and was almost impossible to shell to because it was about out of RAM and swap, but I was able to add a 2 GB swap file while it was running, and it completed in about 13 hours.
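
For reference, adding a swap file on the fly looks roughly like this. A sketch only: the function name add_swapfile and the path in the commented example are made up, it has to run as root, and some filesystems need extra steps before a file can be used as swap.

```shell
#!/bin/sh
# add_swapfile PATH SIZE_MIB: create a zero-filled file and enable it as swap.
add_swapfile() {
    path=$1
    size_mib=$2
    # Write real blocks with dd so the file has no holes.
    dd if=/dev/zero of="$path" bs=1M count="$size_mib" &&
    chmod 600 "$path" &&
    mkswap "$path" &&
    swapon "$path"
}

# Example (as root), a 2 GB swap file:
#   add_swapfile /var/tmp/extra.swap 2048
# and once the build is done:
#   swapoff /var/tmp/extra.swap && rm /var/tmp/extra.swap
```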

EDIT: Just to note, that frontend and backend are 2.8 GHz and 2.66 GHz Dell P4s...so essentially old enough to drive...haha.

Tom
eccerr0r
Watchman

Joined: 01 Jul 2004
Posts: 9874
Location: almost Mile High in the USA

Posted: Tue Mar 21, 2023 4:24 pm

Yeah, 2GB RAM or even 1GB RAM is okay for 32-bit x86, without LTO. I think my Atom 1.6GHz 32-bit with 2GB RAM was able to build gcc just fine (-lto) in 12 hours as well. Alas this machine isn't frequently used and it's okay for it to be out of service whether swapping or just compiling. I probably need to build gcc-12 on my Pentium-M 1.6GHz (1.5GB RAM) and expect it to take about 10 hours (-lto). I don't recall having memory pressure during gcc build however, it was purely gcc computational. Both these x86 machines I run with -j2 for gcc for the most part.

Incidentally I'm beginning to find my Pentium-M firefox performance is quite anemic, just the gui with no webpages loaded is pretty bad now... not sure if it's solely due to firefox bloat but also mesa-amber issues as well most likely.

Anyway I'd just expect my core2quad with 8GB RAM to build with +lto and use all four cores during build. Last successful +lto build without killing itself was 5 hours IIRC. As long as it doesn't swap storm it's good but I'd want to build as fast as possible. Normally my core2quads can finish gcc (-lto) in about 2 hours but this machine, having a fast gcc would be nice as it's my main distccd server ...
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
All times are GMT