Author |
Message |
therik n00b
Joined: 14 Jul 2024 Posts: 7
|
Posted: Sun Jul 14, 2024 8:26 pm Post subject: Is it possible to max out modern hardware with portage? |
|
|
I wonder if I'm missing some settings. I started Code: | emerge --ask --emptytree @world | as part of the switch to the 23.0 profiles about 6 hours ago, with about 1100 packages to compile.
After two hours of barely getting anywhere, I interrupted the run and edited make.conf:
Code: | FEATURES="parallel-fetch parallel-install -ebuild-locks -merge-wait"
MAKEOPTS="-j31 -l34"
EMERGE_DEFAULT_OPTS="--jobs=30 --load-average=31" |
Specifically, I disabled ebuild-locks and merge-wait, the rest was the same.
Still, my 1 min load averages rarely got over 10 for the first few hours, most of the time hovering around 3-4, with overall CPU usage around 10-30%, ~10GiB of mem used and occasional blips on drive IO.
What is holding it up? Do I have a wrong setting somewhere? Is it just a chain of sequential operations that can't be parallelized?
If so, what's the purpose of distcc, if we can't really parallelize even on a single system?
I then tried mounting /var/tmp/portage as a tmpfs and while the compilation did pick up a bit, it looks more like a coincidence. All the parallelism is still coming from makeopts and gcc, not from portage running many emerges at once.
I guess what I don't understand is that there are many warnings online about how makeopts -jX and --jobs can multiply each other, but I see emerge very rarely run more than 1 job at a time.
Typically, this is what I see:
Code: | .....
>>> Installing (191 of 470) net-libs/libproxy-0.5.5::gentoo
>>> Completed (191 of 470) net-libs/libproxy-0.5.5::gentoo
>>> Emerging (192 of 470) app-emacs/po-mode-0.22::gentoo
>>> Installing (192 of 470) app-emacs/po-mode-0.22::gentoo
>>> Completed (192 of 470) app-emacs/po-mode-0.22::gentoo
>>> Emerging (193 of 470) sys-devel/llvm-common-17.0.6::gentoo
>>> Jobs: 192 of 470 complete, 1 running Load avg: 5.3, 10.1, 14.8
|
Code: |
Gentoo /var/tmp/portage # df -h
Filesystem Size Used Avail Use% Mounted on
....
tmpfs 32G 1.5G 30G 5% /var/tmp/portage
|
Code: |
Gentoo /var/tmp/portage # free -h
total used free shared buff/cache available
Mem: 62Gi 12Gi 25Gi 2.3Gi 28Gi 50Gi
Swap: 15Gi 15Mi 15Gi
|
|
|
pietinger Moderator
Joined: 17 Oct 2006 Posts: 5104 Location: Bavaria
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
|
Posted: Sun Jul 14, 2024 9:50 pm Post subject: |
|
|
Yeah, there is something called Amdahl's Law, and portage is stuck with a lot of unparallelizable code. There is also a lot of serialization in portage, which doesn't help.
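For reference, Amdahl's Law bounds the speedup from n workers when only a fraction p of the work can run in parallel: S(n) = 1 / ((1 - p) + p / n). A minimal Python sketch follows; the value p = 0.5 is an assumption chosen purely for illustration, not a measurement of portage.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup according to Amdahl's Law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # p = 0.5 is an assumed value for illustration only.
    for workers in (1, 2, 8, 32):
        print(workers, round(amdahl_speedup(0.5, workers), 2))
```

With half of the work serial, the bound stays below 2x no matter how many cores are added, which is why extra cores can sit idle.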
Distcc is really only helpful on big projects like qtwebengine or chrome; otherwise it is actually somewhat of a burden. A lot of small ebuilds are hindered by distcc.
I just made my E5-2690v2 build a kernel. 10 mins for kernel+modules. Not really that fast but it's not been that fast for me since the 2m 30sec kernel build times with my dual celeron-450 way back when (without modules so not apples vs apples).
distcc has been somewhat helpful however. Been trying to load down my E5-2690v2 and it has been taking a lot of distcc load and it couldn't be faster having the other machines compiling on their own. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
dimko Apprentice
Joined: 12 Feb 2006 Posts: 201
|
Posted: Sun Jul 14, 2024 10:59 pm Post subject: perhaps issue is not with portage? |
|
|
I am not a programmer, but if I understand correctly, when you compile some programs, they require some libs to exist on the system so that some functionality of said libs can be checked.
Which means there can be a situation where a bunch of packages are 'waiting', as the package requires, for that lib to be compiled. Now multiply that by several such libs and you get misery.
And I suspect it's not so easily calculated mathematically, as it's a 'traveling salesman' problem. _________________ Just a user. |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Mon Jul 15, 2024 5:20 am Post subject: |
|
|
There are two things here:
- as dimko has noted, there might not be enough jobs available. The rest may be waiting on a single dependency to be compiled.
- /var/db/pkg cannot be written concurrently.
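The first point can be illustrated with a small Python sketch using the standard library's graphlib. The package names and the dependency graph are hypothetical, and this is not how portage's scheduler is implemented; it only shows how a dependency chain limits how many builds can be ready at once.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: package -> packages that must be built first.
deps = {
    "libA": set(),
    "libB": {"libA"},
    "libC": {"libB"},
    "app1": {"libC"},
    "app2": {"libC"},
    "app3": {"libC"},
}


def ready_per_wave(graph):
    """Return the batches of packages that could be built concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        batch = sorted(sorter.get_ready())  # everything whose deps are done
        waves.append(batch)
        sorter.done(*batch)
    return waves


waves = ready_per_wave(deps)
for number, batch in enumerate(waves, start=1):
    print(number, batch)
```

In this made-up graph only one package is ready in each of the first three waves, so even a very high --jobs value would have a single build running until libC is finished.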
It took me ~8 hours to emerge ~1600 packages using the binary packages host.
P.S. No, it's not the traveling salesperson problem, which is intractable. Also, you should be careful what you point at when naming its complexity.
Best Regards,
Georgi |
|
e8root Tux's lil' helper
Joined: 09 Feb 2024 Posts: 94
|
Posted: Mon Jul 15, 2024 6:18 am Post subject: |
|
|
It should be obvious you cannot parallelize compilation jobs infinitely because a lot of packages depend on other packages.
What you could do (instead of asking "What is holding it up?" questions...) is to double-check whether Portage is doing what it is supposed to be doing correctly. I mean, obviously this is the kind of issue which in theory is simple, but if you want to make it optimal it can get very complex very fast. Maybe Portage is too conservative, to the point of not utilizing all resources optimally, but maybe that is the only sane way to do it - and that is what I would assume is the case.
In fact, it should be obvious that Portage cannot be 100% optimal, from the simple fact that you cannot know beforehand how a given build will load the system. Not to mention each system is different, with different bottlenecks and settings. You will, for example, get much lower average CPU load when using LTO than without, etc. It is not even obvious that it would be that beneficial to run some tasks in parallel if you could - though I guess Portage doesn't care and will run things in parallel if it can and has enough "jobs" slots.
As for distcc... when you add more computers with local memory, local storage etc., the whole optimization problem becomes much more complex. Especially when those computers have different performance. I never used distcc so I don't know how it actually schedules things, but I can imagine that throwing slower computers into the mix as some sort of helper resources can be very detrimental to performance if a long-running task, e.g. LTO linking, is executed on these slower machines. That is not an issue when all computers in the distcc network have identical or very similar performance, but if not, there might be serious bottlenecks.
All in all I would assume less than 100% CPU utilization cannot be helped on anything that isn't single core. And even then you would get I/O and network bottlenecks. I don't see any way around this issue.
Also - your settings are wrong.
Sure, 2GB per compilation job is an unrealistic value, but 30 jobs times 31 threads each is in theory 930 running processes. Even with a fraction of the recommended 2GB per process, it could (if it were possible to run so many package builds in parallel) overwhelm your computer. In that case it would start to stutter heavily, and with just 16GB of swap things would start crashing. So... if it doesn't work as you expected anyway, it is probably best to reduce --jobs to a more reasonable value just to be on the safe side. _________________ Unix Wars - Episode V: AT&T Strikes Back |
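The worst-case arithmetic in this post can be written out as a short Python sketch. The job counts come from the make.conf quoted at the top of the thread, the RAM and swap sizes from the `free -h` output, and 2 GiB per process is the rule-of-thumb figure the post itself calls unrealistic, so the result is a theoretical ceiling rather than a prediction.

```python
emerge_jobs = 30        # --jobs from EMERGE_DEFAULT_OPTS
make_jobs = 31          # -j from MAKEOPTS
gib_per_process = 2     # rule-of-thumb figure mentioned in the post
ram_gib, swap_gib = 62, 15  # from the `free -h` output earlier in the thread

worst_case_processes = emerge_jobs * make_jobs
worst_case_memory_gib = worst_case_processes * gib_per_process

print(worst_case_processes)                           # 930
print(worst_case_memory_gib)                          # 1860
print(worst_case_memory_gib > ram_gib + swap_gib)     # True
```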
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20485
|
Posted: Mon Jul 15, 2024 7:11 pm Post subject: |
|
|
logrusx wrote: | It took me ~8 hours to emerge ~1600 packages using the binary packages host. | That seems horribly slow. Did you also compile a lot of packages? _________________ Quis separabit? Quo animo? |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Mon Jul 15, 2024 7:41 pm Post subject: |
|
|
pjp wrote: | logrusx wrote: | It took me ~8 hours to emerge ~1600 packages using the binary packages host. | That seems horribly slow. Did you also compile a lot of packages? |
I remember one big package in particular but I don't think it took more than an hour. I was surprised by the time it took too. But most of the time no parallel jobs were running and many were waiting on the lock of /var/db/pkg. I didn't notice a big difference between larger and smaller binary packages, so I think it was mostly portage doing the merging stuff.
EDIT: just checked, it was 6:25 hours. Most of the time I used the computer, but it wasn't under load anyway.
Best Regards,
Georgi |
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20485
|
Posted: Mon Jul 15, 2024 8:28 pm Post subject: |
|
|
That still seems beyond reasonably slow. Do other binary installs take ~4hrs? If so, a LOT has changed since I last installed one in ~2012. _________________ Quis separabit? Quo animo? |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Mon Jul 15, 2024 9:05 pm Post subject: |
|
|
Well, I remember a thread where someone was asking why emerging a virtual took 10 times or even more time than it used to some years before, so I guess a lot has changed.
Tomorrow I'll see if I can derive an average of how long it takes to merge a binary package.
Best Regards,
Georgi |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
|
Posted: Mon Jul 15, 2024 11:35 pm Post subject: |
|
|
# time emerge -1 virtual/ssh # (note: real time is higher than timestamps below because dependency computation is not timestamped. Also this is a 32-bit KVM on a Core2 Quad.)
real 0m34.434s
user 0m19.483s
sys 0m7.333s
Mon Jul 15 17:42:23 MDT 2024 virtual/ssh clean
Mon Jul 15 17:42:24 MDT 2024 virtual/ssh setup
Mon Jul 15 17:42:28 MDT 2024 virtual/ssh install
Mon Jul 15 17:42:29 MDT 2024 virtual/ssh
Mon Jul 15 17:42:32 MDT 2024 virtual/ssh instprep
Mon Jul 15 17:42:33 MDT 2024 virtual/ssh
Mon Jul 15 17:42:33 MDT 2024 virtual/ssh preinst
Mon Jul 15 17:42:34 MDT 2024 virtual/ssh
Mon Jul 15 17:42:36 MDT 2024 virtual/ssh prerm
Mon Jul 15 17:42:37 MDT 2024 virtual/ssh postrm
Mon Jul 15 17:42:37 MDT 2024 virtual/ssh cleanrm
Mon Jul 15 17:42:38 MDT 2024 virtual/ssh postinst
Mon Jul 15 17:42:39 MDT 2024 virtual/ssh
Mon Jul 15 17:42:40 MDT 2024 virtual/ssh
Mon Jul 15 17:42:41 MDT 2024 virtual/ssh clean
hmm... no really really bad steps but all take a chunk out of the pie. However if there were 600 packages on the system and all were as fast as virtuals, at 3 packages per minute, it would take over 3 hours to install on this 32 bit KVM on a Core2Quad (64-bit)... which sounds really bad because there's more to it than virtual packages. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Tue Jul 16, 2024 5:55 am Post subject: |
|
|
Ah it was your thread! I lost track of it however.
It takes 10 seconds for a virtual on my system. And if I remember correctly, it was taking at least 15 seconds for a binary package. I don't feel keen on trying to parse emerge.log to extract averages, so I'll leave it at that. But emerging binary packages is not as fast as one might expect. Portage is still doing a lot of work. And it can't be done in parallel because of the lock on /var/db/pkg. Maybe there are ways to improve on that, but it would increase the volume of information that might be lost during an unexpected termination of emerge.
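A rough average can still be derived from the totals already given in this thread (about 1600 packages in 6:25 hours), without parsing any logs. A minimal Python sketch of that arithmetic, treating both figures as approximate:

```python
packages = 1600                         # "~1600 packages" from the thread
elapsed_seconds = 6 * 3600 + 25 * 60    # the corrected figure of 6:25 hours

average_seconds = elapsed_seconds / packages
print(average_seconds)
```

That works out to a little over 14 seconds per package, which is in the same range as the remembered per-package times above.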
Best Regards,
Georgi |
|
therik n00b
Joined: 14 Jul 2024 Posts: 7
|
Posted: Tue Jul 16, 2024 12:59 pm Post subject: |
|
|
logrusx wrote: | pjp wrote: | logrusx wrote: | It took me ~8 hours to emerge ~1600 packages using the binary packages host. | That seems horribly slow. Did you also compile a lot of packages? |
I remember one big package in particular but I don't think it took more than an hour. I was surprised by the time it took too. But most of the time no parallel jobs were running and many were waiting on the lock of /var/db/pkg. I didn't notice a big difference between larger and smaller binary packages, so I think it was mostly portage doing the merging stuff.
EDIT: just checked, it was 6:25 hours. Most of the time I used the computer, but it wasn't under load anyway.
Best Regards,
Georgi |
This is pretty much what I saw: a lot of waiting, 1 package at a time, most of the system almost idle; the compilation itself was rarely the bottleneck.
What's different about portage in this regard? Other distros seem to run through it much faster. |
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1248 Location: Richmond Hill, Canada
|
Posted: Tue Jul 16, 2024 1:34 pm Post subject: |
|
|
therik wrote: | What's different about portage in this regard? Other distros seem to run through it much faster. |
Because the package manager was written in Python? The package database is just a plain file? Just joking
Seriously, since so many have joined the conversation, there must be great interest in this topic. Can someone devise a plan or share some knowledge on how to profile a typical emerge run? I am willing to put in my time for such an effort but am not sure where to start. |
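One possible starting point is the timestamps in /var/log/emerge.log. The Python sketch below assumes each relevant line has the shape "epoch-seconds: >>> emerge (N of M) category/package-version to /" for a start and "epoch-seconds: ::: completed emerge (N of M) ..." for a finish; that format and the sample lines are assumptions made for illustration and should be checked against a real log. The qlop tool from app-portage/portage-utils reports similar timings.

```python
import re

# Assumed emerge.log line format; the sample lines are made up.
SAMPLE_LOG = """\
1721000000:  >>> emerge (1 of 2) virtual/ssh-0-r2 to /
1721000018:  ::: completed emerge (1 of 2) virtual/ssh-0-r2 to /
1721000020:  >>> emerge (2 of 2) app-misc/example-1.0 to /
1721000095:  ::: completed emerge (2 of 2) app-misc/example-1.0 to /
"""

EVENT = re.compile(
    r"^(?P<ts>\d+):\s+(?P<kind>>>>|:::)\s+(?:completed )?emerge "
    r"\(\d+ of \d+\) (?P<pkg>\S+) to "
)


def merge_durations(log_text):
    """Map each package to the seconds between its start and completion."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        match = EVENT.match(line)
        if not match:
            continue
        ts, pkg = int(match["ts"]), match["pkg"]
        if match["kind"] == ">>>":
            started[pkg] = ts
        elif pkg in started:
            durations[pkg] = ts - started.pop(pkg)
    return durations


durations = merge_durations(SAMPLE_LOG)
for pkg, seconds in durations.items():
    print(f"{pkg}: {seconds}s")
print("average:", sum(durations.values()) / len(durations))
```

Note that with parallel jobs the wall-clock time per package includes any time spent waiting, so a long duration does not necessarily mean a long build.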
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Tue Jul 16, 2024 1:50 pm Post subject: |
|
|
therik wrote: |
What's different about portage in this regard? Other distros seem to run through it much faster. |
I guess portage was never meant to be fast. I mean, you're waiting for all those compile jobs to finish, so what does it matter how fast portage is? Also, portage is old. Back in the days of HDDs that would barely have been noticed. And last but not least, as pingtoo pointed out, the DB is plain text files. Files do not support concurrent access, and concurrent access poses many more threats to data integrity.
Other distributions run much faster, but how often do you hear of broken updates? At least back when I was into other distributions, I couldn't work out for myself why there were so many upgrade breakages. Now I know why portage breaks so rarely: it's exactly because it does things slowly but safely.
And that's not the typical case either.
You either emerge a small amount of packages or wait for compile jobs, so it's not noticeable.
And one emptytree emerge as part of a migration is not something that's worth the time invested into improving the time of emerging binary packages. You'll forget about it soon enough.
@pingtoo, look at the sentence above.
Best Regards,
Georgi |
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1248 Location: Richmond Hill, Canada
|
Posted: Tue Jul 16, 2024 2:36 pm Post subject: |
|
|
logrusx,
Understood, and I completely agree with your point.
In fact, in my Gentoo practice, I don't update my system until I have a new need. Or I just build a new image from scratch when I feel my system is too out of date. Unlike some who update frequently, I am not concerned with how up to date each application is, but with whether the application serves my need. If an application functions correctly and I don't need a new feature, I see no reason to update. I see security not as making each individual point secure; I lean more toward making sure the outermost layer is secure.
My idea of a study is just academic. The questions are: can it be made even faster? Is there a bottleneck? Can the bottleneck be overcome? Can it be done without a complete rewrite? |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Tue Jul 16, 2024 3:41 pm Post subject: |
|
|
pingtoo wrote: |
My idea of a study is just academic. The questions are: can it be made even faster? Is there a bottleneck? Can the bottleneck be overcome? Can it be done without a complete rewrite? |
I believe this theme has come up many times through the years and I think the answer is no, considering the current state of portage. Also I have very strong trust we have very good developers working on it and I wouldn't even attempt to improve what they've done. If I knew Python I would dig into it, but I don't feel very keen on learning it, so... :)
Best Regards,
Georgi |
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9609 Location: beyond the rim
|
Posted: Thu Jul 18, 2024 11:30 am Post subject: |
|
|
pingtoo wrote: | My idea of a study is just academic. The questions are: can it be made even faster? Is there a bottleneck? Can the bottleneck be overcome? Can it be done without a complete rewrite? |
You certainly wouldn't need a complete rewrite, but you'd probably have to sacrifice something (flexibility, stability, compatibility, maintainability, ...). E.g. if the mentioned /var/db/pkg lock is a major obstacle (relatively speaking) you could remove or redesign it to be more granular, but that might increase the risk of data corruption or other hard-to-identify problems later on. It's a similar reason to why Python has the GIL to this day despite numerous attempts to get rid of it.
As for identifying the potential bottleneck, going by the emerge.log extract above, it would seem that a significant amount of time is spent in phase setup and teardown. So it's not really doing anything, just checks and cleanup that may or may not be necessary in the majority of cases but have to be done anyway for safety. Another idea would be to add logic to identify which phases actually need to be executed for a given ebuild (in particular virtuals and the like) and outright skip the rest, but that is much more complex and error-prone than you'd expect. |
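The per-phase timing can be read off the virtual/ssh extract directly. The Python sketch below copies the time of day and phase name from each of those log lines and reports the gap to the next entry. Attributing each gap to the phase named on the earlier line is an assumption about how that log was produced, and "-" marks entries that carried no phase name.

```python
from datetime import datetime

# Times and phase names copied from the virtual/ssh log earlier in the thread.
LOG = """\
17:42:23 clean
17:42:24 setup
17:42:28 install
17:42:29 -
17:42:32 instprep
17:42:33 -
17:42:33 preinst
17:42:34 -
17:42:36 prerm
17:42:37 postrm
17:42:37 cleanrm
17:42:38 postinst
17:42:39 -
17:42:40 -
17:42:41 clean
"""


def phase_gaps(text):
    """Return (phase, seconds until the next log entry) pairs."""
    entries = []
    for line in text.splitlines():
        stamp, phase = line.split()
        entries.append((datetime.strptime(stamp, "%H:%M:%S"), phase))
    return [
        (phase, int((nxt - cur).total_seconds()))
        for (cur, phase), (nxt, _) in zip(entries, entries[1:])
    ]


gaps = phase_gaps(LOG)
total = sum(seconds for _, seconds in gaps)
for phase, seconds in gaps:
    print(f"{phase:10s} {seconds}s")
print("total", total)
```

The logged phases add up to 18 seconds, against the 34 seconds of real time reported by `time`, so roughly half of the run happened outside the timestamped phases.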
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1248 Location: Richmond Hill, Canada
|
Posted: Thu Jul 18, 2024 2:13 pm Post subject: |
|
|
Genone wrote: | pingtoo wrote: | My idea of a study is just academic. The questions are: can it be made even faster? Is there a bottleneck? Can the bottleneck be overcome? Can it be done without a complete rewrite? |
You certainly wouldn't need a complete rewrite, but you'd probably have to sacrifice something (flexibility, stability, compatibility, maintainability, ...). E.g. if the mentioned /var/db/pkg lock is a major obstacle (relatively speaking) you could remove or redesign it to be more granular, but that might increase the risk of data corruption or other hard-to-identify problems later on. It's a similar reason to why Python has the GIL to this day despite numerous attempts to get rid of it.
As for identifying the potential bottleneck, going by the emerge.log extract above, it would seem that a significant amount of time is spent in phase setup and teardown. So it's not really doing anything, just checks and cleanup that may or may not be necessary in the majority of cases but have to be done anyway for safety. Another idea would be to add logic to identify which phases actually need to be executed for a given ebuild (in particular virtuals and the like) and outright skip the rest, but that is much more complex and error-prone than you'd expect. |
Thank you very much for your input.
This is exactly what I am trying to identify, except that I would like a defined way (some sort of profiling tool) that would give out this information for a typical run on a given system.
Being able to quantify the effort is one of the key things I am thinking about, because only then do we know whether it is worth spending. For example, suppose we found (with some magic effort) a solution to one bottleneck, yet the implementation called for a big upgrade on the user's end. If the gain only shows up on a relatively powerful machine, then it may not be worth it, because the gain as a whole (to the Gentoo community) is small.
Of course, the above is just an assumption, and that is why I was thinking some effort needs to be spent on profiling a typical run, in order to learn whether any effort should be spent on it.
Come to think of it, would a central database like the Portage File List concept be good? It could collect machine type, number of packages built in a given run, time spent in each phase, etc... |
|
hoegger n00b
Joined: 06 Apr 2008 Posts: 50
|
Posted: Thu Jul 18, 2024 9:39 pm Post subject: |
|
|
On a 22-core Xeon E5 with 128 GB RAM I have observed 44 threads fluctuating around 90% while emerging chromium.
While emerging opencv, most of the cores did barely anything. |
|
wanne32 n00b
Joined: 11 Nov 2023 Posts: 69
|
Posted: Fri Jul 19, 2024 1:27 pm Post subject: |
|
|
Usually the reaction times of the disks are the problem. Not so much throughput. Also, RAM is today often a much bigger bottleneck than the CPU. This is the reason why compiling on a modern M2 Pro Mac is often faster by a decent factor than on an old 2X-core DDR4/SATA server, since the server's RAM has at least 2 times longer access times and its throughput is often smaller by an even bigger factor. SATA disks do much worse. So you are often maxing out your hardware. It's just the memory and not the CPU that is at 100%.
But since waiting for tmpfs (RAM) counts as CPU usage on most systems, you should still see your CPU at 100%. How do you measure overall CPU usage? I usually get >>90% CPU usage when I am using tmpfs. If you set your load average to 31 and your parallel make processes also to 31, it is more or less clear that it rarely starts a second process. This will hamper your CPU utilisation, but maybe not so much your compile time, since more cache hits will compensate for the idle time.
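The interaction between the job limit and the load limit can be sketched in Python as below. This is a simplified model based on the documented behaviour of emerge's --load-average option (no new build is started while other builds are running and the load average is at or above the limit); the function name is hypothetical and it is not portage's actual code.

```python
def may_start_build(running, max_jobs, load_avg, max_load):
    """Simplified model of emerge's --jobs / --load-average gate."""
    if running >= max_jobs:
        return False
    # The first build is always allowed; further builds need spare load.
    return running == 0 or load_avg < max_load


# One make -j31 already pushing the load to the limit: no second build.
print(may_start_build(running=1, max_jobs=30, load_avg=31.0, max_load=31))  # False
# Load around 4, as reported at the start of the thread: the gate is open.
print(may_start_build(running=1, max_jobs=30, load_avg=4.0, max_load=31))   # True
```

Under this model the load limit only explains the lack of parallel builds while the load is actually near 31; at the reported load of 3-4 something else, such as dependencies or locks, would have to be the limiting factor.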
But compiling on only a 32G tmpfs also sounds tight for bigger things like firefox/kde/chromium. What are you compiling? Could it be that you are saturating your internet connection, since you are compiling a lot of big things without much compile time?
Also, llvm/clang are among the packages that usually stop my CPU from running at max. The compilation process doesn't seem very parallel, and a lot of other packages depend on it, so you cannot go on compiling other packages.
Edit: my 100,0000th post! |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Fri Jul 19, 2024 2:16 pm Post subject: |
|
|
wanne32 wrote: | Usually the reaction times of the disks are the Problem. |
I'll only address this, as the rest is completely irrelevant. First, most of it happens in memory buffers, so that makes it redundant to use tmpfs if you have a decent amount of memory. Second, modern SSDs are very fast.
We're talking about binary packages here, so no compilation happens. That's why most of your post is irrelevant.
Best Regards,
Georgi |
|
wanne32 n00b
Joined: 11 Nov 2023 Posts: 69
|
Posted: Fri Jul 19, 2024 4:37 pm Post subject: |
|
|
logrusx wrote: | First, most of it happens in memory buffers, so that makes it redundant to use tmpfs if you have a decent amount of memory. | First and foremost: they don't solve much, since make waits until data is written to disk. So only reads get a speedup. Just try it: compile llvm or similar on disk and on tmpfs. Only really big things like compiling chromium with -O3 won't profit that much. But even there the speedups are considerable.
Quote: | Second, modern SSDs are very fast. | Typical modern SATA SSDs have access times of 0.11ms, or 110000ns. Typical access times for memory are 11ns. So just 10000 times slower – so yes, very fast!
Quote: | We're talking about binary packages here | Where do you see that?
@therik: you are not in a container/VM or similar? |
|
logrusx Advocate
Joined: 22 Feb 2018 Posts: 2420
|
Posted: Fri Jul 19, 2024 4:50 pm Post subject: |
|
|
wanne32 wrote: |
Quote: | We're talking about binary packages here | Where do you see that? |
Read the whole discussion, don't just drop in. The rest I'm not commenting on, you should know what is relevant and what significant.
Best Regards,
Georgi |
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9824 Location: almost Mile High in the USA
|
Posted: Fri Jul 19, 2024 6:56 pm Post subject: |
|
|
Lol I was inspecting
Code: | # qlop -mvt virtual/* |
and I notice
Code: | 2023-02-25T21:23:15 >>> virtual/editor-0-r4: 19'19" |
I don't think it really took 20 minutes, but it was probably waiting on a lock for 19 minutes... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|