Gentoo Forums
Tinkering with distcc (aka: gief moar green plz)
Ralphred
l33t


Joined: 31 Dec 2013
Posts: 709

Posted: Mon Jan 27, 2025 5:08 am    Post subject: Tinkering with distcc (aka: gief moar green plz)

I have 4~6 machines all running distcc (one of the most unintuitive words to type without thinking, IMHO). They all "export" /var/tmp/portage/.distcc via NFS, mounted on my "admin" machine under /mnt/distcc/<hostname>, with a wrapper for distccmon-gui to set the correct DISTCC_DIR= on invocation. This has worked for a long time (switched on, off, or tinkered with to respect the "critical role" needs of the compilation hosts), and "back in the day" I would have up to 4 distccmon windows open, all with pretty blocks of green indicating that the "emerge -uDNav"ing machines were getting the help they wanted. Anyone who's used distccmon-gui understands the meaning and context of the "long and pretty blocks of green".
Fast forward to today, where there is no pump support anymore and I have better hardware: I'm looking at a situation where I get a few "short blocks of green". Not a problem per se, but it is "sub-optimal".
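
A minimal sketch of what such a distccmon-gui wrapper might look like. The function names and the "one hostname per call" convention are hypothetical; only the /mnt/distcc/<hostname> layout and the use of DISTCC_DIR come from the description above.

```shell
#!/bin/sh
# Hypothetical helper: map a hostname to its NFS-mounted distcc state dir.
# The /mnt/distcc/<hostname> layout is the one described in the post.
distcc_state_dir() {
    printf '/mnt/distcc/%s\n' "$1"
}

# Hypothetical wrapper: start the monitor against one host's state dir.
monitor_host() {
    dir=$(distcc_state_dir "$1")
    if [ ! -d "$dir" ]; then
        echo "no distcc state directory at $dir" >&2
        return 1
    fi
    # distccmon-gui reads its state files from $DISTCC_DIR
    DISTCC_DIR="$dir" distccmon-gui
}
```

After sourcing this, something like `monitor_host <hostname> &` per updating machine would give one monitor window each.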

Preamble dispensed with: I've been attempting to increase "offload" by tinkering with
make.conf:
EMERGE_DEFAULT_OPTS="--jobs J --load-average K"
FEATURES="${FEATURES} distcc"
MAKEOPTS="-jX -lY"
This tinkering has resulted in more blocks of green than pre-tinkering, but less overall green than there used to be. My thinking and settings:
  • portage needs "local resources" to set-up an individual internal "ebuild <some ebuild> merge" job
  • a job needs local resources to "ebuild <some ebuild> pretend ... configure" before compiling
  • a job needs local resources to "ebuild <some ebuild> compile"
  • a job needs local resources to "ebuild <some ebuild> <everything after compile>"
  • make needs local resources during "ebuild <some ebuild> compile" to manage "parallel compile tasks"
  • make needs resources to call "<compiler> <finally do some compiling now>"
For a hypothetical host with 4 cores wanting to update, and a distccd host with 40 cores available to "do work", I set:
/etc/distcc/hosts:
local_host/1 compile_host/40
make.conf:
EMERGE_DEFAULT_OPTS="--jobs 4 --load-average 4"
FEATURES="${FEATURES} distcc"
MAKEOPTS="-j11 -l1"
My questions:
  • Have I correctly interpreted the synergy between <portage and its jobs>, <make and its jobs>, and distcc?
  • Is there something in this config unnecessarily creating a "bottleneck"?
  • Is distcc aware enough of its other instances that I can set -j40 and the 'compile_host/40' will prevent the (theoretical) possibility of 160 compile tasks being sent to compile_host?
  • Would it be better to exclude local_host/1 from /etc/distcc/hosts entirely? (One localhost thread always shows up in distccmon, even when it's referred to as <localhost but fqdn>/<threads> in /etc/distcc/hosts and multiple <localhost but fqdn> threads are observed as used.)
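
The "160 compile tasks" figure in the third question is just the product of the two job limits. As a rough sketch of that arithmetic (the function name is hypothetical, and it assumes every make slot in every concurrent ebuild is a compile, which the -l / --load-average limits will usually prevent):

```shell
#!/bin/sh
# Upper bound on simultaneous compile tasks the updating host could request:
# (emerge --jobs) x (make -j). Load-average limits are deliberately ignored.
worst_case_tasks() {
    emerge_jobs=$1
    make_jobs=$2
    echo $(( emerge_jobs * make_jobs ))
}

worst_case_tasks 4 11   # the settings shown above
worst_case_tasks 4 40   # the -j40 case asked about
```

This only bounds what make may ask for; it says nothing about how distcc's own per-host limit (the /40) treats those requests.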
It seems to work, and I just want to update my understanding and tune the numbers to account for distcc overhead (as in all the "distcc guide"s).
Ignore my use of $(nproc)/(<has hyperthreading>*2) when setting 'make -l' and '--load-average'*; I'm just using simplified numbers for 'PoC maths'.
*This is not particularly relevant, but nevertheless interesting; observation has shown that if my "hyperthreading" processor running <nproc/2 number of threads> does 100 units of 'work', then with <nproc threads> it does up to 160 units of work - quite impressive for "essentially just a software switch being flipped".

If "the forum" can reach some consensus on the maths it'd be great, then we/I can document it "properly" and add examples for sidestepping with /etc/portage/{env,package.env} for problem packages etc.
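
As a starting point for the /etc/portage/{env,package.env} sidestep, here is a hedged sketch that writes a per-package override. The function name, the file name no-distcc.conf, the local-only MAKEOPTS values, and the package atom in the usage note are all hypothetical; it also assumes package.env is a plain file rather than a directory.

```shell
#!/bin/sh
# Hypothetical helper: write_no_distcc_env ROOT ATOM
# Creates an env file that switches distcc off and registers it for ATOM.
# ROOT lets this be tried in a scratch directory before touching /etc.
write_no_distcc_env() {
    root=$1
    atom=$2
    mkdir -p "$root/etc/portage/env" || return 1
    # FEATURES is incremental in portage, so "-distcc" removes the feature
    printf '%s\n' 'FEATURES="-distcc"' 'MAKEOPTS="-j4 -l4"' \
        > "$root/etc/portage/env/no-distcc.conf"
    # One "atom envfile" pair per line (assumes package.env is a file)
    printf '%s %s\n' "$atom" "no-distcc.conf" \
        >> "$root/etc/portage/package.env"
}
```

For example, `write_no_distcc_env /tmp/scratch some-category/problem-package` produces the two files under /tmp/scratch for inspection.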
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1469
Location: Richmond Hill, Canada

Posted: Mon Jan 27, 2025 12:12 pm

My understanding is that "distccd" will not spawn more jobs than it is configured for. So if you want your helper to run 40, either the helper actually has 40 CPUs for distccd to detect, or you need to set the --jobs option for distccd. I think once it reaches 40 jobs it will block and not take on any more.

I set my /etc/distcc/hosts with "--randomize" as the first line, so that distcc sends jobs in no particular order, and put the word "localhost" at the bottom because my build server does not run distccd.
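
A small sketch of generating a hosts file laid out that way. The function name and the helper names/limits passed to it are hypothetical; only "--randomize first, localhost last" is taken from the description above.

```shell
#!/bin/sh
# Hypothetical generator: print a distcc hosts file with --randomize on the
# first line, the given helper specs next, and plain "localhost" last.
gen_distcc_hosts() {
    echo "--randomize"
    for spec in "$@"; do
        echo "$spec"
    done
    echo "localhost"
}

# Example with made-up helper names and per-host limits
gen_distcc_hosts helper1/4 helper2/4
```

The output can be reviewed and then redirected to /etc/distcc/hosts.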

I don't know the right numbers for job counts and load average, but I got the impression that giving emerge a single set (world or otherwise) creates less parallelism than listing many packages (a long list) on the command line. I can usually get up to 30-something "emerging" lines before I start seeing the "install" lines.

My goal in using distcc is not necessarily a faster build but to utilize more of the resources available. I want to reduce storage I/O, and I want to use more memory. All my nodes are ARM-based SBCs, so an individual node does not have a lot of memory (4G ~ 8G) and most of them use SD card.
Ralphred
l33t


Joined: 31 Dec 2013
Posts: 709

Posted: Mon Jan 27, 2025 4:38 pm

pingtoo wrote:
and most of them use SD card.
Interesting comment, thought provoking; where does distcc store code locally on compile hosts or is it all just in memory? Adding a tmpfs for distcc to use would certainly drop drive wear if it isn't in memory. I need to read the docs again instead of the distcc guides...
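
One way to check this on a compile host is to look at what filesystem backs the temporary directory. The sketch below assumes distccd puts its temporary files under $TMPDIR and falls back to /tmp (worth confirming against the distccd man page), and that GNU stat is available; the function name is hypothetical.

```shell
#!/bin/sh
# Print the filesystem type backing a directory (GNU coreutils stat).
fs_type_of() {
    stat -f -c %T "$1"
}

# Assumption: distccd honours TMPDIR and defaults to /tmp.
# Run this with the same environment the daemon is started with.
tmp_dir="${TMPDIR:-/tmp}"
echo "$tmp_dir is on: $(fs_type_of "$tmp_dir")"
```

If that prints tmpfs, temporary files there already live in memory; anything else means they end up on whatever device backs that directory.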
pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1469
Location: Richmond Hill, Canada

Posted: Mon Jan 27, 2025 4:56 pm

Ralphred wrote:
pingtoo wrote:
and most of them use SD card.
Interesting comment, thought provoking; where does distcc store code locally on compile hosts or is it all just in memory? Adding a tmpfs for distcc to use would certainly drop drive wear if it isn't in memory. I need to read the docs again instead of the distcc guides...


my builder use remove zram for /var/tmp.

I can see distcc sometimes storing temporary file(s) in /var/tmp/(portage?). As far as I can tell, on the distccd (helper) side nothing was written to storage.