Vorlon Apprentice
Joined: 16 May 2003 Posts: 259 Location: East Earl, PA
|
Posted: Sat Oct 31, 2020 11:19 pm Post subject: Distcc How-To |
|
|
Distcc is a fantastic idea: Use the power of multiple computers to speed up compiling programs on a target computer. Gentoo makes this process very easy. Simply install distcc on the "target" computer and the "helper" computer(s), make a few changes to "/etc/portage/make.conf", and go.
There is one HUGE caveat, however. The software environment (processor type and testing-level) must be the same on both the target and the helper, or this system gets very complicated very fast. But there is an easy way to make it work: use a virtual machine for the helper. Here's how.
Definitions - To Keep us Straight
There seems to be a lack of standard naming conventions within the cross-compiling documentation. To keep things straight, here are the definitions used in this How-To, which I think generally follow the conventions used in other Gentoo documents.
Build / emerge / compile - In this How-To, all of these are considered to be synonymous (I know, they're really not, but that's being too pedantic for the purposes of this How-To.)
distcc - A distributed C/C++ compiler. A method of using multiple computers to build programs for another computer, which may have a different architecture (cross compiling). For example, using a multi-core AMD Ryzen system to build programs for a Raspberry Pi. (Please note that bridging CPUs with fundamentally different architectures is beyond the scope of this How-To. This document assumes that both the Target and Helper have the same basic x86 type of CPU.)
Host - Any computer that compiles programs for another. This is often a synonym for "Helper". Please note that a computer can be a host for itself and other computers at the same time.
Target - The computer that needs the help. The Target is the computer that all the other computers are building software for.
Helper - The computer which compiles programs for another computer. There can be multiple Helpers working at the same time to distribute the compiling load. In this How-To, all the Helpers will be VirtualBox virtual machines so that they can EXACTLY match the Target system in terms of CPU, Tool Chain, and testing/stability keyword (i.e., "~amd64" or "amd64").
Tool Chain - The programs used to compile source code. Typically, this will consist of gcc, libc, binutils, and the kernel. If the required Tool Chain on the Target computer does not match the active Tool Chain on the Helper, the compiled programs may not function, or may function erratically. This incompatibility is a major stumbling block for distributed cross compilation.
Step 1 - Build a Helper virtual machine in VirtualBox
a) Build the Helper virtual machine normally as you would any typical Gentoo system. You do not need to install X or any GUI systems on the Helper. You will need to either set a static IP address for the Helper, or record the assigned IP address for use later.
b) Be sure to set CFLAGS to match the Target system. Set "-march=" to the Target's CPU architecture. Do NOT use "-march=native" on the Helper, because the Helper would then build for the CPU of the VirtualBox host instead of the Target system.
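One way to find the explicit value is to ask gcc on the Target what "native" resolves to. This is only a sketch: it assumes the Target's gcc supports "-Q --help=target", and parse_march is a made-up helper name, not part of any tool. Note that "native" can also toggle individual CPU features beyond the named architecture, so treat the result as a starting point.

```shell
#!/bin/sh
# parse_march (hypothetical helper): read `gcc -Q --help=target` output on
# stdin and print the value shown for -march=.
parse_march() {
    awk '$1 == "-march=" { print $2; exit }'
}

# Run this on the Target (NOT on the Helper):
#   gcc -march=native -Q --help=target | parse_march
# Then put the printed name into CFLAGS on the Helper in place of "native".

# Demonstration with a sample line in the format gcc prints:
printf '  -march=                    \t\tskylake\n' | parse_march
```

The demonstration line only shows the parsing; the architecture name on your Target will be whatever its own gcc reports.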
Step 2 - Install distcc on all the computers (the Target and the VirtualBox Helpers)
a) Run "emerge distcc"
Step 3 - Enable distccd service on the Helper(s)
a) Edit /etc/conf.d/distccd to enable the subnet where the Target resides so that the Helper will compile for it. The "allow" variable needs to be set. For example, DISTCCD_OPTS="${DISTCCD_OPTS} --allow 192.168.1.0/24" will enable distccd to run for any Target in the 192.168.1.x subnet.
b) Run "rc-update add distccd default" to enable the service to start automatically at boot.
c) Run "/etc/init.d/distccd start" to start the service now.
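If you are not sure what to put after "--allow", here is a small sketch that builds the line from the Target's address. It assumes a plain 255.255.255.0 netmask, and both function names and the sample address 192.168.1.57 are invented for illustration.

```shell
#!/bin/sh
# subnet24 (hypothetical helper): turn an IPv4 address into its /24 network.
# Assumes the LAN uses a 255.255.255.0 netmask.
subnet24() {
    printf '%s.0/24\n' "${1%.*}"
}

# Print the line to place in /etc/conf.d/distccd on the Helper.
distccd_opts_line() {
    printf 'DISTCCD_OPTS="${DISTCCD_OPTS} --allow %s"\n' "$(subnet24 "$1")"
}

# Example: the Target lives at 192.168.1.57 (made-up address).
distccd_opts_line 192.168.1.57
```

To allow only a single Target rather than the whole subnet, pass its address to "--allow" directly instead.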
Step 4 - Update make.conf on the Target
a) Update the MAKEOPTS variable to specify the number of helper cores and the number of local cores.
a1) -j now signifies the total number of threads to use. It is set to the total number of threads for all the helpers (including the target) + 1. For example, if the local computer has 4 cores and there are 2 helpers with 4 cores each, then the parameter would be "-j13"
a2) -l (lowercase "L") is added to indicate the number of threads on the local machine + 1. For our previous example, the parameter is "-l5".
a3) Combine the two parameters in the MAKEOPTS variable. In our example, the line should be MAKEOPTS="-j13 -l5"
b) Add the "FEATURES" option by adding a line that says FEATURES="distcc".
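The arithmetic from step a) can be sketched as a tiny shell function. The function name is hypothetical; the rule it applies is exactly the one described above (all threads + 1 for -j, local threads + 1 for -l).

```shell
#!/bin/sh
# makeopts LOCAL_THREADS [HELPER_THREADS...]
# Prints the -j/-l pair using the rule from this How-To:
#   -j = (local threads + all helper threads) + 1
#   -l = local threads + 1
makeopts() {
    local_threads=$1
    total=$1
    shift
    for n in "$@"; do
        total=$((total + n))
    done
    printf '%s\n' "-j$((total + 1)) -l$((local_threads + 1))"
}

# A 4-thread Target with two 4-thread Helpers:
makeopts 4 4 4   # prints: -j13 -l5
```

The printed pair goes inside MAKEOPTS="..." in /etc/portage/make.conf on the Target.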
Step 5 - Set distcc variables on the Target
a) Run distcc-config with the "--set-hosts" option to list the Helper system(s). Append ",cpp,lzo" to each Helper host, with no spaces inside a host entry, because spaces separate one host from the next. You can list multiple hosts. For example, if the Helpers are at 192.168.1.23 and 192.168.1.54, the command is distcc-config --set-hosts "192.168.1.23,cpp,lzo 192.168.1.54,cpp,lzo". This writes the file "/etc/distcc/hosts", which lists all the Helper hosts. You can also create this file manually.
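With more than a couple of Helpers, the host string gets tedious to type. A minimal sketch that assembles it; hosts_line is a made-up name, and the options are joined to each address with commas and no spaces.

```shell
#!/bin/sh
# hosts_line OPTIONS HOST...
# Joins each host with the same comma-separated options and separates the
# hosts with single spaces. Pass "" as OPTIONS for bare host entries.
hosts_line() {
    opts=$1
    shift
    line=
    for host in "$@"; do
        line="${line:+$line }${host}${opts:+,$opts}"
    done
    printf '%s\n' "$line"
}

hosts_line cpp,lzo 192.168.1.23 192.168.1.54
# Intended use on the Target (needs root and distcc installed):
#   distcc-config --set-hosts "$(hosts_line cpp,lzo 192.168.1.23 192.168.1.54)"
```

Passing only "lzo" as the first argument gives entries with compression but without the "cpp" option.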
Step 6 - Emerge on the Target
a) Emerge packages on the Target as you normally would. You will see distcc messages on the Target as the packages emerge.
Miscellaneous Notes:
• Sometimes distcc induces errors in the build process and the emerge will fail. I have not been able to determine why some packages build and some don't. Instead, I simply emerged as many as possible using distcc, then commented out the "FEATURES" variable and emerged the rest.
• Based on my somewhat limited reading of anecdotal evidence, distributed computing is still something of a black art. It should work, but often doesn't. At least one person has told me they eventually found distcc more trouble than it is worth. Depending on the Target machine, the help may or may not be worth the extra effort. Using distcc, I was able to install Gentoo on an old Pentium III with only 512 MB RAM. Distcc was critical to that, since the Pentium III is actually waaaaay too old and feeble, and 512 MB RAM is waaaaay too tiny, to build Gentoo.
• I recommend making the Helper a VirtualBox virtual machine so you can ensure the Target and Helper have identical settings and Tool Chains. But the Helper can be anything as long as these items match the Target.
• There are ways of building a unique Tool Chain on the Helper without going through all the VirtualBox stuff, but I found creating a new VirtualBox machine a lot easier than trying to figure out the confusing Wiki info.
• There are 2 different monitoring programs mentioned in the Gentoo wikis, but I have never been able to get them to work. Instead, I simply ran htop on the Helpers and watched as the distcc program executed on them.
• Distcc is potentially a security risk because it allows remote machines to execute programs on the Helper. You need to be careful to restrict which computers can use the Helper in the configuration file /etc/conf.d/distccd. _________________ Casey Bralla
Chief Nerd in Residence
The NerdWorld Organisation |
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54584 Location: 56N 3W
|
Posted: Sun Nov 01, 2020 10:03 am Post subject: |
|
|
Moved from Installing Gentoo to Documentation, Tips & Tricks.
It's one of these.
distcc just works, even across architectures. The key is to have identical versions of gcc everywhere.
When jobs are distributed, distcc tells exactly what must be done but not with what compiler version.
The monitoring programs are run on the Target.
DISTCC_DIR must be defined, as that's where they look to see what distcc is doing.
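As a rough sketch of what that looks like on the Target: the state directory below assumes Portage's temporary directory is the usual /var/tmp (override PORTAGE_TMPDIR in the environment if yours differs), and both function names are invented for this example.

```shell
#!/bin/sh
# distcc_state_dir (hypothetical helper): where distcc run by Portage is
# assumed to keep its state. Assumes PORTAGE_TMPDIR defaults to /var/tmp.
distcc_state_dir() {
    printf '%s\n' "${PORTAGE_TMPDIR:-/var/tmp}/portage/.distcc"
}

# watch_distcc [SECONDS]: run the text monitor against that directory,
# refreshing every SECONDS seconds (default 3).
watch_distcc() {
    DISTCC_DIR="$(distcc_state_dir)" distccmon-text "${1:-3}"
}

# Usage, on the Target while an emerge is running:
#   watch_distcc 3
```

If the monitor shows nothing, check that the directory it is pointed at is the one the running build is actually using.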
Not all phases of a build can be distributed, so lots of nothing going on is expected too.
e.g. the target has to do its own preprocessing and linking. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22705
|
Posted: Sun Nov 01, 2020 6:25 pm Post subject: Re: Distcc How-To |
|
|
Vorlon wrote: | • Distcc is potentially a security risk because it allows remote machines to execute programs on the Helper. You need to be careful to restrict which computers can use the Helper in the configuration file /etc/conf.d/distccd. |
The helper can also be instructed to restrict which programs it will run. In the daemon's environment, set DISTCC_CMDLIST to a file which lists approved compilers, one per line. This may not be perfect if you assume that the approved compiler can be exploited with the right options, but it narrows the set of available programs considerably.
This mechanism could also be used to enforce that only properly qualified compiler names can be used, so that a Target that requests gcc gets a failure, but a Target that requests x86_64-pc-linux-gnu-gcc succeeds. This is helpful for environments where the Target and Helper have different values for CHOST, and thus gcc means different things to each of them.
You could go a step further and restrict the allowed program to a specific version of gcc, but that will likely require the Target to set CC/CXX explicitly, as few build systems would automatically use a version-qualified compiler. Many build systems can be readily encouraged to use a CHOST-qualified non-version-qualified compiler. |
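A sketch of generating such a list file, one compiler per line. The function name and the /etc/distcc/cmdlist path mentioned in the comments are assumptions for illustration, and whether entries should be bare names or full paths is worth checking against the distccd man page for your version.

```shell
#!/bin/sh
# write_cmdlist FILE COMPILER...
# Writes the approved compiler names to FILE, one per line.
write_cmdlist() {
    list=$1
    shift
    printf '%s\n' "$@" > "$list"
}

# Demonstration in a temporary file, allowing only CHOST-qualified names:
tmp=$(mktemp)
write_cmdlist "$tmp" x86_64-pc-linux-gnu-gcc x86_64-pc-linux-gnu-g++
cat "$tmp"
rm -f "$tmp"

# On a real Helper you would write to a permanent path (for example
# /etc/distcc/cmdlist, a made-up location) and make sure the daemon's
# environment contains:
#   DISTCC_CMDLIST=/etc/distcc/cmdlist
```

After changing the daemon's environment, distccd has to be restarted for the list to take effect.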
Lemon-Lime n00b
Joined: 27 Apr 2023 Posts: 58
|
Posted: Fri Sep 01, 2023 1:39 pm Post subject: |
|
|
NeddySeagoon wrote: | distcc just works, even across architectures. The key is to have identical versions of gcc everywhere. |
Will distcc work if emerged with different use flags on different machines?
Say for instance a machine has emerged the package with the "hardened" use flag and the other didn't.
Or for instance if the different machines have different useflags for gcc (even if the gcc version is the same).
Will it work properly? _________________ Crazy frog is the artist, not the song |
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1266 Location: Richmond Hill, Canada
|
Posted: Fri Sep 01, 2023 2:20 pm Post subject: |
|
|
Lemon-Lime wrote: | Will distcc work if emerged with different use flags on different machines?
Say for instance a machine has emerged the package with the "hardened" use flag and the other didn't.
Or for instance if the different machines have different useflags for gcc (even if the gcc version is the same).
Will it work properly? |
Maybe answers to my question will make the distcc environment easier to understand.
My question: can someone definitively state exactly which binaries need to be installed on the "Helper" for distcc to work as a helper?
My current understanding/guess is that the helper only needs gcc (the right version, of course) and "as", the assembler. Taking "crossdev" as the way to build a "Tool Chain", crossdev will build "gcc", "binutils", "libc" and the kernel headers.
So is it necessary to run "crossdev" to build an entire "Tool Chain" in order to make a distcc helper environment? That is the question.
In my mind, nothing but gcc (the package) and "as" is needed, and no setting (Portage USE flags or CFLAGS) on the Helper will influence the build results on the Target nodes. Am I right? |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22705
|
Posted: Fri Sep 01, 2023 3:47 pm Post subject: |
|
|
Lemon-Lime wrote: | Will distcc work if emerged with different use flags on different machines? |
That depends on the flags. Generally, you need distcc on the build machine to cause the remote machine to produce exactly the same object file that would have been produced locally. Therefore, flags that are not relevant to what is produced can be out of sync. Flags that can impact what is produced need to be synchronized.
Lemon-Lime wrote: | Say for instance a machine has emerged the package with the "hardened" use flag and the other didn't. |
As I read the ebuild for distcc, USE=hardened enables a patch that probably ought to be enabled everywhere, although in practice it seems not to be needed on systems that use a non-hardened gcc. I think it could be safely enabled everywhere, since it just makes distcc more cautious about what it passes to the remote system.
Lemon-Lime wrote: | Or for instance if the different machines have different useflags for gcc (even if the gcc version is the same).
Will it work properly? |
The goal is that you need to produce the same output. If the mismatched USE flags do not affect that goal, then they can be safely mismatched. Some flags, such as lto or pgo, ought to only impact how well the compiler performs, but not what output it produces. Those can be mismatched. Others may impact how it changes C/C++ source text into GNU as assembly. Those need to be matched. |
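One rough way to spot-check the "same output" goal is to compile the same sample file to assembly on each machine and compare a hash of the result. This is only a sketch under assumptions: fingerprint is a made-up helper, sample.c stands for any small test file you copy to both machines, and a match on one file does not prove the compilers agree on everything.

```shell
#!/bin/sh
# fingerprint (hypothetical helper): hash whatever arrives on stdin and
# print only the hex digest.
fingerprint() {
    sha256sum | cut -d' ' -f1
}

# Intended use, run separately on the Target and on each Helper with the
# same sample.c and the same flags you build with:
#   x86_64-pc-linux-gnu-gcc -O2 -S -o - sample.c | fingerprint
# Differing digests mean the two compilers emit different assembly for
# that file, so their configuration is worth comparing.

# Demonstration on fixed text instead of compiler output:
printf 'movl $0, %%eax\n' | fingerprint
```

If the digests differ, diffing the raw assembly from the two machines usually shows quickly whether the difference matters.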
Lemon-Lime n00b
Joined: 27 Apr 2023 Posts: 58
|
Posted: Fri Sep 01, 2023 9:11 pm Post subject: |
|
|
Noted Hu! Will give it a try and I'll update this post with my findings.
Thank you so much for your help! _________________ Crazy frog is the artist, not the song |
juliedeville n00b
Joined: 14 Oct 2024 Posts: 35
|
Posted: Fri Oct 25, 2024 3:42 pm Post subject: |
|
|
I followed the instructions, but can't get it working. I think I had it earlier, because I remember seeing distcc messages when merging a package; idk where I went wrong. |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22705
|
Posted: Fri Oct 25, 2024 4:24 pm Post subject: |
|
|
juliedeville wrote: | I followed the instructions, but can't get it working. I think I had it earlier, because I remember seeing distcc messages when merging a package; idk where I went wrong. | I think you went wrong by not following Guidelines item #4. You say you followed the instructions, but you show us neither the commands you ran, nor what output they produced, nor how you determined that it is not working. With what you posted, there is little we can do to help you. |
juliedeville n00b
Joined: 14 Oct 2024 Posts: 35
|
Posted: Fri Oct 25, 2024 5:51 pm Post subject: |
|
|
Nvm, I got it. Good guide! In /etc/distcc/hosts make sure to only enable "lzo" because "cpp" activates pump mode, which is deprecated. |
Ralphred l33t
Joined: 31 Dec 2013 Posts: 657
|
Posted: Fri Oct 25, 2024 10:27 pm Post subject: Re: Distcc How-To |
|
|
Vorlon wrote: | Miscellaneous Notes: |
As an addition: distccmon is designed to be run on the "client", not the "hosts". I have a number of ostensibly headless "clients" and like to see what's going on when updates etc. are running in the background. I've tried numerous methods, but the following seems to work best:
Export the client's .distcc directory
/etc/exports: |
/var/tmp/portage/.distcc 10.0.0.0/24(rw,async,subtree_check,insecure)
|
Mount the export locally
Code: |
mkdir /mnt/10.0.0.15_distcc
mount -t nfs 10.0.0.15:/var/tmp/portage/.distcc /mnt/10.0.0.15_distcc
|
Use this (very lazy) wrapper to launch a GUI version of distccmon with `distccmon [host]'
~/bin/distccmon: |
#!/bin/bash
# Mount point for the chosen client's exported .distcc directory
DISTCC_DIR="/mnt/${1}_distcc"
# If the mount point is listed in fstab, mount it on demand
grep -q "$DISTCC_DIR" /etc/fstab && mount "$DISTCC_DIR"
NAME="DistccMon: $1"
export DISTCC_DIR="${DISTCC_DIR}"
distccmon-gui &
# Rename the monitor window once it appears so each client is identifiable
until xdotool search --name "distcc Monitor" set_window --name "${NAME}"; do sleep 1; done
|
Once the "client's distcc mount point" is created, you can put
/etc/fstab: |
[host]:/var/tmp/portage/.distcc /mnt/[host]_distcc nfs rw,bg,timeo=60,soft,noauto,user 0 0
|
in fstab and the wrapper becomes even "lower maintenance" as it will automount the NFS export.
Quote: | But why do you feel the need for the GUI version? |
Because the CLI version is a "snapshot" of what distcc is doing at a given moment, while the GUI version has about 60 seconds of "history" shown in its window, so at a glance it needs far less attention to make sure things are "working as they should". |
Bob P Advocate
Joined: 20 Oct 2004 Posts: 3374 Location: USA
|
Posted: Sat Oct 26, 2024 1:03 am Post subject: |
|
|
It's been a long time since I ran a distcc compiling farm (is anyone here old enough to remember Stage 1/3 Installs or the Jackass! Project?) It had to be 18-20 years ago...
One of the things that I really liked about the GUI version of the distcc monitor was how much real time status information it gave me regarding the progress of a full system recompile that was distributed across a dozen machines. It was a royal PITA to get distcc up and running but once I finally got it working I found the bird's eye view provided by the GUI monitor to be particularly fun to watch during a long system emerge. |