Massimo B. Veteran
Joined: 09 Feb 2005 Posts: 1820 Location: PB, Germany
Posted: Thu Nov 17, 2022 8:54 am Post subject: distcc via ssh |
Hi,
https://wiki.gentoo.org/wiki/Distcc is outdated. It still talks about pump mode even though Portage dropped that feature a long time ago. Anyway.
I'm trying to get distcc working over ssh. I know this is slower, but it makes the networking easier and more secure.
I know the approach of using a local ssh tunnel and making distcc connect to localhost (sketched at the end of this post). That is a bit inconvenient, though, since the tunnel has to be started first. AFAIK distcc supports ssh natively by putting a "@" in front of a host in /etc/distcc/hosts.
Adding the @ breaks the distribution of jobs to that host. Running the emerge with DISTCC_VERBOSE="1", I found this in the output and I don't know why:
Code: | distcc[7882] (dcc_remove_disliked) remove @gentoo-mb/10,lzo from list |
An ssh login from the local root to the remote root works without a password. Is the root user used for ssh at all?
This is my local distcc configuration:
I have disabled pump mode and removed ,cpp. I have also removed --randomize because not all of the remote hosts are always available.
/etc/distcc/hosts: | #--randomize
--localslots=18
--localslots_cpp=24
localhost/8
10.185.40.112/10,lzo
mobalindesk/10,lzo
#@mobalindesk/10,lzo |
/etc/bash/bashrc.d/my_emerge-distcc.sh: | function emerge-distcc () {
DISTCC_FEATURES="distcc"
DISTCC_MAKEOPTS="-j28 -l36"
# explicitly for distcc (resolve-march-native --keep-identical-mtune)
DISTCC_CFLAGS="-march=tigerlake -mabm -madx -maes -mavx -mavx2 -mavx512bitalg -mavx512bw -mavx512cd -mavx512dq -mavx512f -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx512vl -mavx512vnni -mavx512vp2intersect -mavx512vpopcntdq -mbmi -mbmi2 -mclflushopt -mclwb -mcx16 -mf16c -mfma -mfsgsbase -mfxsr -mgfni -mlzcnt -mmmx -mmovbe -mmovdir64b -mmovdiri -mpclmul -mpku -mpopcnt -mprfchw -mrdpid -mrdrnd -mrdseed -msahf -msha -mshstk -msse -msse2 -msse3 -msse4.1 -msse4.2 -mssse3 -mtune=tigerlake -mvaes -mvpclmulqdq -mxsave -mxsavec -mxsaveopt -mxsaves --param=l1-cache-line-size=64 --param=l1-cache-size=48 --param=l2-cache-size=24576"
DISTCC_EMERGEARGS=""
CFLAGS="$DISTCC_CFLAGS" CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden" FEATURES="$FEATURES $DISTCC_FEATURES" MAKEOPTS="$MAKEOPTS $DISTCC_MAKEOPTS" emerge $DISTCC_EMERGEARGS "$@"
} |
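For reference, the tunnel variant I mentioned above looks roughly like this (only a sketch: 3633 is an arbitrary local port so it does not clash with a locally running distccd, and mobalindesk is one of the hosts above):
Code: | # forward a local port to the distccd running on the remote box
ssh -fN -L 3633:localhost:3632 mobalindesk
# then use the forwarded port in /etc/distcc/hosts instead of the @ syntax:
#   127.0.0.1:3633/10,lzo |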
_________________ HP ZBook Power 15.6" G8 i7-11800H|HP EliteDesk 800G1 i7-4790|HP Compaq Pro 6300 i7-3770 |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
Posted: Thu Nov 17, 2022 9:28 am Post subject: |
Emerge by default switches to the portage user for the compile phase, so that's probably the user that must be able to use ssh.
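A quick way to check that from a root shell (just a sketch; BatchMode makes ssh fail instead of prompting, and DISTCC_SSH can point distcc at a specific ssh command if the defaults don't fit, see man distcc):
Code: | # can the portage user reach the remote side non-interactively?
su -s /bin/sh portage -c 'ssh -o BatchMode=yes mobalindesk true' && echo portage-ssh-ok
# if not, give portage its own key and authorize it on the remote host,
# or set e.g. DISTCC_SSH="ssh -i /path/to/key" to choose the key/command |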
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9617 Location: beyond the rim
Posted: Thu Nov 17, 2022 9:34 am Post subject: Re: distcc via ssh |
Massimo B. wrote: | Is the root user used for ssh at all? |
Depends. I think these days FEATURES=userpriv is enabled by default, so compiling is performed with UID=portage, not UID=root. Also, there are a bunch of new sandbox features added since I last looked at portage that may interact with distcc.
The first step should be to test distcc outside of portage to ensure that it works at all, and only then adjust the config so it also works within portage.
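Something along these lines takes portage out of the picture entirely (a rough sketch, reusing the host spec from the first post):
Code: | cd /tmp && echo 'int main(void){return 0;}' > hello.c
# force a single ssh host and verbose output, then compile through distcc
DISTCC_HOSTS="@mobalindesk/10,lzo" DISTCC_VERBOSE=1 distcc gcc -c hello.c -o hello.o
# repeat the same thing as the portage user to see whether that is where it breaks
su -s /bin/sh portage -c 'cd /tmp && DISTCC_HOSTS="@mobalindesk/10,lzo" DISTCC_VERBOSE=1 distcc gcc -c hello.c -o hello.o' |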
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 2:38 am Post subject: |
Distcc can easily saturate the network connection; it creates a lot of traffic. On the other hand, there is nothing secret about it. I doubt you need to encrypt anything, since an attacker would have no use for sniffing it. ssh does offer compression, but its emphasis is on security rather than throughput. It's not a good wrapper for distcc. It's not a bad one either, but it's not ideal.
Personally I prefer vtun. Vtun is a lightweight tunnel that uses the tun/tap drivers, can use zlib/lzo for compression, and also offers several encryption methods, even though I do not use those. My distcc servers are both in my local network and across the internet, and I managed to lower my traffic by using it. It also plays nicely with bridged or bonded connections. In my case, for example, I use a bridged, bonded connection (made out of 4 tunnels) between two internet locations to route traffic as though they were in the same location, and distcc runs on top of that. The reason for the bond is quite simple: it lets several CPU cores compress/decompress the stream. When a new distcc stream starts, it goes to the vtun client/server, is split across the 4 tunnels and compressed on 4 cores, then reaches the other vtun endpoint (over the internet), which turns it back into one stream and hands it to the distcc server; the compile result comes back the same way. This is quite a bit faster than a single tunnel, which is limited to a single core for compression.
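Stripped down, a vtun session for this looks more or less like the following (placeholder name, password and port; check vtund.conf(5) for the exact keywords):
Code: | # /etc/vtund.conf, shared by both ends
# server: vtund -s          client: vtund distcc0 <server address>
options {
    port 5000;
    ifconfig /sbin/ifconfig;
    syslog daemon;
}
distcc0 {
    passwd   changeme;
    type     ether;       # tap interface, so it can be bridged and bonded
    proto    tcp;
    compress lzo:4;       # cheap compression is the whole point of the exercise
    encrypt  no;          # vtun can encrypt too, I just don't bother
    up { ifconfig "%% up"; };
} |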
On the other hand, ssh won't have this problem, because each individual distcc stream is its own ssh connection and thus gets its own core; still, IMHO it has a bit more overhead than needed. I can provide more information if you are interested. |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22870
Posted: Fri Nov 18, 2022 1:19 pm Post subject: |
man distcc: | TCP connections should only be used on secure networks because there is
no user authentication or protection of source or object code. | The source may not be secret, but there is also the question of whether the server adequately limits who can run programs on it and what they can run. When the connection is transported over ssh, only clients with a valid ssh authentication can run anything. When the connection is transported without ssh, you must configure the distcc daemon to enforce any access controls you need.
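For the non-ssh case that boils down to something like this on the server (a sketch using the subnet from the first post; the options are described in distccd(1), and on Gentoo they would typically go into DISTCCD_OPTS in /etc/conf.d/distccd):
Code: | # only listen on the LAN address and only accept jobs from that subnet
distccd --daemon --listen 10.185.40.112 --allow 10.185.40.0/24 --log-file /var/log/distccd.log |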
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 1:24 pm Post subject: |
Hu wrote: | The source may not be secret, but there is also the question of whether the server adequately limits who can run programs on it and what they can run. When the connection is transported over ssh, only clients with a valid ssh authentication can run anything. When the connection is transported without ssh, you must configure the distcc daemon to enforce any access controls you need. |
That should go without saying: ACLs plus a firewall. It's not like exposing ssh to the internet is a great idea either. And I keep my vtun on high ports, 60000+; you have to run a complete port scan to find it. |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22870
Posted: Fri Nov 18, 2022 2:54 pm Post subject: |
Yes, a firewall is a good first step. Do you prevent other users on the distcc client machine from connecting to the distcc daemon and running unauthorized commands from it? |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 3:07 pm Post subject: |
Hu wrote: | Yes, a firewall is a good first step. Do you prevent other users on the distcc client machine from connecting to the distcc daemon and running unauthorized commands from it? |
Well, other than the ACLs, there are no other users. It's just me. Only my ssh key is allowed to connect to any of those machines at all. I always doubted the security of the distcc server (and other things), so it's only accessible from that virtual network hidden behind vtun (it only binds to 192.168.x.x), which in turn is behind a firewall. It is possible to sniff that network if you are my ISP, or the security services in my own country, but there's nothing to see, and no way to get to it. Vtun itself can use ssh to encrypt data, but I thought that was overkill.
Anyway, leaving vtun aside for a moment, as far as I can tell distcc runs as a normal user and has no shell. I haven't seen any news or advisories or anything to suggest people use it to break into systems. If it were completely exposed to the internet, people could just flood your connection or max out your CPU, but do you know something I don't?
EDIT: to make it clearer: the most sensitive services are only available in the demilitarized network, 192.168.smth, different from anything else, and only reachable through vtun after you have passed the firewall. distcc, sql, ssh, all that juicy stuff. Only mail, web and DNS are exposed to the internet. The Russians and the Chinese are killing me with their hacking attempts. And the Dutch. I don't know why the Dutch, but I hate Serverion. I have mailed them countless times. I am thinking of reporting them to the authorities. |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
Posted: Fri Nov 18, 2022 4:45 pm Post subject: |
Talking about security and performance, ssh is really really slow, so... Why not use a vpn instead of ssh?
Say, wireguard?
Encrypted, fast, authenticates with keys, and as a bonus it tolerates network hiccups.
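A minimal wg-quick config is only a handful of lines, something like this on the distcc client (keys, addresses and the endpoint are placeholders):
Code: | # /etc/wireguard/wg0.conf, brought up with "wg-quick up wg0"
[Interface]
PrivateKey = <client private key>
Address = 10.9.0.2/24

[Peer]
PublicKey = <server public key>
Endpoint = <build server>:51820
AllowedIPs = 10.9.0.0/24
PersistentKeepalive = 25
# then point /etc/distcc/hosts at the server's wireguard address, e.g. 10.9.0.1/10,lzo |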
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 4:50 pm Post subject: |
Wireguard is point-to-point: not bridgeable, and it only works with one core. If you have VMs or multiple machines in each network, vtun is more suitable.
And we loop right back to what I said before.
I don't know how many of you have tried vtun, but it's an incredible piece of engineering: bridging, bonding, ssh, encryption, compression (lzo and zlib). |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 5:22 pm Post subject: |
Let's imagine an example. Let's say you have a remote server at work, running a few VMs.
On that server you could make a network bridge. I always prefer bridges over NAT for VMs, but I am not talking about THAT bridge, the one that holds eth1: eth0 goes to the internet, eth1 goes to the local network, and naturally you put eth1 in a bridge so the VMs can live in that network. I am talking about a second bridge, the demilitarized network, and in that network you allow IPs from your own home network. Wouldn't it be great if all those nice VMs were simply at home? That's exactly what I am talking about.
A bridge is like an ethernet switch: you can add any number of connections to it and they all see each other as if they were plugged into the same switch. The problem is that not all types of connections are bridgeable. Tap yes, tun no. Things like wireguard, openvpn and the three swans (freeswan, strongswan, openswan) are tun, not tap, so no.
So, a second bridge. Each VM gets 2 virtual network cards (not necessary, but easier this way): one interface on the normal network (eth1), one on the virtual network, which vtun is also plugged into and which leads to another bridge at home. Pinging a VM on the server from a computer at home then looks like a single hop, since it's all one network.
But here is the amazing part. Most tunnels/VPNs are just one stream, one process, one core, and you can easily generate more traffic than one core can compress. This is where the bond comes in: you can create more than one tunnel and bond them into a single connection. Yes, tap connections are also bondable. That means one stream, X processes, X cores, same bridge. And yes, you can also bridge a bonded connection, and not peer-to-peer but network to network.
I am insisting on this because distcc is a prime example of why you would do it. Distcc creates a lot of traffic, and compressing that traffic is very useful even on gigabit networks, especially when you have > -j50 on each end and one core can't handle all of it. As far as I know there aren't many (popular, at least) solutions to make that happen. AFAIK vtun is just an intelligent wrapper around things like tap/bond/bridge, but it's a good wrapper.
And again, if you like the extra layer of security, you could wrap the tap/bond connections through ssh. That should give peace of mind. But it's the bond aspect that makes it awesome, because each connection is handled by a different CPU core. Thus: more bandwidth, less overhead. I know bonding wasn't made for this purpose, but it serves it pretty well, and again, AFAIK, I am the only idiot who has tried this. And I'm happy with it.
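In case anyone wants to try it, the plumbing on one end is only a few iproute2 commands once vtun has created the tap interfaces (a sketch; interface names and the bond mode are just examples):
Code: | # assuming the vtun sessions created tap0..tap3 (type ether)
ip link add bond0 type bond mode balance-rr
for t in tap0 tap1 tap2 tap3; do
    ip link set "$t" down            # slaves have to be down before enslaving
    ip link set "$t" master bond0
done
ip link set bond0 up
ip link add br1 type bridge          # the second, "demilitarized" bridge
ip link set bond0 master br1
ip link set br1 up
# then enslave the VMs' second virtual NICs to br1 as well, one "ip link set <if> master br1" each |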
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
Posted: Fri Nov 18, 2022 11:02 pm Post subject: |
Quote: | And again, if you like the extra layer of security, you could wrap the tap/bond connections through ssh. That should give peace of mind. But it's the bond aspect that makes it awesome, because each connection is handled by a different CPU core. Thus: more bandwidth, less overhead. I know bonding wasn't made for this purpose, but it serves it pretty well, and again, AFAIK, I am the only idiot who has tried this. And I'm happy with it. | And what are you going to compile with when all your cores are busy moving data around?
It must have been a fun exercise, but it seems to me that you focused so hard on whether you could do it that you forgot to think about whether you should.
I was unable to push more than 20 Mbps over a single ssh connection (moving huge amounts of data between storage servers). Launching 20 parallel sessions helped a bit, but that was just moving a few dozen TB from an otherwise idle server in one location to another otherwise idle server in a different physical location.
Yes, idle servers.
Anyway, I think that compiling will take longer than moving data around, so being limited to a single core for the network part is not that big a deal. Just set the job count to about 3x the CPU count so it doesn't run out of tasks due to latency; at this point it's a matter of personal preference. However, the more time moving data around takes, the more efficient it has to be for distcc to stay beneficial. So in-kernel is better than user-space in this case.
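With the slot counts from the first post (8 local + 10 + 10 remote) that rule of thumb would look something like this (just an illustration of the idea, not a recommendation):
Code: | # roughly 3x the 28 slots from the hosts file so jobs keep queuing despite latency,
# plus a load-average cap so the local box stays usable
MAKEOPTS="-j84 -l36" |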
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Sat Nov 19, 2022 5:11 am Post subject: |
It doesn't work well for file transfers, especially files that are already compressed. When I first started with distcc I could see that the internet line was maxed out while the CPU at the other end was mostly idle, meaning the line wasn't enough to take full advantage of the CPU. With a single compressed connection I could see one core maxed out by compression, but neither the CPU nor the line was maxed out. With my method I now mostly see both maxed out. It depends on what you compile and how you set up compression. If you turn the compression level up too much it becomes too laggy; I use lzo:4, which I find gives the best balance. For compiling C you mostly do not have enough time to draw any meaningful conclusion, but where it all shines is C++: things like webkit-gtk, chromium, clang, llvm, rust. That kind of stuff. Like I said, there are over 50 cores on each end of the line, and using 2 or 4 or 8 of them for that kind of multiplexing is no issue because plenty remain for compiling as well. And those packages can be compiled in half the time, which is better. I also have a bunch of arm and arm64 boxes that can use the remote Intel machines to compile for them, which is very convenient as well, using either clang or crossdev toolchains built for that architecture. |