Massimo B. Veteran
Joined: 09 Feb 2005 Posts: 1820 Location: PB, Germany
Posted: Thu Nov 17, 2022 8:54 am Post subject: distcc via ssh |
Hi,
https://wiki.gentoo.org/wiki/Distcc is outdated. It still talks about pump mode even though Portage dropped that feature a long time ago. Anyway.
I'm trying to get distcc working over ssh. I know this is slower, but it makes the networking easier and more secure.
I know the approach of using a local ssh tunnel and making distcc connect to localhost (sketched at the end of this post). That is a bit inconvenient, though, since the tunnel has to be started first. AFAIK distcc supports ssh natively by putting a "@" in front of a host in /etc/distcc/hosts.
Adding the @ breaks the distribution of jobs to that host. Running the emerge with DISTCC_VERBOSE="1", I found this in the output and I don't know why:
Code: | distcc[7882] (dcc_remove_disliked) remove @gentoo-mb/10,lzo from list |
An ssh login from the local root to the remote root works without a password. Is the root user used for ssh at all?
This is my local distcc configuration:
I have disabled pump mode and removed ,cpp. I have also removed --randomize because not all of the remote hosts are always available.
/etc/distcc/hosts: | #--randomize
--localslots=18
--localslots_cpp=24
localhost/8
10.185.40.112/10,lzo
mobalindesk/10,lzo
#@mobalindesk/10,lzo |
/etc/bash/bashrc.d/my_emerge-distcc.sh: | function emerge-distcc () {
DISTCC_FEATURES="distcc"
DISTCC_MAKEOPTS="-j28 -l36"
# explicitly for distcc (resolve-march-native --keep-identical-mtune)
DISTCC_CFLAGS="-march=tigerlake -mabm -madx -maes -mavx -mavx2 -mavx512bitalg -mavx512bw -mavx512cd -mavx512dq -mavx512f -mavx512ifma -mavx512vbmi -mavx512vbmi2 -mavx512vl -mavx512vnni -mavx512vp2intersect -mavx512vpopcntdq -mbmi -mbmi2 -mclflushopt -mclwb -mcx16 -mf16c -mfma -mfsgsbase -mfxsr -mgfni -mlzcnt -mmmx -mmovbe -mmovdir64b -mmovdiri -mpclmul -mpku -mpopcnt -mprfchw -mrdpid -mrdrnd -mrdseed -msahf -msha -mshstk -msse -msse2 -msse3 -msse4.1 -msse4.2 -mssse3 -mtune=tigerlake -mvaes -mvpclmulqdq -mxsave -mxsavec -mxsaveopt -mxsaves --param=l1-cache-line-size=64 --param=l1-cache-size=48 --param=l2-cache-size=24576"
DISTCC_EMERGEARGS=""
CFLAGS="$DISTCC_CFLAGS" CXXFLAGS="${CFLAGS} -fvisibility-inlines-hidden" FEATURES="$FEATURES $DISTCC_FEATURES" MAKEOPTS="$MAKEOPTS $DISTCC_MAKEOPTS" emerge $DISTCC_EMERGEARGS "$@"
} |
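For reference, the tunnel variant I mentioned above looks roughly like this (only a sketch: 3633 is an arbitrary local port so it does not clash with a locally running distccd, and mobalindesk is one of the hosts above):
Code: | # forward a local port to the distccd running on the remote box
ssh -fN -L 3633:localhost:3632 mobalindesk
# then use the forwarded port in /etc/distcc/hosts instead of the @ syntax:
#   127.0.0.1:3633/10,lzo |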
_________________ HP ZBook Power 15.6" G8 i7-11800H|HP EliteDesk 800G1 i7-4790|HP Compaq Pro 6300 i7-3770 |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
Posted: Thu Nov 17, 2022 9:28 am Post subject: |
Emerge by default switches to the portage user for the compile phase, so that's probably the user that must be able to use ssh.
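A quick way to check that from a root shell (just a sketch; BatchMode makes ssh fail instead of prompting, and DISTCC_SSH can point distcc at a specific ssh command if the defaults don't fit, see man distcc):
Code: | # can the portage user reach the remote side non-interactively?
su -s /bin/sh portage -c 'ssh -o BatchMode=yes mobalindesk true' && echo portage-ssh-ok
# if not, give portage its own key and authorize it on the remote host,
# or set e.g. DISTCC_SSH="ssh -i /path/to/key" to choose the key/command |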
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9617 Location: beyond the rim
Posted: Thu Nov 17, 2022 9:34 am Post subject: Re: distcc via ssh |
Massimo B. wrote: | Is the root user used for ssh at all? |
Depends. I think these days FEATURES=userpriv is enabled by default, so compiling is performed with UID=portage, not UID=root. Also, there are a bunch of new sandbox features added since I last looked at portage that may interact with distcc.
The first step should be to test distcc outside of portage to ensure that it works at all, and only then adjust the config so it also works within portage.
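Something along these lines takes portage out of the picture entirely (a rough sketch, reusing the host spec from the first post):
Code: | cd /tmp && echo 'int main(void){return 0;}' > hello.c
# force a single ssh host and verbose output, then compile through distcc
DISTCC_HOSTS="@mobalindesk/10,lzo" DISTCC_VERBOSE=1 distcc gcc -c hello.c -o hello.o
# repeat the same thing as the portage user to see whether that is where it breaks
su -s /bin/sh portage -c 'cd /tmp && DISTCC_HOSTS="@mobalindesk/10,lzo" DISTCC_VERBOSE=1 distcc gcc -c hello.c -o hello.o' |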
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 2:38 am Post subject: |
Distcc can easily saturate the network connection; it creates a lot of traffic. On the other hand, there is nothing secret about it. I doubt you need to encrypt anything, since an attacker would have no use for sniffing it. ssh does offer compression, but its emphasis is on security rather than throughput. It's not a good wrapper for distcc. It's not a bad one either, but it's not ideal.
Personally I prefer vtun. Vtun is a lightweight tunnel that uses the tun/tap drivers, can use zlib/lzo for compression, and also offers several encryption methods, even though I do not use those. My distcc servers are both in my local network and across the internet, and I managed to lower my traffic by using it. It also plays nicely with bridged or bonded connections. In my case, for example, I use a bridged, bonded connection (made out of 4 tunnels) between two internet locations to route traffic as though they were in the same location, and distcc runs on top of that. The reason for the bond is quite simple: it lets several CPU cores compress/decompress the stream. When a new distcc stream starts, it goes to the vtun client/server, is split across the 4 tunnels and compressed on 4 cores, then reaches the other vtun endpoint (over the internet), which turns it back into one stream and hands it to the distcc server; the compile result comes back the same way. This is quite a bit faster than a single tunnel, which is limited to a single core for compression.
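Stripped down, a vtun session for this looks more or less like the following (placeholder name, password and port; check vtund.conf(5) for the exact keywords):
Code: | # /etc/vtund.conf, shared by both ends
# server: vtund -s          client: vtund distcc0 <server address>
options {
    port 5000;
    ifconfig /sbin/ifconfig;
    syslog daemon;
}
distcc0 {
    passwd   changeme;
    type     ether;       # tap interface, so it can be bridged and bonded
    proto    tcp;
    compress lzo:4;       # cheap compression is the whole point of the exercise
    encrypt  no;          # vtun can encrypt too, I just don't bother
    up { ifconfig "%% up"; };
} |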
On the other hand, ssh won't have this problem, because each individual distcc stream is its own ssh connection and thus gets its own core; still, IMHO it has a bit more overhead than needed. I can provide more information if you are interested. |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22870
Posted: Fri Nov 18, 2022 1:19 pm Post subject: |
man distcc: | TCP connections should only be used on secure networks because there is
no user authentication or protection of source or object code. | The source may not be secret, but there is also the question of whether the server adequately limits who can run programs on it and what they can run. When the connection is transported over ssh, only clients with a valid ssh authentication can run anything. When the connection is transported without ssh, you must configure the distcc daemon to enforce any access controls you need.
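For the non-ssh case that boils down to something like this on the server (a sketch using the subnet from the first post; the options are described in distccd(1), and on Gentoo they would typically go into DISTCCD_OPTS in /etc/conf.d/distccd):
Code: | # only listen on the LAN address and only accept jobs from that subnet
distccd --daemon --listen 10.185.40.112 --allow 10.185.40.0/24 --log-file /var/log/distccd.log |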
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 1:24 pm Post subject: |
Hu wrote: | The source may not be secret, but there is also the question of whether the server adequately limits who can run programs on it and what they can run. When the connection is transported over ssh, only clients with a valid ssh authentication can run anything. When the connection is transported without ssh, you must configure the distcc daemon to enforce any access controls you need. |
That should go without saying: ACLs plus a firewall. It's not like exposing ssh to the internet is a great idea either. And I keep my vtun on high ports, 60000+; you have to run a complete port scan to find it. |
Hu Administrator
Joined: 06 Mar 2007 Posts: 22870
Posted: Fri Nov 18, 2022 2:54 pm Post subject: |
Yes, a firewall is a good first step. Do you prevent other users on the distcc client machine from connecting to the distcc daemon and running unauthorized commands from it? |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 3:07 pm Post subject: |
Hu wrote: | Yes, a firewall is a good first step. Do you prevent other users on the distcc client machine from connecting to the distcc daemon and running unauthorized commands from it? |
Well, other than the ACLs, there are no other users. It's just me. Only my ssh key is allowed to connect to any of those machines at all. I always doubted the security of the distcc server (and other things), so it's only accessible from that virtual network hidden behind vtun (it only binds to 192.168.x.x), which in turn is behind a firewall. It is possible to sniff that network if you are my ISP, or the security services in my own country, but there's nothing to see, and no way to get to it. Vtun itself can use ssh to encrypt data, but I thought that was overkill.
Anyway, leaving vtun aside for a moment, as far as I can tell distcc runs as a normal user and has no shell. I haven't seen any news or advisories or anything to suggest people use it to break into systems. If it were completely exposed to the internet, people could just flood your connection or max out your CPU, but do you know something I don't?
EDIT: to make it clearer: the most sensitive services are only available in the demilitarized network, 192.168.smth, different from anything else, and only reachable through vtun after you have passed the firewall. distcc, sql, ssh, all that juicy stuff. Only mail, web and DNS are exposed to the internet. The Russians and the Chinese are killing me with their hacking attempts. And the Dutch. I don't know why the Dutch, but I hate Serverion. I have mailed them countless times. I am thinking of reporting them to the authorities. |
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
Posted: Fri Nov 18, 2022 4:45 pm Post subject: |
Talking about security and performance, ssh is really really slow, so... Why not use a vpn instead of ssh?
Say, wireguard?
Encrypted, fast, authenticates with keys, and as a bonus it tolerates network hiccups.
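A minimal wg-quick config is only a handful of lines, something like this on the distcc client (keys, addresses and the endpoint are placeholders):
Code: | # /etc/wireguard/wg0.conf, brought up with "wg-quick up wg0"
[Interface]
PrivateKey = <client private key>
Address = 10.9.0.2/24

[Peer]
PublicKey = <server public key>
Endpoint = <build server>:51820
AllowedIPs = 10.9.0.0/24
PersistentKeepalive = 25
# then point /etc/distcc/hosts at the server's wireguard address, e.g. 10.9.0.1/10,lzo |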
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 4:50 pm Post subject: |
Wireguard is point-to-point: not bridgeable, and it only works with one core. If you have VMs or multiple machines in each network, vtun is more suitable.
And we loop right back to what I said before.
I don't know how many of you have tried vtun, but it's an incredible piece of engineering: bridging, bonding, ssh, encryption, compression (lzo and zlib). |
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Fri Nov 18, 2022 5:22 pm Post subject: |
Let's imagine an example. Let's say you have a remote server at work, running a few VMs.
On that server you could make a network bridge. I always prefer bridges over NAT for VMs, but I am not talking about THAT bridge, the one that holds eth1: eth0 goes to the internet, eth1 goes to the local network, and naturally you put eth1 in a bridge so the VMs can live in that network. I am talking about a second bridge, the demilitarized network, and in that network you allow IPs from your own home network. Wouldn't it be great if all those nice VMs were simply at home? That's exactly what I am talking about.
A bridge is like an ethernet switch: you can add any number of connections to it and they all see each other as if they were plugged into the same switch. The problem is that not all types of connections are bridgeable. Tap yes, tun no. Things like wireguard, openvpn and the three swans (freeswan, strongswan, openswan) are tun, not tap, so no.
So, a second bridge. Each VM gets 2 virtual network cards (not necessary, but easier this way): one interface on the normal network (eth1), one on the virtual network, which vtun is also plugged into and which leads to another bridge at home. Pinging a VM on the server from a computer at home then looks like a single hop, since it's all one network.
But here is the amazing part. Most tunnels/VPNs are just one stream, one process, one core, and you can easily generate more traffic than one core can compress. This is where the bond comes in: you can create more than one tunnel and bond them into a single connection. Yes, tap connections are also bondable. That means one stream, X processes, X cores, same bridge. And yes, you can also bridge a bonded connection, and not peer-to-peer but network to network.
I am insisting on this because distcc is a prime example of why you would do it. Distcc creates a lot of traffic, and compressing that traffic is very useful even on gigabit networks, especially when you have > -j50 on each end and one core can't handle all of it. As far as I know there aren't many (popular, at least) solutions to make that happen. AFAIK vtun is just an intelligent wrapper around things like tap/bond/bridge, but it's a good wrapper.
And again, if you like the extra layer of security, you could wrap the tap/bond connections through ssh. That should give peace of mind. But it's the bond aspect that makes it awesome, because each connection is handled by a different CPU core. Thus: more bandwidth, less overhead. I know bonding wasn't made for this purpose, but it serves it pretty well, and again, AFAIK, I am the only idiot who has tried this. And I'm happy with it.
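In case anyone wants to try it, the plumbing on one end is only a few iproute2 commands once vtun has created the tap interfaces (a sketch; interface names and the bond mode are just examples):
Code: | # assuming the vtun sessions created tap0..tap3 (type ether)
ip link add bond0 type bond mode balance-rr
for t in tap0 tap1 tap2 tap3; do
    ip link set "$t" down            # slaves have to be down before enslaving
    ip link set "$t" master bond0
done
ip link set bond0 up
ip link add br1 type bridge          # the second, "demilitarized" bridge
ip link set bond0 master br1
ip link set br1 up
# then enslave the VMs' second virtual NICs to br1 as well, one "ip link set <if> master br1" each |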
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
Posted: Fri Nov 18, 2022 11:02 pm Post subject: |
Quote: | And again, if you like the extra layer of security, you could wrap the tap/bond connections through ssh. That should give peace of mind. But it's the bond aspect that makes it awesome, because each connection is handled by a different CPU core. Thus: more bandwidth, less overhead. I know bonding wasn't made for this purpose, but it serves it pretty well, and again, AFAIK, I am the only idiot who has tried this. And I'm happy with it. | And what are you going to compile with when all your cores are busy moving data around?
It must have been a fun exercise, but it seems to me that you focused so hard on whether you could do it that you forgot to think about whether you should.
I was unable to push more than 20 Mbps over a single ssh connection (moving huge amounts of data between storage servers). Launching 20 parallel sessions helped a bit, but that was just moving a few dozen TB from an otherwise idle server in one location to another otherwise idle server in a different physical location.
Yes, idle servers.
Anyway, I think that compiling will take longer than moving data around, so being limited to a single core for the network part is not that big a deal. Just set the job count to about 3x the CPU count so it doesn't run out of tasks due to latency; at this point it's a matter of personal preference. However, the more time moving data around takes, the more efficient it has to be for distcc to stay beneficial. So in-kernel is better than user-space in this case.
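With the slot counts from the first post (8 local + 10 + 10 remote) that rule of thumb would look something like this (just an illustration of the idea, not a recommendation):
Code: | # roughly 3x the 28 slots from the hosts file so jobs keep queuing despite latency,
# plus a load-average cap so the local box stays usable
MAKEOPTS="-j84 -l36" |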
axl Veteran
Joined: 11 Oct 2002 Posts: 1146 Location: Romania
Posted: Sat Nov 19, 2022 5:11 am Post subject: |
It doesn't work well for file transfers, especially files that are already compressed. When I first started with distcc I could see that the internet line was maxed out while the CPU at the other end was mostly idle, meaning the line wasn't enough to take full advantage of the CPU. With a single compressed connection I could see one core maxed out by compression, but neither the CPU nor the line was maxed out. With my method I now mostly see both maxed out. It depends on what you compile and how you set up compression. If you turn the compression level up too much it becomes too laggy; I use lzo:4, which I find gives the best balance. For compiling C you mostly do not have enough time to draw any meaningful conclusion, but where it all shines is C++: things like webkit-gtk, chromium, clang, llvm, rust. That kind of stuff. Like I said, there are over 50 cores on each end of the line, and using 2 or 4 or 8 of them for that kind of multiplexing is no issue because plenty remain for compiling as well. And those packages can be compiled in half the time, which is better. I also have a bunch of arm and arm64 boxes that can use the remote Intel machines to compile for them, which is very convenient as well, using either clang or crossdev toolchains built for that architecture. |