eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Sat Mar 10, 2018 5:27 pm Post subject: VirtualBox: Running skinny VMs on Windows for distcc service |
|
|
One of the best things about the Gentoo distribution is that it’s 100% source and highly customizable.
One of the worst things about the Gentoo distribution is that it’s 100% source and you have to compile everything.
So how do you throw more CPU cores at the emerge compilation?
With VirtualBox: build a really skinny distcc VM, and run copies of it headless on Windows machines.
A VM that is idle consumes a little bit of memory (about 40 MB – your mileage may vary) and virtually no CPU cycles yet is at the ready to perform work when called on.
In my home network, I have 4 physical 64-bit dual-core Gentoo machines running distcc, as well as 3 64-bit dual-core Windows machines running the skinny distcc VMs, for a total of 14 CPU cores in the distcc network. So my /etc/portage/make.conf contains DISTCC_HOSTS that lists them all, and MAKEOPTS has -j 24 -l14. The -j value is calculated as (N CPUs – 2) * 2, so (14 – 2) * 2 = 24. The -l value is the load limiter: make won’t start additional jobs while the load average on the machine running the build is at 14 or above (at least this is my understanding – post a correction if I’m wrong, I’d welcome it).
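The arithmetic in this paragraph can be sketched in a few lines of Python. This only illustrates the (N CPUs – 2) * 2 rule of thumb from the post; the function name `makeopts` and the per-host core list are made up for the example, and the heuristic itself is the poster's, not something distcc or Portage requires.

```python
def makeopts(cores_per_host):
    """Build a MAKEOPTS string from per-host core counts.

    Rule of thumb from the post: -j = (total cores - 2) * 2 and
    -l = total cores. Both values are heuristics, not requirements.
    """
    total = sum(cores_per_host)
    jobs = max(1, (total - 2) * 2)  # never drop below one job
    return f'MAKEOPTS="-j{jobs} -l{total}"'


# 4 Gentoo machines plus 3 Windows-hosted VMs, 2 cores each (from the post)
hosts = [2] * 4 + [2] * 3
print(makeopts(hosts))  # MAKEOPTS="-j24 -l14"
```

Adding or removing a helper then only means changing the list and pasting the new line into make.conf.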
The first task is to build the first distcc VM (you only build it once, and then copy the VM to your Windows hosts). Build it just as you’d build out a regular Gentoo machine, following the Gentoo Handbook. There’s virtually no need for a high number of kernel modules, just enough to talk to the devices that VirtualBox presents to the VM (it should be very similar to Optimizing the kernel for VMware). Allocating 2048 MB of memory to the VM, along with swap space, is usually sufficient to run the VM and distcc. I configure a 200 GB VDI for the VM’s hard drive. That is the maximum size: VDI files grow as needed, and when all is said and done the VDI file is a reasonable 10 GB (your mileage may vary). The VM’s build culminates with emerging the distcc and ccache packages.
Once the VM is completed and tested to accept distcc requests from other Gentoo machines on the network, shut it down and copy the VDI file to all the Windows machines that’ll be hosts for the VM.
On each of the Windows machines, configure the VM with its own IP address and system name. A few files need revision: /etc/conf.d/net and /etc/conf.d/hostname for sure (there might be others if you have the VMs offering other services).
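As a rough sketch of what changes per copy, the Python below generates the contents of those two files for each VM. The host names, addresses, gateway, and the `eth0` interface name are all placeholders (the interface inside a VirtualBox guest may well be named differently), and it assumes a static address configured through the usual `config_<iface>` / `routes_<iface>` variables.

```python
def vm_config(hostname, address, gateway, iface="eth0"):
    """Return {path: contents} for the two files named in the post."""
    return {
        "/etc/conf.d/hostname": f'hostname="{hostname}"\n',
        "/etc/conf.d/net": (
            f'config_{iface}="{address}"\n'
            f'routes_{iface}="default via {gateway}"\n'
        ),
    }


# Hypothetical names and addresses, one entry per Windows-hosted VM
vms = [("distcc-vm1", "192.168.1.51/24"), ("distcc-vm2", "192.168.1.52/24")]
for name, addr in vms:
    for path, text in vm_config(name, addr, "192.168.1.1").items():
        print(f"# {name}: {path}\n{text}", end="")
```

The printed text would still have to be written into each VM by hand or over ssh; the sketch only shows which values differ between copies.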
Configuring the VMs on the Windows machines to start up on boot is left to the reader. I wrote a .NET program that registers itself as a Windows service to start and stop the VM, but it’s not really code that’s ready for prime time, from my view. But it solved my problem. There are other tools to do this discussed in the VirtualBox forums that are free to download.
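Whatever wrapper ends up starting and stopping the VM, it most likely boils down to two VBoxManage calls. The sketch below only builds and prints those command lines; it is not the poster's .NET service, the VM name `distcc-vm` is a placeholder, and it assumes `VBoxManage` is on the PATH of the Windows host.

```python
import shlex


def start_headless_cmd(vm_name, vboxmanage="VBoxManage"):
    """Command that boots the VM without opening a console window."""
    return [vboxmanage, "startvm", vm_name, "--type", "headless"]


def acpi_shutdown_cmd(vm_name, vboxmanage="VBoxManage"):
    """Command that presses the virtual power button for a clean shutdown."""
    return [vboxmanage, "controlvm", vm_name, "acpipowerbutton"]


# "distcc-vm" is a placeholder; use the name the VM was registered under
print(shlex.join(start_headless_cmd("distcc-vm")))
print(shlex.join(acpi_shutdown_cmd("distcc-vm")))
```

A service wrapper or scheduled task could pass the same lists to `subprocess.run` at host startup and shutdown.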
If a concern is that sometimes the Windows machines aren’t available on the network, not to worry. distcc ends up running the compilation locally if it can’t hand off the compilation to a distcc server.
Also, the distcc VMs’ gcc should stay in lockstep with the gcc version on the rest of the Gentoo machines. But since you can easily build binary packages and then emerge them, you only have to compile GCC once. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Sun Mar 11, 2018 2:18 am Post subject: |
|
|
This is an alternative?
https://forums.gentoo.org/viewtopic-t-66930-start-0.html
Not sure how well maintained this path is, however; even if cygwin is slower, at least virtualization (and the spectre/meltdown problems) won't have to be paid multiple times. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Sun Mar 11, 2018 4:46 am Post subject: |
|
|
While I have run Cygwin in the past, I never tried to set up a Cygwin or coLinux distcc host for Gentoo, so I can't judge how easy or difficult it would be.
What is nice is that headless distcc VMs are transparent to the Windows user (my wife didn't even know that I had set it up). |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Sun Mar 11, 2018 4:56 am Post subject: |
|
|
Yeah, threw up colinux quick to take a look at it, but didn't like it much.
Seriously, why have a hobbled linux when you can have a real Gentoo VM? And a 64 bit one at that? |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Sun Mar 11, 2018 5:46 pm Post subject: |
|
|
Because a full VM is "costly" in both RAM and CPU cycles. I wouldn't call Cygwin "hobbled Linux" ... it's running natively under Windows, Windows is dealing with memory/context swaps the best way it knows how to, and no memory/CPU cycles are wasted in a scheduler running in a scheduler.
It just doesn't look like Linux/Unix, that's about it. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Sun Mar 11, 2018 7:28 pm Post subject: |
|
|
eccerr0r wrote: | because a full VM is "costly" in both RAM and CPU cycles. I wouldn't called cygwin "hobbled linux" ... it's running natively under Windows, Windows is dealing with memory/context swaps the best way it knows how to, and no memory/cpu cycles are wasted in a scheduler running in a scheduler.
It just doesn't look like Linux/Unix, that's about it. |
While true, we've long since passed the days when machine resources were valued more than people's time and effort.
Given the capabilities of the typical Windows machine these days (>2.2 GHz, dual-core 64-bit, >8 GB RAM, and now SSDs even more prevalent), there are lots of cycles there to be had.
Reconciling the Cygwin or coLinux gcc / toolchain environment with the full Gentoo system, so that the object files produced by Cygwin or coLinux can be used by the full Gentoo one, is a pitfall that didn't seem straightforward to me at all.
But the 'P' in 'PC' is personal, so to each their own. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Mon Mar 12, 2018 12:28 am Post subject: |
|
|
You should time with and without the windows helper.
You'll notice that if the helper is slow enough, it's not worth it to even have the helper... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Mon Mar 12, 2018 12:53 am Post subject: |
|
|
eccerr0r wrote: | You should time with and without the windows helper.
You'll notice that if the helper is slow enough, it's not worth it to even have the helper... |
I have my solution and I'm satisfied with it. But don't let me stop you from exploring it. |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Mon Mar 12, 2018 2:12 am Post subject: |
|
|
Don't be surprised if you find out the helper doesn't really help when all is said and done. From my long experimentation with distcc, having a slow helper, whether it's due to network latency or slow compilation, is a net wash or even a net detriment for small-file compilation.
Just a warning: your solution may not be as optimal as you think. _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Mon Mar 12, 2018 2:36 am Post subject: |
|
|
eccerr0r wrote: | Don't be surprised if you find out the helper doesn't really help after all is done and through. From my long experimentation with distcc, having a slow helper whether it's due to network latency or slow compilation is a net wash or even net detriment for small file compilation.
Just warning your solution may not be found as optimal as you might think. | Given your distcc experience, do you think that is due to the limitations of the VM in CPU and RAM? I'm wondering if use of Cygwin (or WSL?) would be "more efficient" in allowing more usage of the system's resources, along with less abstraction getting in the way. _________________ Quis separabit? Quo animo? |
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Mon Mar 12, 2018 3:06 am Post subject: |
|
|
The major problem with cygwin is compiling new versions of gcc that match that of your Gentoo boxes ... I don't know how well the environment works in building gcc, especially 'bleeding edge' compilers and whether cygwin is even a "supported target". As I have not really experimented much with cygwin (due to lack of worthwhile windows boxes) I can't give any direction on this except the theory.
The major problem with VMs of any sort is that they need to emulate all the privileged instructions. One might think that gcc does not use any, but keep in mind the privilege changes when it accesses the disk (and in the case of distcc, the network), as well as the VM scheduling overhead, since the guest's scheduler is itself scheduled by the Windows scheduler. Plus, I don't know what penalties the Meltdown/Spectre mitigations will impose on VMs, so it would be most ideal not to have to emulate them.
It's been said that good VMMs can get about 90% of native speed - but this is only memory execution with minimal disk IO. I've never seen good results with disk/network IO on VMMs. I personally use KVM QEMU and get nowhere near this 90% and overall speeds have been closer to the 70% mark (and I've heard reports that the meltdown/spectre mitigation is said to drop it to 50%...) _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Mon Mar 12, 2018 3:07 am Post subject: |
|
|
pjp wrote: | eccerr0r wrote: | Don't be surprised if you find out the helper doesn't really help after all is done and through. From my long experimentation with distcc, having a slow helper whether it's due to network latency or slow compilation is a net wash or even net detriment for small file compilation.
Just warning your solution may not be found as optimal as you might think. | Given your distcc experience, do you think that is due to the limitations of the VM in CPU and RAM? I'm wondering if use of Cygwin (or WSL?) would be "more efficient" in allowing more usage of the system's resources, along with less abstraction getting in the way. |
Maybe, if you can resolve the differences in the gcc and tool chain so they are object compatible.
From what I've read up on, VMs generally do pretty well with compute, but less well with disk IO performance. |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Mon Mar 12, 2018 3:22 am Post subject: |
|
|
eccerr0r wrote: | The major problem with cygwin is compiling new versions of gcc that match that of your Gentoo boxes ... I don't know how well the environment works in building gcc, especially 'bleeding edge' compilers and whether cygwin is even a "supported target". As I have not really experimented much with cygwin (due to lack of worthwhile windows boxes) I can't give any direction on this except the theory.
The major problem with VMs of any sort is that it needs to emulate all the privileged instructions. One might think that gcc does not use any, but keep in mind the privilege changes when it accesses the disk (and in the case of distcc, network) as well as the VM scheduling overhead that's double scheduled by the windows scheduler. Plus I don't know of the penalties that the meltdown/spectre mitigation will do with VM, so it would be most ideal not to have to emulate them.
It's been said that good VMMs can get about 90% of native speed - but this is only memory execution with minimal disk IO. I've never seen good results with disk/network IO on VMMs. I personally use KVM QEMU and get nowhere near this 90% and overall speeds have been closer to the 70% mark (and I've heard reports that the meltdown/spectre mitigation is said to drop it to 50%...) |
Still, even if there's a 50% loss over theoretical for the 3 Windows-hosted VMs, which run on machines dedicated primarily to Windows use, the 50% that you do gain helps over not using them at all.
Hey, I'm not trying to force anyone to do anything they don't want to. I'm just sharing what I did, and saying it helps. |
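To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It takes the core counts from the first post (4 native dual-core machines, 3 dual-core VMs) and the efficiency figures mentioned in this thread (90%, 70%, 50%). The function name `effective_cores` is made up, and the model deliberately ignores network latency and scheduling, which is exactly what the counter-argument in this thread is about.

```python
def effective_cores(native_cores, vm_cores, vm_efficiency):
    """Native cores plus VM cores discounted by an efficiency factor."""
    return native_cores + vm_cores * vm_efficiency


native = 4 * 2  # four dual-core Gentoo machines
vm = 3 * 2      # three dual-core Windows-hosted VMs

for eff in (0.9, 0.7, 0.5):
    total = effective_cores(native, vm, eff)
    print(f"VM efficiency {eff:.0%}: {total:.1f} core-equivalents")
```

Even at the pessimistic 50% figure the pool is larger on paper than the native machines alone; whether that shows up in wall-clock build time is a separate question.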
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Mon Mar 12, 2018 3:30 am Post subject: |
|
|
That's not exactly the way to think about it. Consider that you have, say, four 2GHz boxes, and you end up basically using one 2GHz box as your host and three 1GHz boxes as your helpers. Couple that with network latency, and it gets to a point where the "1GHz" box might not even be worth having in the pool.
Try distccmon-gui and watching it. I've seen many times where the "2GHz" machine has an idle core while it waits for data to come back from the "1GHz" boxes where that 2GHz machine could have been doing the job of the "1GHz" machine instead of waiting for it. If you have enough "1GHz" machines it may help as you have enough of them to wait on, but this depends on what you're compiling/how it's being scheduled by make/ninja/.... _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
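The effect described in this post can be reproduced with a toy simulation. The Python below is a sketch under simplifying assumptions: all jobs are the same size, each job goes to whichever slot frees up first (real distcc and make scheduling differ), and the speed and overhead numbers are invented for illustration.

```python
import heapq


def makespan(n_jobs, job_time, workers):
    """Total build time when equal jobs go to the first free slot.

    workers: list of (speed, overhead) tuples, one per core slot.
    speed is relative to a local core (1.0 = same speed); overhead is
    a per-job network/transfer cost in seconds.
    """
    # Heap of (time the slot becomes free, slot index)
    free_at = [(0.0, i) for i in range(len(workers))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_jobs):
        t, i = heapq.heappop(free_at)
        speed, overhead = workers[i]
        done = t + job_time / speed + overhead
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


local = [(1.0, 0.0)] * 2    # two local cores
helpers = [(0.5, 0.2)] * 2  # two half-speed remote cores, 0.2 s overhead

for n in (4, 40):
    alone = makespan(n, 1.0, local)
    pooled = makespan(n, 1.0, local + helpers)
    print(f"{n} jobs: local only {alone:.1f}s, with helpers {pooled:.1f}s")
```

With only a handful of jobs, the build ends up waiting on the slow helpers and finishes later than the local-only run; with many jobs the extra slots win. That matches both positions in this thread, and where the crossover sits depends entirely on the assumed numbers.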
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Mon Mar 12, 2018 4:22 am Post subject: |
|
|
Interesting, thanks. At least until spectre issues are resolved, gcc & toolchain compatibility could be a challenge.
Maybe a VM set for the max CPUs and memory, and then remotely controlling whether or not it is paused or resumed. A 200GB VDI seems large if the host is on an SSD. I think the one I just bought was 250GB (never mind that the system will likely never independently use more than 50GB).
Or even a small distcc/OS partition and remotely controlling which OS is active and boots next. If a native Windows program can set a "next boot only" option, that could work really well. Interesting project options. :) _________________ Quis separabit? Quo animo? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Tue Mar 13, 2018 12:57 am Post subject: |
|
|
pjp wrote: | Interesting, thanks. At least until spectre issues are resolved, gcc & toolchain compatibility could be a challenge.
Maybe a VM set for the max CPUs and memory, and then remotely controlling whether or not it is paused or resumed. A 200GB VDI seems large if the host is on an SSD. I think the one I just bought was 250GB (never mind that the system will likely never independently use more than 50GB).
Or even a small distcc/OS partition and remotely controlling which OS is active and boots next. If a native Windows program can set a "next boot only" option, that could work really well. Interesting project options. |
Hmm. Not sure which VDI would grow to 200 GB. My little distcc VM is only a little over 10 GB. None of my VDIs are over 77 GB. |
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Tue Mar 13, 2018 1:21 am Post subject: Re: VirtualBox: Running skinny VMs on Windows for distcc ser |
|
|
What did you mean with 200 GB VDI in the following section?
eohrnberger wrote: | I configure a 200 GB VDI for the VMs hard drive. VDI files grow as needed, and when all said and done is a reasonable 10 GB VDI file (your mileage may vary). The VM’s build culminates with emerging the distcc and ccache packages. |
_________________ Quis separabit? Quo animo? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Tue Mar 13, 2018 2:15 am Post subject: Re: VirtualBox: Running skinny VMs on Windows for distcc ser |
|
|
pjp wrote: | What did you mean with 200 GB VDI in the following section?
eohrnberger wrote: | I configure a 200 GB VDI for the VMs hard drive. VDI files grow as needed, and when all said and done is a reasonable 10 GB VDI file (your mileage may vary). The VM’s build culminates with emerging the distcc and ccache packages. |
|
Oh. Let me clarify. The internal size for the HD is 200 GB (max storage). The VDI file is only storing the data that you write to it, so around 10 GB. |
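For reference, the difference between the two sizes corresponds to creating the disk as a dynamically allocated image. The sketch below builds the VBoxManage command for that; it assumes a VirtualBox version that has the `createmedium` subcommand (older releases used `createhd`), and the file name is a placeholder. The same disk can also be created through the GUI wizard.

```python
import shlex


def create_vdi_cmd(path, size_gb, fixed=False, vboxmanage="VBoxManage"):
    """Command line for creating a VDI with the given maximum size."""
    size_mb = size_gb * 1024  # --size is given in megabytes
    # "Standard" is the dynamically allocated variant; "Fixed" preallocates
    variant = "Fixed" if fixed else "Standard"
    return [vboxmanage, "createmedium", "disk", "--filename", path,
            "--size", str(size_mb), "--format", "VDI", "--variant", variant]


# 200 GB maximum, growing only as the guest writes data
print(shlex.join(create_vdi_cmd("distcc.vdi", 200)))
```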
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Tue Mar 13, 2018 5:02 am Post subject: |
|
|
You're referring to dynamically allocated VDI, aren't you? Such that if some event caused spurious logging, it could eventually consume 200GB, correct? _________________ Quis separabit? Quo animo? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Wed Mar 14, 2018 2:52 am Post subject: |
|
|
pjp wrote: | You're referring to dynamically allocated VDI, aren't you? Such that if some event caused spurious logging, it could eventually consume 200GB, correct? |
I suppose so. However, there's an option that emulates an SSD, which trims the file system allocation of free space (if I understand it correctly), so I suppose that I would do the following after a spurious logging event eventually consumed 200GB:
- Clean up and eliminate the spurious logging event (something's not happy)
- Turn on SSD emulation to trim down the free space from the file system
- Shut down the VM and run vboxmanage compact on the idle VDI
- Resume running the VM
Now, not having done this before, and having logrotate in place to clean up and keep the log space down, I can't really comment on the efficacy of this procedure.
On the other hand, you could create a new VDI, and tar over from the starting VDI to reduce the allocation of the VDI. |
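A sketch of the VBoxManage side of that procedure, written as Python that only builds and prints the commands. Like the post, it is untested as a shrinking recipe. The controller name "SATA", the port/device numbers, and the VM and file names are assumptions to adjust, and `modifymedium` was called `modifyhd` in older VirtualBox releases.

```python
import shlex


def enable_discard_cmd(vm, controller="SATA", port=0, device=0,
                       vboxmanage="VBoxManage"):
    """Mark the attached disk as non-rotational with discard (TRIM) on."""
    return [vboxmanage, "storageattach", vm,
            "--storagectl", controller,
            "--port", str(port), "--device", str(device),
            "--nonrotational", "on", "--discard", "on"]


def compact_cmd(vdi_path, vboxmanage="VBoxManage"):
    """Compact a VDI; run it only while the VM is shut down."""
    return [vboxmanage, "modifymedium", "disk", vdi_path, "--compact"]


# Placeholder VM and file names
steps = [
    enable_discard_cmd("distcc-vm"),  # with the VM powered off
    compact_cmd("distcc.vdi"),        # after cleaning up in the guest
]
for cmd in steps:
    print(shlex.join(cmd))
```

Between the two steps the guest would still need to free the space and, presumably, run something like fstrim so the freed blocks are actually released.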
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Wed Mar 14, 2018 4:28 am Post subject: |
|
|
Interesting, thanks. I'll have to look into that feature. I'm not sure how I feel about VirtualBox manipulating something inside the OS to stop the logging event. For example, how can it know it is a problem as opposed to a desirable event to be logged? _________________ Quis separabit? Quo animo? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Wed Mar 14, 2018 4:45 am Post subject: |
|
|
pjp wrote: | Interesting, thanks. I'll have to look into that feature. I'm not sure how I feel about VirtualBox manipulating something inside the OS to stop the logging event. For example, how can it know it is a problem as opposed to a desirable event to be logged? |
I don't think that VBox is going to change anything about the logging event, but it could address the excess disk allocation resulting from the logging event to keep the host's VDI file from growing overly large.
The configuration issue causing the logging event would still be up to you to resolve, but if the VDI file grows too large, I think there are ways to squeeze it back down to a manageable size. |
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Wed Mar 14, 2018 5:06 am Post subject: |
|
|
Oh, OK. I'd probably just go with a smaller fixed size. _________________ Quis separabit? Quo animo? |
eohrnberger Apprentice
Joined: 09 Dec 2004 Posts: 250
|
Posted: Wed Mar 14, 2018 5:44 am Post subject: |
|
|
pjp wrote: | Oh, OK. I'd probably just go with a smaller fixed size. |
From what I've seen, there's not all that much penalty for a larger fixed size, but do as you will. |