Gentoo Forums
Is my idea any good?
Poll: Is this idea any good?
Yes: 7% [ 1 ]
No: 92% [ 13 ]
Other (Please leave a comment, in this case): 0% [ 0 ]
Total Votes : 14

leyvi
Tux's lil' helper
Joined: 08 Sep 2023
Posts: 104

Posted: Sat Sep 30, 2023 8:49 pm    Post subject: Is my idea any good?

Hello Gentoo community.
Portage is by far the coolest package manager to ever grace us GNU/Linux users with its presence, but it has one flaw:
It's really goddamn slow.
I think that I may have a solution, but I'm a high-school student, and I don't have any experience contributing to open-source software, and I would like some feedback on whether or not this idea of mine is practical, feasible, etc.

I was re-emerging `@world` with the LLVM toolchain, when an idea occurred to me:
What if packages were distributed as LLVM IR/bitcode?
This would probably shrink the file sizes of packages, and thus the download time (a huge issue for me)
but more importantly, it would reduce the amount of time needed to compile and install an application; some
of us use the LLVM toolchain anyway, and not having to use clang, rustc, clang++, etc. and just using the LLVM
backend could reduce the time needed to build packages by quite a bit. In addition, the core principles of portage
would still be there. The IR can still be optimized on the end-user's machine, USE flags can still do their thing, etc.
The only thing that wouldn't be possible is editing the source code before it gets built and installed, but who
does that anyway? Regardless, one could still do that if needed (it's open-source, after all).

Some things I'm less sure about, but could still be interesting:
1) Proprietary software could be distributed as LLVM IR, as it's a lot like binary executables, which software companies distribute all the time, so things like games could be optimized for performance on the end-user's machine.
2) IR/bitcode is more compressible than source code, so smaller file sizes.
3) Maybe sub components of packages could be separated by USE flag, i.e. base install (USE flag independent), Wayland, X, udev, kde, etc., so that only IR necessary for your USE flags is downloaded, etc.

I think that if this idea is feasible and practical, it could be made into its own distro, as in a fork of Gentoo (this new distro should totally be called Peregrine Linux).
It would be mostly the same as Gentoo, but focused more on speed (as a general policy), oriented around LLVM, and hopefully a bit more accessible to tech novices, and people with less free time in general.
For the most part, portage is versatile enough that most of these ideas could be implemented on Gentoo as a proof of concept of sorts, before being migrated to a dedicated fork.
I would love to participate in such a project.

Sorry if that was a bit of a rant, but it's a general outline, not a complete idea, and I'm tired.
What do you guys think? Leave a vote in the poll below :idea: :
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54681
Location: 56N 3W

Posted: Sat Sep 30, 2023 9:38 pm

leyvi,

You are missing a yes but ... option.

Portage is slow as it's certainly solving an NP-hard problem and maybe an NP-complete problem, see https://en.wikipedia.org/wiki/NP-hardness.
That means that in theory, portage has to compute all possible solutions to find the best one. The classic example is the travelling salesman problem. I'll let you search for that.
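
To make the "compute all possible solutions" point concrete, here is a toy sketch in Python. It is not how Portage's resolver actually works; the package names, versions and constraints are all made up for illustration. It only shows that a naive search has to look at every combination of candidate versions, and that number multiplies with each package added.

```python
from itertools import product

# Hypothetical candidate versions for three made-up packages.
candidates = {
    "app": ["1.0", "2.0"],
    "libbar": ["1", "2"],
    "libfoo": ["1", "2"],
}

def satisfies(choice):
    """Check the made-up constraints against one full version assignment."""
    # Assumed rule: app 2.0 needs libfoo 2.
    if choice["app"] == "2.0" and choice["libfoo"] != "2":
        return False
    # Assumed rule: libfoo 2 conflicts with libbar 1.
    if choice["libfoo"] == "2" and choice["libbar"] == "1":
        return False
    return True

def brute_force(candidates):
    """Try every combination of versions and collect the valid ones."""
    names = sorted(candidates)
    tried = 0
    valid = []
    for versions in product(*(candidates[name] for name in names)):
        tried += 1
        choice = dict(zip(names, versions))
        if satisfies(choice):
            valid.append(choice)
    return tried, valid

tried, valid = brute_force(candidates)
# "Best" here simply means the highest versions; string comparison is
# good enough for this toy data only.
best = max(valid, key=lambda c: tuple(c[name] for name in sorted(c)))
print(f"tried {tried} combinations, {len(valid)} valid, best: {best}")
```

With three packages of two versions each that is only 8 combinations, but every extra package with two candidates doubles the search space.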

Trying to distribute precompiled binaries, in any form, misses the point of Gentoo. Gentoo is not a distro. It's a toolkit that you use to design and build your very own distro.
You make leyvi Linux :) I have NeddySeagoon Linux and so on. We all use the same tools, we do not have the same installs.
Gentoo is the portage package manager and the ::gentoo ebuild repo. That's all. Everything else is {$UPSTREAM}

To take one example of the design aspect. app-office/libreoffice is affected by USE="accessibility base bluetooth +branding clang coinmp +cups custom-cflags +dbus debug eds firebird googledrive gstreamer +gtk java kde ldap +mariadb odk pdfimport postgres test valgrind vulkan" That's 25 USE flags. If they are all independent that's 2^25 different ways to build that one package.
When you use prebuilt binaries, you don't choose the USE flags settings.
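
As a quick check of the arithmetic, the following Python sketch counts the flags quoted above and computes the number of combinations. It assumes, as the post does, that every flag can be toggled independently; the variable names are just for illustration.

```python
# USE flags quoted in the post for app-office/libreoffice;
# a leading "+" marks a flag that is enabled by default.
use_line = ("accessibility base bluetooth +branding clang coinmp +cups "
            "custom-cflags +dbus debug eds firebird googledrive gstreamer "
            "+gtk java kde ldap +mariadb odk pdfimport postgres test "
            "valgrind vulkan")

flags = [flag.lstrip("+") for flag in use_line.split()]
flag_count = len(flags)

# Assumption: each flag is an independent on/off switch.
variant_count = 2 ** flag_count

print(f"{flag_count} flags -> {variant_count:,} possible builds")
# prints "25 flags -> 33,554,432 possible builds"
```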

Have a look at Experimental binary package host to see what is already being done.
It's somebody else's Gentoo, not yours. That's not to say it's a bad thing. Use it if it does what you need. After all, an operating system and the installed packages are a tool to do a job, not an end in themselves.

As for download times, portage has a mostly forgotten option
Code:
--fetchonly

This allows the downloads to be accomplished while you have access to a fast link and the building done separately with no internet connection at all.
There is even Sneakernet if that helps, so that you fetch on a separate system altogether.
Writing the scripts to automate Sneakernet is left as an exercise for the reader. :)

My vote would be "Yes but the limitations need to be understood"
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Hu
Administrator
Joined: 06 Mar 2007
Posts: 22966

Posted: Sat Sep 30, 2023 9:55 pm

First, I think you need to quantify what is slow. Are you complaining about (1) the time Portage requires to perform dependency analysis, (2) the time packages take to build once Portage starts the build, (3) the time packages take to install after building completes, (4) other?

For (1), your proposal does nothing to help with this, because Portage still needs to solve the dependency chain before it knows what packages to build.

For (2), distributing packages in a ready-to-build configuration could help. Will your proposal actually make it meaningfully faster? What benchmarks have you run to validate this?

For (3), like (1), your proposal does not help.

For (4), please describe what you think is slow.

With that out of the way, I think this will not work. Packages are distributed in the form upstream prefers, except when that form is so much trouble to manage that the Gentoo developers repackage the archive and point the ebuild to that. So in the general case, implementing this requires convincing the relevant upstreams to distribute the required IR bitcode, or asking the Gentoo developers to repackage every relevant release as IR bitcode.

Next, most build systems are strongly oriented towards compiling source code into object code. If you want to distribute an input that is not source code, you need to adjust the build system to handle that. This is potentially a per-package task, which means it is a huge amount of work. Worse, if not merged upstream and maintained there, it could break with every release that adds new source files or changes how existing ones are built.

leyvi wrote:
The IR can still be optimized on the end-user's machine, USE flags can still do their thing
This is not necessarily true. Sometimes USE flags affect which parts of the program are passed to the compiler, such as by excluding code in a #ifdef block. Can IR bitcode represent arbitrary preprocessor guards?
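
The preprocessor concern can be illustrated with a deliberately tiny stand-in for the C preprocessor, written in Python. This is only a sketch: it handles nothing but `#ifdef`, `#else` and `#endif`, and the macro and function names are hypothetical. The point is that the compiler front end only ever sees the surviving lines, so IR produced for one setting simply does not contain the code for the other.

```python
def preprocess(source, defined):
    """Keep only the lines that survive the #ifdef/#else/#endif guards."""
    out = []
    stack = []  # one entry per open #ifdef: is that block currently active?
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#ifdef"):
            name = stripped.split()[1]
            # A nested block is only active if every enclosing block is.
            stack.append(all(stack) and name in defined)
        elif stripped.startswith("#else"):
            current = stack.pop()
            stack.append(all(stack) and not current)
        elif stripped.startswith("#endif"):
            stack.pop()
        elif all(stack):
            out.append(line)
    return "\n".join(out)

# Hypothetical source file whose contents depend on a USE-flag-driven macro.
SOURCE = """\
void draw(void) {
#ifdef USE_WAYLAND
    draw_wayland();
#else
    draw_x11();
#endif
}"""

with_wayland = preprocess(SOURCE, {"USE_WAYLAND"})
without_wayland = preprocess(SOURCE, set())
print(with_wayland)
print(without_wayland)
```

Whichever variant is fed to the compiler, the other branch is already gone before any IR exists, so a single IR file could not serve both settings without extra machinery.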
leyvi wrote:
The only thing that wouldn't be possible is editing the source code before it gets built and installed, but who
does that anyway? Regardless, one could still do that if needed (it's open-source, after all).
This is very common. At a minimum, every ebuild that applies any source code patch to the upstream archive qualifies for this, unless you posit that every such ebuild should instead abandon its patches or rebuild the IR bitcode with the patch applied. Second, and more generally, Gentoo is famous for the ease with which users can customize their install. That includes applying patches at build time. I carry patches for things I like, but that I expect upstream would not be interested in accepting (or has already rejected). I am not alone in this, and suspect I am actually a rather mild user of this feature.

Proprietary software on Linux generally does the absolute minimum necessary to get by. If they wanted to integrate well, they would release the source and let the distributions sort it out. Therefore, I strongly doubt any proprietary package would lift a finger to implement your proposal. Since they refuse to release their source, no one else can bear that burden for them, so they will not change.

I like having source available, and having some confidence that the source I see is the program I built. Distributing as bitcode breaks that, in part because there is now a nagging question of whether the person who built the bitcode did it exactly correctly, and did not accidentally post bitcode generated from a different version, or with a patch missing.
leyvi
Tux's lil' helper
Joined: 08 Sep 2023
Posts: 104

Posted: Sat Sep 30, 2023 10:40 pm

NeddySeagoon wrote:
leyvi,
Trying to distribute precompiled binaries, in any form, misses the point of Gentoo. Gentoo is not a distro. It's a toolkit that you use to design and build your very own distro.
You make leyvi Linux :) I have NeddySeagoon Linux and so on. We all use the same tools, we do not have the same installs.
Gentoo is the portage package manager and the ::gentoo ebuild repo. That's all. Everything else is {$UPSTREAM}

To take one example of the design aspect. app-office/libreoffice is affected by USE="accessibility base bluetooth +branding clang coinmp +cups custom-cflags +dbus debug eds firebird googledrive gstreamer +gtk java kde ldap +mariadb odk pdfimport postgres test valgrind vulkan" That's 25 USE flags. If they are all independent that's 2^25 different ways to build that one package.
When you use prebuilt binaries, you don't choose the USE flags settings

True, but I don't mean that programs should be totally built, more something like this: clang -S -emit-llvm -o foo.ll foo.c
all of the resulting .ll files could be tarballed and the rest of the build process could be completed on the user's machine.
USE flags change which source code is compiled into the final program, no? In that case, it would be the same, just with
IR, not source code.
leyvi
Tux's lil' helper
Joined: 08 Sep 2023
Posts: 104

Posted: Sat Sep 30, 2023 10:47 pm

Hu wrote:
First, I think you need to quantify what is slow. Are you complaining about (1) the time Portage requires to perform dependency analysis, (2) the time packages take to build once Portage starts the build, (3) the time packages take to install after building completes, (4) other?

For (1), your proposal does nothing to help with this, because Portage still needs to solve the dependency chain before it knows what packages to build.

For (2), distributing packages in a ready-to-build configuration could help. Will your proposal actually make it meaningfully faster? What benchmarks have you run to validate this?

For (3), like (1), your proposal does not help.

For (4), please describe what you think is slow.

With that out of the way, I think this will not work. Packages are distributed in the form upstream prefers, except when that form is so much trouble to manage that the Gentoo developers repackage the archive and point the ebuild to that. So in the general case, implementing this requires convincing the relevant upstreams to distribute the required IR bitcode, or asking the Gentoo developers to repackage every relevant release as IR bitcode.

Next, most build systems are strongly oriented towards compiling source code into object code. If you want to distribute an input that is not source code, you need to adjust the build system to handle that. This is potentially a per-package task, which means it is a huge amount of work. Worse, if not merged upstream and maintained there, it could break with every release that adds new source files or changes how existing ones are built.

leyvi wrote:
The IR can still be optimized on the end-user's machine, USE flags can still do their thing
This is not necessarily true. Sometimes USE flags affect which parts of the program are passed to the compiler, such as by excluding code in a #ifdef block. Can IR bitcode represent arbitrary preprocessor guards?
leyvi wrote:
The only thing that wouldn't be possible is editing the source code before it gets built and installed, but who
does that anyway? Regardless, one could still do that if needed (it's open-source, after all).
This is very common. At a minimum, every ebuild that applies any source code patch to the upstream archive qualifies for this, unless you posit that every such ebuild should instead abandon its patches or rebuild the IR bitcode with the patch applied. Second, and more generally, Gentoo is famous for the ease with which users can customize their install. That includes applying patches at build time. I carry patches for things I like, but that I expect upstream would not be interested in accepting (or has already rejected). I am not alone in this, and suspect I am actually a rather mild user of this feature.

Proprietary software on Linux generally does the absolute minimum necessary to get by. If they wanted to integrate well, they would release the source and let the distributions sort it out. Therefore, I strongly doubt any proprietary package would lift a finger to implement your proposal. Since they refuse to release their source, no one else can bear that burden for them, so they will not change.

I like having source available, and having some confidence that the source I see is the program I built. Distributing as bitcode breaks that, in part because there is now a nagging question of whether the person who built the bitcode did it exactly correctly, and did not accidentally post bitcode generated from a different version, or with a patch missing.

I feel like maybe I'm not being understood. I'm not suggesting that the whole of Gentoo should be changed, I just want feedback on how practical my idea is, if anyone would actually want to use it, and
if anyone would be interested in at least helping me with a proof-of-concept. All it would take to test would be a little bit of tinkering with the build scripts for a locally downloaded (but unbuilt) package. I just don't know if I have the expertise to do this single-handedly, nor do I even know if this is an idea worth pursuing.

I meant to say that portage is slow during the download and build/compile phase.

Also, notice that I put proprietary software as IR under "Ideas I'm less sure about".

You bring up an interesting point; preprocessing could present an issue, though there's probably some sort of workaround.
Keep in mind that this is just an idea,

and please, chill out. Your post seems to be very angry.
leyvi
Tux's lil' helper
Joined: 08 Sep 2023
Posts: 104

Posted: Sat Sep 30, 2023 11:10 pm

NeddySeagoon wrote:
leyvi,
Portage is slow as it's certainly solving an NP-hard problem and maybe an NP-complete problem, see https://en.wikipedia.org/wiki/NP-hardness.
That means that in theory, portage has to compute all possible solutions to find the best one. The classic example is the travelling salesman problem. I'll let you search for that.

What I can understand through the technical jargon here is that dependency resolution is time-consuming and computationally expensive. I am not proposing that be messed with.
This is a small side project that I might do, I just want to see if there's any point in even testing out my idea. I don't have a ton of spare time, so if my idea is pointless,
I want to know so that I don't waste my time on it. You people know a lot more about this than I do.
szatox
Advocate
Joined: 27 Aug 2013
Posts: 3477

Posted: Sun Oct 01, 2023 10:42 am

Ok, so distributing binary packages is a good idea but it doesn't fit Gentoo.
In fact, this idea is so good that most Linux distributions do that as the primary means of distributing software from their repositories.
Gentoo is an outlier made for different purposes. Nothing is free; in case of Gentoo you have more flexibility, and the price of said flexibility has been dumped into build times. Yes, it sucks, deal with it.

Quote:
What if packages were distributed as LLVM IR/bitcode?
I think not all packages work with LLVM yet, that's one thing.
Second, I have an impression that this half-baked approach combines disadvantages of both worlds: it still has to be compiled before use, which takes time and requires a toolchain on the target system, just like building from sources. It also requires additional infrastructure and manpower for stage 1 compilation, packaging and hosting, like any binary repository. And yes, users' patches are a thing. Even I have some, even though I'm not much of a programmer.
On top of that, can you run both stages of compilation with different versions of LLVM, or would it crash and burn, and possibly eat your cat? Compilers' compatibility is a completely new problem to at least look out for.

So, what is the real benefit of distributing IR instead of sources?
You mentioned shorter installation times. Ok, that's a fair point, with half of the work already done, it should be faster. It's not going to be as fast as extracting a deb or rpm though, so am I going to notice a difference between a full build and a bottom-half build?
I treat compilation like batch jobs: launch, go about my business elsewhere, collect the results at my convenience. I don't just sit there patiently waiting for new stuff to be installed. I don't care if it takes 5 minutes or 15 minutes if I'm not coming back until half an hour later anyway.
Yes, this habit has been forced on me by long compilation times, and of course it would be better if I could get results immediately. 5 minutes is still too long for me to just wait patiently though.
Goverp
Advocate
Joined: 07 Mar 2007
Posts: 2194

Posted: Sun Oct 01, 2023 11:30 am

In some respects this suggestion is to do what Java does, and ship everything as bytecode for some virtual machine (in this case, one that compiles). Advantages: consistent packaging; probably smaller files; less work on the target machine. Drawbacks: many packages are written in languages not supported by LLVM (Java, Python, Ruby, Haskell, etc.); AFAIK the conversion to IR takes account of USE flag settings, so the choice is now fixed on the target machine, or you make 2**n versions of the package, one for each combination of n USE flags; portage itself, rather than the C compiler, still has the same amount of work (as above); IIUC quite a bit of work, often single-threaded, is in the "make" stage; the final compilation from IR to machine code, and especially optimization, remains quite expensive, whereas conversion from C to IR is usually quite cheap.
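
The "2**n versions" drawback can be sketched by listing what a repository would have to host for a single package. The following Python snippet is purely illustrative: the package name `foo`, the flag names and the bracketed naming scheme are assumptions, not anything Portage or a binary host actually uses.

```python
from itertools import product

def ir_variants(package, flags):
    """Return one hypothetical artifact name per USE flag combination."""
    names = []
    for states in product((False, True), repeat=len(flags)):
        settings = ",".join(
            ("+" if enabled else "-") + flag
            for flag, enabled in zip(flags, states)
        )
        names.append(f"{package}[{settings}]")
    return names

variants = ir_variants("foo", ["wayland", "X", "kde"])
for name in variants:
    print(name)
print(len(variants), "variants for 3 flags")
```

Three flags already mean eight prebuilt IR archives for one package version, and each additional compile-time flag doubles that.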

The net is the savings will be a lot less than it first appears. It's certainly a workable approach - as mentioned, it's the Java infrastructure - but it's not Gentoo. You could probably call it GenThree if you wanted :-). Gentoo is source; if it's not source, it's not Gentoo, so the approach cannot make Gentoo faster.

Gentoo has no built-in limitation that stops someone building packages that ship as IR or binary (indeed, there are many foo-bin packages, such as firefox-bin, rust-bin - 103 in total) and an ebuild that compiles to machine code and installs. It would be comparatively easy to set up a repository and populate it. I'm not sure how much use it would get though - it depends on whether the chosen USE flag settings are sufficiently general, and whether users would put up with the bloat - for example, support for both Gnome and KDE in all packages.
_________________
Greybeard
Zucca
Moderator
Joined: 14 Jun 2007
Posts: 3847
Location: Rasi, Finland

Posted: Sun Oct 01, 2023 12:37 pm

Goverp wrote:
AFAIK the conversion to IR takes account of USE flag settings, so the choice is now fixed on the target machine
I thought this too.
But the exceptions are those USE flags that don't affect the source to be compiled. For example, 'doc' generally only has an effect in the src_install() phase.
_________________
..: Zucca :..

My gentoo installs:
init=/sbin/openrc-init
-systemd -logind -elogind seatd

Quote:
I am NaN! I am a man!
NeddySeagoon
Administrator
Joined: 05 Jul 2003
Posts: 54681
Location: 56N 3W

Posted: Sun Oct 01, 2023 12:45 pm

Zucca,

USE=doc affects what is built too. Being somewhat, err, eccentric :) I have USE=doc in make.conf on my Raspberry Pi 4 build host.
That finds build time bugs. Then I have a list of exceptions in package.use/nodocs to get things to build.

FEATURES="nodoc" affects the install time, so I don't install docs to the Pi. That throws them away at install time.
There is also FEATURES="noinfo noman" if you want to be lean and mean, even if there are no separate USE flags to avoid the build.
_________________
Regards,

NeddySeagoon

sam_
Developer
Joined: 14 Aug 2020
Posts: 2076

Posted: Sun Oct 01, 2023 2:14 pm

Zucca wrote:
Goverp wrote:
AFAIK the conversion to IR takes account of USE flag settings, so the choice is now fixed on the target machine
I thought this too.
But the exceptions are those USE flags that don't affect the source to be compiled. For example, 'doc' generally only has an effect in the src_install() phase.


It's fair to say this is the exception and if this were the only USE flag type Gentoo offered, we'd be upsetting rather a lot of our users.

Anyway, Hu's questions are rather pertinent here and get to the heart of the problem with this proposal. I don't have anything else to add.
ChrisJumper
Advocate
Joined: 12 Mar 2005
Posts: 2400
Location: Germany

Posted: Tue Oct 03, 2023 9:34 pm

Hi leyvi,

The reason I use Gentoo is to have the possibility, in some cases though not 99% of them, to cook the software myself. That way I don't have to trust a package that says "I have already prepared a binary", like ready-cooked or baked food. Gentoo is about sharing some kind of knowledge about software. You are right that optimizing for the hardware could also be done at the software layer. But we have to trust someone else at some level.

You can try to distribute your compiled binary packages for your own set of computers with the same hardware, but it is always a trade-off between the pros and cons of self-compilation.
All times are GMT