Gentoo Forums
Shrinking Portage
Gentoo Forums Forum Index » Gentoo Chat
avx
Advocate


Joined: 21 Jun 2004
Posts: 2152

Posted: Tue Aug 29, 2006 6:09 am    Post subject: Shrinking Portage

First of all, I didn't find an existing topic for this. If there's one, please merge or delete this one, thx!

So today I did my regular system update via 'emerge --sync && update-eix && emerge -atuvD world' and, while watching the files being checked for updates, it (again) came to my mind: "why the heck is portage that big?". I remembered that I once stumbled across the exclude feature of rsync (and with that also of portage), so I read up on how to do it.
I did a 'du -hs /usr/portage' and it returned ~620 MB (not including distfiles and packages!). For me, portage isn't readable via the filesystem and I'm not really a fan of GUIs for such tasks, so I browsed portage on http://gentoo-portage.com to see what's really in portage that I need. It was frustrating: I use only <= 5% of what's in portage, but I download the ebuilds/patches/digests/etc. for everything in it, which of course is good neither for me nor for the myriad of mirrors out there.

So far my system is running well and has everything I need, so there's no real reason to keep ebuilds on my box or sync them if I would never use them. So I quickly hacked up a little script to generate an exclude file based on what I've installed on my system, ran it, rm'd portage and synced again. After doing another 'du -hs /usr/portage' I was quite baffled to see that only 117 MB was left, and it still includes everything I need. Since I'm using a compressed portage as described in this thread, my squashed-down portage is less than 10 MB in size and is fast enough for me.
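The script itself isn't posted. A minimal sketch of the idea, assuming you already have two lists of "category/package" names (for instance from listing the tree and the installed-package database under /var/db/pkg). The function name and the collapsing of fully unused categories into one line are illustrative assumptions, not the poster's script. The resulting lines are in the form rsync reads through its --exclude-from option. Note that it only keeps what is installed now, so ebuilds for future dependencies are excluded too.

```python
from collections import defaultdict


def build_rsync_excludes(tree_packages, installed):
    """Return rsync exclude patterns for packages that are not installed.

    tree_packages: iterable of "category/package" names found in the tree
    installed:     iterable of "category/package" names installed locally
    A category with no installed package is collapsed into one pattern,
    which keeps the exclude file short.
    """
    installed = set(installed)
    by_category = defaultdict(set)
    for atom in tree_packages:
        category, package = atom.split("/", 1)
        by_category[category].add(package)

    patterns = []
    for category in sorted(by_category):
        packages = sorted(by_category[category])
        kept = {p for p in packages if f"{category}/{p}" in installed}
        if not kept:
            # Leading "/" anchors the pattern at the root of the transfer,
            # trailing "/" makes it match directories only.
            patterns.append(f"/{category}/")
        else:
            patterns.extend(f"/{category}/{p}/" for p in packages if p not in kept)
    return patterns


if __name__ == "__main__":
    tree = ["app-editors/vim", "app-editors/emacs",
            "games-arcade/xbill", "net-www/netscape-flash"]
    for line in build_rsync_excludes(tree, ["app-editors/vim"]):
        print(line)
```

With only vim installed, this prints /app-editors/emacs/, /games-arcade/ and /net-www/, one per line.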

So what's this thread all about? Well, why do I have to do the work? My script's far from perfect, but at least it works. Thinking about doing this task by hand really makes me shudder, because the file now has nearly 1,000 lines (it could be made more compact, but I prefer it being readable, just in case).
So I think the benefits are clear: I (the user) save bandwidth and disk space, on top of that I get a faster portage (syncing/searching), and the mirrors would save a whole lot of bandwidth if everybody used an exclude file.

So the question is: wouldn't it be better/easier to have a portage_include instead of a portage_exclude? With it, the user would only need to answer a set of questions and would get a portage tailored to their needs.

To make clear what I was thinking about, check these little questions one could be asked:

1) Desktop or Server?
In the case of a desktop, categories like www-servers, www-apps, sys-cluster, net-www and net-zope are unneeded.

2) Architecture?
Let it be x86/x86_64; then you could drop anything related to, for example, ppc, sparc, etc.

3) Desktop or Notebook?
If desktop, most things related to acpi, powersaving, wireless, pcmcia, etc could be dropped.

4) X or not?
If not, any X program could be dropped.

5) If X, Gnome, KDE, XFCE, other?
Drop all which are not $answer; if $answer = other, then ask for the preferred toolkit and drop the others.

6) Gaming?
Drop all games-* if not

There could be a lot more of these easy questions, but I think you get the idea.
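Questions 1 and 6 are the ones that map directly onto category names, so they are the easiest to sketch. A minimal illustration, assuming the answers arrive as booleans; `categories_to_drop` and `SERVER_CATEGORIES` are hypothetical names, and the category list is copied from question 1 as-is. The other questions (architecture, notebook, X, toolkit) don't map onto whole categories and are left out.

```python
# Hypothetical rules built only from the examples in questions 1 and 6.
SERVER_CATEGORIES = {"www-servers", "www-apps", "sys-cluster", "net-www", "net-zope"}


def categories_to_drop(all_categories, desktop, gaming):
    """Return the categories a given set of answers would exclude."""
    drop = set()
    if desktop:
        # Question 1: a desktop is assumed not to need the server-side categories.
        drop |= SERVER_CATEGORIES & set(all_categories)
    if not gaming:
        # Question 6: no gaming means every games-* category goes.
        drop |= {c for c in all_categories if c.startswith("games-")}
    return sorted(drop)


if __name__ == "__main__":
    tree = ["app-editors", "games-arcade", "games-fps", "net-www", "www-servers"]
    print(categories_to_drop(tree, desktop=True, gaming=False))
```

A blanket rule like this drops everything else in those categories as well, e.g. net-www/netscape-flash on a desktop.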

So, what do you think?

cheers,
ph
anello
Guru


Joined: 17 Jul 2005
Posts: 557
Location: EU -> DE -> Stuttgart

Posted: Tue Aug 29, 2006 6:29 am

Gentoo is all about choices, but these choices aren't supposed to be limiting.

Your questions are limiting the user. For example, what if I want to have a desktop, but also want to have trac since I'm a developer ...

There are always exceptions, and the beautiful thing about Gentoo is that I can configure the system just as I need it to be.
_________________
Antonino Catinello | http://catinello.eu
avx
Advocate


Joined: 21 Jun 2004
Posts: 2152

Posted: Tue Aug 29, 2006 6:40 am

I see your point, since I'm also running an Apache for testing on my desktop machine, but I still find it easier to include the packages I really want instead of throwing out everything I won't need. A little grep/sed magic would make it easy to include packages and their dependencies.

Sure, Gentoo is about choice and I don't want to force anybody to use an exclude/include file, but it would be good if people used one.
think4urs11
Bodhisattva


Joined: 25 Jun 2003
Posts: 6659
Location: above the cloud

Posted: Tue Aug 29, 2006 6:52 am

ph030 wrote:
Sure, Gentoo is about choice and I don't want to force anybody to use an exclude/include file, but it would be good if people used one.

Well, the 'no-force' option is already there with rsync_excludes, as you already stated.
And it is much easier to exclude some parts than to explicitly include everything a user wants (think about package dependencies).

Who would decide what is in scope for a desktop system? I, for my part, like to have acpi and wireless stuff in, as some people have desktops which are connected via wireless cards due to cabling issues. How would programs with optional X support, like nmap and mc, be treated?
BTW: excluding net-www for desktops would also mean dropping netscape-flash, which is clearly desktop-oriented software.
_________________
Nothing is secure / Security is always a trade-off with usability / Do not assume anything / Trust no-one, nothing / Paranoia is your friend / Think for yourself
Q-collective
Advocate


Joined: 22 Mar 2004
Posts: 2071

Posted: Tue Aug 29, 2006 7:05 am

Portage has a number of issues but you name two very important ones:
1. its use of rsync
2. it's huge

Problem 1 can be divided into several other problems: it's slow, and the tree is always either current or out of date, which can create problems if users depend on specific versions of ebuilds or if something screwed up their system and they don't know exactly what it was. You can solve this by using subversion instead of rsync, which is much faster and has the ability to pull in older versions of portage if needed. The common argument against subversion is that it would put a higher load on the mirrors. I personally think that this is not really an issue, since we're talking about the same volume of data.

Problem 2 is trickier. Portage consists of about 150,000 files that are each a few kilobytes big, which creates a huge tree of several hundred MB. You already mentioned the answer: using squashfs by default would immensely reduce the tree. This does make the tree a little slower, but all in all squashfs is well known for its good performance.

I think these would be two very good, yet relatively simple, improvements to portage.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 20067

Posted: Tue Aug 29, 2006 4:02 pm

This has been discussed quite a bit. Here's one issue:
emerge sync is too long wrote:
ciaranm wrote:
itsr0y wrote:
What I don't understand is, why do we need to download every single ebuild that ever existed? Why, when sync'ing, can't we just download a list of available packages and versions? Then, when you want to emerge a piece of software, it downloads the ebuild for it. You have to download the tarball anyway, so why not add one extra download? It should only take an extra second, but would save a TON of bandwidth and space for syncing. For most cases, all you really need to know is what packages and versions are available.

Because in order to determine whether <foo> can be emerged, what version to go for and what deps are needed, you need most of the tree anyway...


ciaranm wrote:
I suggest that anyone who thinks they have a lower load way of doing this a) looks at the existing metadata cache and b) calculates exactly how much of the tree is actually required to emerge any given package (hint: you need to consider system and deps too). 'Cos, ya know, all these nice clever ideas suddenly don't seem so good when you sit down and try to implement it 8O"

That is from 2004, but I don't know that it is outdated.
_________________
Quis separabit? Quo animo?
Genone
Retired Dev


Joined: 14 Mar 2003
Posts: 9538
Location: beyond the rim

Posted: Wed Aug 30, 2006 6:26 pm

Q-collective wrote:
Portage has a number of issues but you name two very important ones:
1. its use of rsync
2. it's huge

Problem 1 can be divided into several other problems: it's slow, and the tree is always either current or out of date, which can create problems if users depend on specific versions of ebuilds or if something screwed up their system and they don't know exactly what it was. You can solve this by using subversion instead of rsync, which is much faster and has the ability to pull in older versions of portage if needed. The common argument against subversion is that it would put a higher load on the mirrors. I personally think that this is not really an issue, since we're talking about the same volume of data.

You have some numbers to back this up?

Quote:
Problem 2 is trickier. Portage consists of about 150,000 files that are each a few kilobytes big, which creates a huge tree of several hundred MB. You already mentioned the answer: using squashfs by default would immensely reduce the tree. This does make the tree a little slower, but all in all squashfs is well known for its good performance.

Performance isn't the problem with squashfs. It being read-only is the problem. Now we could try to offer squashfs images of the tree, but then each sync would have to fetch the whole 30MB image as deltas of compressed files are generally useless (unless someone wrote a working squashfs specific delta generator which I haven't heard of).
Btw, if you're concerned about the number of files, then you really don't want to use subversion, as it would increase them by a factor of at least 2.
Q-collective
Advocate


Joined: 22 Mar 2004
Posts: 2071

Posted: Wed Aug 30, 2006 7:01 pm

Genone wrote:
Q-collective wrote:
Portage has a number of issues but you name two very important ones:
1. its use of rsync
2. it's huge

Problem 1 can be divided into several other problems: it's slow, and the tree is always either current or out of date, which can create problems if users depend on specific versions of ebuilds or if something screwed up their system and they don't know exactly what it was. You can solve this by using subversion instead of rsync, which is much faster and has the ability to pull in older versions of portage if needed. The common argument against subversion is that it would put a higher load on the mirrors. I personally think that this is not really an issue, since we're talking about the same volume of data.

You have some numbers to back this up?

No, I was going by common sense; please correct me if I'm terribly wrong.

Genone wrote:
Quote:
Problem 2 is trickier. Portage consists of about 150,000 files that are each a few kilobytes big, which creates a huge tree of several hundred MB. You already mentioned the answer: using squashfs by default would immensely reduce the tree. This does make the tree a little slower, but all in all squashfs is well known for its good performance.

Performance isn't the problem with squashfs. It being read-only is the problem. Now we could try to offer squashfs images of the tree, but then each sync would have to fetch the whole 30MB image as deltas of compressed files are generally useless (unless someone wrote a working squashfs specific delta generator which I haven't heard of).
Btw, if you're concerned about the number of files, then you really don't want to use subversion, as it would increase them by a factor of at least 2.

Increase the number? How so? Besides, the number doesn't bother me, the size does. And I wasn't aware that squashfs is read-only; isn't there an alternative that offers a good compressed filesystem?
Genone
Retired Dev


Joined: 14 Mar 2003
Posts: 9538
Location: beyond the rim

Posted: Wed Aug 30, 2006 8:07 pm

Q-collective wrote:
Genone wrote:
You have some numbers to back this up?

No, I was going by common sense; please correct me if I'm terribly wrong.

Not saying that you're wrong, just curious how you came to the conclusion that subversion is faster.

Q-collective wrote:
Genone wrote:
Btw, if you're concerned about the number of files, then you really don't want to use subversion, as it would increase them by a factor of at least 2.

Increase the number? How so? Besides, the number doesn't bother me, the size does. And I wasn't aware that squashfs is read-only; isn't there an alternative that offers a good compressed filesystem?

Because subversion keeps a copy of each file (in .svn/text-base) so it can generate diffs without going online. Maybe there is a way to turn that feature off (I haven't checked), but by default each file would exist twice (and so need twice the storage). Oh, and the number of files is actually the main space killer due to filesystem overhead, in case you didn't know.
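A rough back-of-the-envelope check of the filesystem-overhead point: on a typical block-based filesystem every non-empty file occupies at least one whole block. The sketch below assumes the ~150,000 files mentioned earlier in the thread, a made-up average size of 1,500 bytes per file, and a plain block-rounding model; `allocated_bytes` is a hypothetical helper.

```python
def allocated_bytes(file_sizes, block_size=4096):
    """Space used when each file is rounded up to whole filesystem blocks.

    Simplified model: ignores inodes, directory entries and tail packing.
    """
    # -(-a // b) is ceiling division using integers only.
    return sum(-(-size // block_size) * block_size for size in file_sizes)


if __name__ == "__main__":
    # Assumption: 150,000 files averaging 1,500 bytes each.
    sizes = [1500] * 150_000
    print("apparent size:    ", sum(sizes))                       # 225000000
    print("4 KiB blocks:     ", allocated_bytes(sizes, 4096))     # 614400000
    print("1 KiB blocks:     ", allocated_bytes(sizes, 1024))     # 307200000
    print("4 KiB + .svn copy:", 2 * allocated_bytes(sizes, 4096)) # 1228800000
```

Under these assumptions, rounding up to 4 KiB blocks makes the tree take about 2.7 times its apparent size, pristine working-copy files double that again, and a 1 KiB block size halves the allocated space. Filesystems with tail packing, such as ReiserFS, behave differently.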
As for squashfs alternatives, I don't know of a stable, compressed read-write filesystem.


Last edited by Genone on Thu Aug 31, 2006 4:05 am; edited 2 times in total
ciaranm
Retired Dev


Joined: 19 Jul 2003
Posts: 1719
Location: In Hiding

Posted: Wed Aug 30, 2006 8:19 pm

Please stop confusing 'the tree' and 'Portage'. It's very annoying.
Hauser
l33t


Joined: 27 Dec 2003
Posts: 650
Location: 4-dimensional hyperplane

Posted: Wed Aug 30, 2006 8:58 pm    Post subject: Re: Shrinking Portage

ph030 wrote:
...
I did a 'du -hs /usr/portage' and it returned ~620 MB (not including distfiles and packages!)...

Code:
$ du -hs /usr/portage/
188M    /usr/portage/

:wink:
btw I put the distfiles elsewhere.
_________________
AMD Athlon XP 2600+; 512M RAM;
nVidia FX5700LE; Hitachi 120Gb
2.6.9-nitro4, reiser4, linux26-headers+nptl

Do I like to compile everything?
Positive definite!
Archangel1
Veteran


Joined: 21 Apr 2004
Posts: 1212
Location: Work

Posted: Wed Aug 30, 2006 10:19 pm    Post subject: Re: Shrinking Portage

I'm not sure it'd work particularly well in practice. I could disagree with a couple of the examples you gave:
ph030 wrote:
2) Architecture?
Let it be x86/x86_64; then you could drop anything related to, for example, ppc, sparc, etc.

Are there all that many ebuilds related specifically to ppc or sparc? I would have guessed not.

ph030 wrote:
3) Desktop or Notebook?
If desktop, most things related to acpi, powersaving, wireless, pcmcia, etc could be dropped.

Disagree. Most new desktops now have CPU frequency scaling, and a wireless card in one isn't completely unreasonable.

But that's a bit specific - I think the real issue is bigger. Any package can specify dependencies on any other package - so you answer "yes" to KDE and "no" to GNOME, but what about some random app in net-im that happens to depend on, say, gconf? You'd have to either go trawling through your includes/excludes to see where the problem was, or (more likely) just sync the whole tree at that point.
Seems like an easy way to fill up Bugzilla though.
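The trawling through includes/excludes described above could at least be automated. A minimal sketch, assuming a dependency map of plain "category/package" names is already available (extracting one from real ebuilds, with versions, USE conditionals and || alternatives, is the hard part and is not shown). `broken_by_excludes` and `net-im/example-client` are hypothetical names.

```python
def broken_by_excludes(deps, excluded_categories):
    """Return (package, dependency) pairs where a package that is kept
    depends on something in an excluded category."""
    problems = []
    for package, requirements in deps.items():
        if package.split("/", 1)[0] in excluded_categories:
            continue  # the package itself is excluded, so it can't break
        for dep in requirements:
            if dep.split("/", 1)[0] in excluded_categories:
                problems.append((package, dep))
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical, hand-written dependency map.
    deps = {
        "net-im/example-client": ["gnome-base/gconf", "x11-libs/gtk+"],
        "gnome-base/gconf": ["dev-libs/glib"],
    }
    print(broken_by_excludes(deps, {"gnome-base"}))
```

Here the KDE-only user who excluded gnome-base would be told that net-im/example-client needs gnome-base/gconf, instead of finding out from a failed emerge.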
_________________
What are you, stupid?
boniek
Guru


Joined: 26 Mar 2005
Posts: 373

Posted: Wed Aug 30, 2006 10:34 pm

You can always put the portage tree in a sparse file formatted with a filesystem that uses a small block size, to make it somewhat smaller.
avx
Advocate


Joined: 21 Jun 2004
Posts: 2152

Posted: Thu Aug 31, 2006 4:12 pm

As for my examples, they are *just* examples; I haven't thought about them too much ;)

As for the squashfs part: have you read the whole thread I linked in the first post? It's not only about squashfs, but also about using unionfs to make squashfs writable in RAM and then writing the changes back into a squashed image, which works pretty well.

My 'tree' is now only 117 MB unsquashed and *less* than 10 MB squashed, not noticeably slower, and it wasn't too much work. As I said, my systems are fully built and I have what I want, nothing more or less, so it's pretty easy to find out which parts of the tree I don't need. If I really need or want to try out a new application I've heard or read about somewhere on the net, it isn't a big problem to open up gentoo-portage.com and/or b.g.o./$overlay and check whether there are already ebuilds available. If there's no ebuild, then I may write one myself (looking up the deps on the author's page) or build it by hand; in both cases I need to know if/which dependencies I need, and therefore it's pretty easy to include them in my list. Right now I'm working on a script to add (recursive) deps to the list, and it works so far. I'm no script guru at all, but if I can do it, one of our beloved devs can do it, too.
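Adding recursive deps to the list amounts to a transitive closure over the dependency graph. A minimal breadth-first sketch, assuming the dependencies are already available as a dict of plain "category/package" names (the packages below, such as `app-misc/foo`, are hypothetical). Following ciaranm's hint earlier in the thread, `wanted` would have to cover the system set as well as the packages you actually care about.

```python
from collections import deque


def dependency_closure(deps, wanted):
    """Return every package reachable from `wanted` via `deps` (breadth-first)."""
    seen = set(wanted)
    queue = deque(wanted)
    while queue:
        package = queue.popleft()
        for dep in deps.get(package, ()):
            if dep not in seen:  # the seen-set also stops dependency cycles
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


if __name__ == "__main__":
    deps = {
        "app-misc/foo": ["dev-libs/bar"],
        "dev-libs/bar": ["sys-libs/zlib"],
        "sys-libs/zlib": [],
    }
    print(dependency_closure(deps, ["app-misc/foo"]))
```

The result is the list of packages to keep; everything else in the tree would go into the exclude file.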

To make it clear once again: I *don't* want to force anybody to do it, and I know (from my own experience) that it can be a little difficult to set up, but I also see the benefits for myself and for Gentoo as a whole. After all, it's one easy way to give something back to the community AND make my own system better.

So please, if you're only hung up on my examples, think of some other examples; there's definitely no need to have the whole tree at home if:

a) you know what you want
b) your system already has everything you want/need
c) you are willing to spend some time

cheers,
ph
AllenJB
Veteran


Joined: 02 Sep 2005
Posts: 1285

Posted: Fri Sep 01, 2006 10:56 am

You can exclude categories (and even individual packages) from the portage tree with rsync's --exclude-from option. See http://gentoo-wiki.com/TIP_Exclude_categories_from_emerge_sync
All times are GMT