Gentoo Forums
How do I download all packages and cache them on my LAN?
jgaz
n00b


Joined: 14 Feb 2021
Posts: 48

Posted: Fri Oct 14, 2022 4:39 pm    Post subject: How do I download all packages and cache them on my LAN?

Is it possible to tell portage to download all packages for all repositories and cache them locally on my LAN? I know my question sounds a little nuts, but bear with me.

There are a few reasons for this question:

1. It seems like it would be an effective way to audit all Gentoo and/or GURU ebuilds for stale SRC_URI entries.
2. As a bonus, once I have all of the packages I can check for bad hashes too.
3. I assume there is a way to tell a binhost (which I intend to set up) to use the local cache of packages in lieu of the SRC_URI.

I appreciate how much bandwidth and disk space #1 will consume. Checking 19,500 hashes is likely to take a loooong time too, no matter how many cores I have. I assume #3 is possible, but I have no idea how to set it up. For #1 I also assume there is a way to tell portage to skip a package on download failure, log the issue, and continue to the next package. I'm not sure how to set that up either.

So, is this doable? Has anyone else done this?

alamahant
Advocate


Joined: 23 Mar 2019
Posts: 3935

Posted: Fri Oct 14, 2022 5:14 pm

All ebuilds are updated on each and every eix-sync.
If you are talking about distfiles, then
Code:

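# Refresh the repositories and the eix database first.
eix-sync
# Fetch (but do not build) the distfiles of every package eix lists.
for i in $(EIX_LIMIT=0 eix --only-names); do emerge -fv "$i"; done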

should do it.
If you need overlays as well, you have to add them yourself.
I think it's an exercise in vanity, though.
_________________
:)


Last edited by alamahant on Fri Oct 14, 2022 5:30 pm; edited 1 time in total

jgaz
n00b


Joined: 14 Feb 2021
Posts: 48

Posted: Fri Oct 14, 2022 5:18 pm

Yes, I was talking about the distfiles. Thanks for the info!

mike155
Advocate


Joined: 17 Sep 2010
Posts: 4438
Location: Frankfurt, Germany

Posted: Fri Oct 14, 2022 7:20 pm

It will be faster if you add emerge option '--nodeps'. :)
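
Putting the loop and '--nodeps' together, a minimal sketch that also logs failed fetches and carries on (which is what the original question asked for) might look like this. The function name fetch_all, the FETCH_CMD variable and the log path are assumptions made up for illustration; --fetchonly is the long form of -f.

```shell
#!/bin/sh
# Sketch only: fetch distfiles for a list of packages, log failures, move on.
# FETCH_CMD is a hypothetical override hook; by default it calls emerge.
: "${FETCH_CMD:=emerge --fetchonly --nodeps}"

fetch_all() {
    # $1    = file that receives the names of packages whose fetch failed
    # stdin = one package name per line
    faillog=$1
    : > "$faillog"
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        # $FETCH_CMD is left unquoted on purpose so its options are split.
        # stdin is redirected so the command cannot eat the package list.
        if ! $FETCH_CMD "$pkg" </dev/null; then
            printf '%s\n' "$pkg" >> "$faillog"
        fi
    done
}

# Usage (not run here; the log path is just an example):
#   EIX_LIMIT=0 eix --only-names | fetch_all /var/tmp/fetch-failures.log
```

Fetch-restricted packages would end up in the failure log as well, so the log is a starting point for a manual check rather than a list of confirmed dead links.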

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54667
Location: 56N 3W

Posted: Fri Oct 14, 2022 7:41 pm

jgaz,

Some packages are fetch restricted.
You will need to fetch them yourself.

Use mirrorselect to find a fast mirror near you, then wget the bits you need. There is no need to use emerge, as you want a complete distfiles mirror.
Be warned that last time I looked, it was over 250G, not including fetch-restricted files.
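
A rough sketch of the wget step, assuming the chosen mirror serves its files under a distfiles/ directory below its base URL. The function name mirror_distfiles, the WGET override, the mirror URL and the destination directory are all hypothetical; substitute the mirror that mirrorselect picked for you.

```shell
#!/bin/sh
# Sketch only: recursively copy a mirror's distfiles/ tree with wget.
mirror_distfiles() {
    base_url=${1%/}   # mirror base URL, without a trailing slash
    dest=$2           # local directory that will hold the copy
    # --mirror turns on recursion and timestamping; --no-parent keeps wget
    # from climbing above distfiles/; --no-host-directories drops the
    # hostname from the local path.
    ${WGET:-wget} --mirror --no-parent --no-host-directories \
        --directory-prefix="$dest" "$base_url/distfiles/"
}

# Usage (not run here; URL and path are placeholders):
#   mirror_distfiles https://mirror.example/gentoo /srv/mirror
```

Expect this to run for a long time and to need several hundred gigabytes of disk.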
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.

jgaz
n00b


Joined: 14 Feb 2021
Posts: 48

Posted: Fri Oct 14, 2022 7:49 pm

NeddySeagoon wrote:
jgaz,

Some packages are fetch restricted.
You will need to fetch them yourself.

Use mirrorselect to find a fast mirror near you, then wget the bits you need. There is no need to use emerge, as you want a complete distfiles mirror.
Be warned that last time I looked, it was over 250G, not including fetch-restricted files.


Okay, good to know. When was this? Also, why are some files fetch restricted?

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54667
Location: 56N 3W

Posted: Fri Oct 14, 2022 8:05 pm

jgaz,

Fetch restrictions apply to files that Gentoo is not permitted to mirror, for whatever reason.
It's not just Gentoo that is not permitted to mirror these files.
They may have an EULA or a click-through licence and so on ...

The 250G was a few years ago now. Also, it's a moving target. The Gentoo master mirror follows the ::gentoo repo, which is updated every 30 min.
The master mirror is only available to other mirrors. It's up to the mirror admins how often they update, but it isn't every 30 min.

If you really want to fetch from the SRC_URI, which is considered bad netiquette as it defeats the purpose of having a mirror system, use emerge and set GENTOO_MIRRORS="", so that the mirror system is not used.
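
As a sketch of that last point, the variable can be emptied for a single emerge run instead of being changed in make.conf. The wrapper name fetch_from_upstream and the EMERGE override are assumptions for illustration, and the package in the usage line is only an example.

```shell
#!/bin/sh
# Sketch only: fetch straight from SRC_URI by emptying GENTOO_MIRRORS for
# this one command; the setting in make.conf stays untouched.
fetch_from_upstream() {
    GENTOO_MIRRORS="" ${EMERGE:-emerge} --fetchonly --nodeps "$@"
}

# Usage (not run here):
#   fetch_from_upstream app-editors/nano
```

Keeping the override local to one command makes it harder to leave the mirror system switched off by accident.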

jgaz
n00b


Joined: 14 Feb 2021
Posts: 48

Posted: Fri Oct 14, 2022 8:15 pm

So Gentoo mirrors contain not just the ebuilds (which I knew) but most distfiles too? If so, then bad URI links are probably known -- and likely automatically bug reported on -- and there is no need for this kind of exercise.

grknight
Retired Dev


Joined: 20 Feb 2015
Posts: 1969

Posted: Fri Oct 14, 2022 8:26 pm

jgaz wrote:
So Gentoo mirrors contain not just the ebuilds (which I knew) but most distfiles too? If so, then bad URI links are probably known -- and likely automatically bug reported on -- and there is no need for this kind of exercise.


One of the main advantages of mirroring distfiles is to avoid bad URI links and have a consistent experience even if the upstream goes away (permanently or not).

We do not recommend people unset GENTOO_MIRRORS because of the possibility of missing/changed files upstream.

If there is a genuine need for an updated SRC_URI, a bug can be filed. A file missing upstream is not the highest concern as long as it exists on the mirrors.

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54667
Location: 56N 3W

Posted: Fri Oct 14, 2022 8:35 pm

jgaz,

The mirrors for the ::gentoo repo and the distfiles are separate.
At the time an ebuild is added to the tree, its SRC_URI files are added to the mirror system too.
Ebuilds can be in the tree for years, and a SRC_URI can change or, like Google's did a few years ago, go away.

To my knowledge, there is no check of SRC_URIs to make sure that they are still alive.
Once the required files are on the mirrors they won't be removed until all the ebuilds that need them are gone too.

It gets interesting when you want the files that an old ebuild needs and its SRC_URI no longer works.

Set out your idea in a bug for gentoo-infra. They will be able to tell you more.
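
For anyone who wants to try a liveness check locally first, here is a minimal sketch. It assumes you already have a plain list of upstream URLs, one per line; extracting them from ebuilds is not shown. The function name report_dead_urls and the PROBE_CMD variable are made up for illustration. Note that some servers reject HEAD requests, so a reported URL is only a candidate for a closer look.

```shell
#!/bin/sh
# Sketch only: print every URL from stdin that does not answer a HEAD probe.
# PROBE_CMD is a hypothetical override hook; by default it calls curl.
: "${PROBE_CMD:=curl --silent --head --fail --location --max-time 30 --output /dev/null}"

report_dead_urls() {
    while IFS= read -r url; do
        case $url in
            http://*|https://*|ftp://*) ;;
            *) continue ;;  # skip blank lines and mirror:// pseudo-URIs
        esac
        # $PROBE_CMD is left unquoted on purpose so its options are split.
        $PROBE_CMD "$url" </dev/null >/dev/null 2>&1 || printf '%s\n' "$url"
    done
}

# Usage (not run here; file names are placeholders):
#   report_dead_urls < src-uris.txt > dead-candidates.txt
```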

pingtoo
Veteran


Joined: 10 Sep 2021
Posts: 1359
Location: Richmond Hill, Canada

Posted: Sat Oct 15, 2022 10:38 am    Post subject: Re: How do I download all packages and cache them on my LAN?

jgaz,

jgaz wrote:
Is it possible to tell portage to download all packages for all repositories and cache them locally on my LAN? I know my question sounds a little nuts, but bear with me.

There are a few reasons for this question:

1. It seems like it would be an effective way to audit all Gentoo and/or GURU ebuilds for stale SRC_URI entries.
2. As a bonus, once I have all of the packages I can check for bad hashes too.
3. I assume there is a way to tell a binhost (which I intend to set up) to use the local cache of packages in lieu of the SRC_URI.

I appreciate how much bandwidth and disk space #1 will consume. Checking 19,500 hashes is likely to take a loooong time too, no matter how many cores I have. I assume #3 is possible, but I have no idea how to set it up. For #1 I also assume there is a way to tell portage to skip a package on download failure, log the issue, and continue to the next package. I'm not sure how to set that up either.

So, is this doable? Has anyone else done this?


Many have given good ideas about your main question. However, I notice from your point 3 that you may have misunderstood the binhost concept in Gentoo. At least, it is not directly associated with a "local cache of packages in lieu of SRC_URI".

Portage always looks in the DISTDIR location first for a package's source archive files when building the package. Only when a source archive file is NOT in $DISTDIR is a fetch performed to get it, from the mirrors listed in GENTOO_MIRRORS or from $SRC_URI. The *binhost*, by contrast, is a distribution point: it stores your build results in binary format for later reuse or for distribution to other Gentoo systems in your environment.

Please see the Gentoo Binary package guide for more detail.
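
To make the source-side settings concrete, here is a sketch of a make.conf fragment for a LAN client that tries a local distfiles mirror before a public one. The LAN hostname is invented, and the DISTDIR path is assumed to be the usual default on current profiles; check yours with `portageq envvar DISTDIR`.

```shell
# /etc/portage/make.conf fragment (sketch, values are assumptions)

# Mirrors are tried in order. Portage appends /distfiles/ to each entry,
# so the LAN server has to expose a distfiles/ directory under this URL.
GENTOO_MIRRORS="http://distfiles.lan.example/gentoo https://distfiles.gentoo.org"

# Local directory that is checked before anything is fetched.
DISTDIR="/var/cache/distfiles"
```

None of this involves a binhost; binary packages are configured separately.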

NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54667
Location: 56N 3W

Posted: Sat Oct 15, 2022 10:57 am

jgaz,

Picking up on pingtoo's point with some examples: a BINHOST is a tree of ready-to-install binary packages.
The binhost in the link is built on a 32-core ARM64 CPU but installs and runs on smaller systems, in this case a 64-bit Raspberry Pi.
That's ARM64 only, as it's all binary files.

A distfiles mirror is like that. That particular distfiles mirror is all the sources I have ever downloaded since the middle of 2006.
There is a smattering of earlier stuff too.
It stops at about Sep-21 as I have not uploaded the newer files yet. It's a poor wee VM and there are a lot of files there, so it takes a while to autoindex.
It's fairly well known in the Gentoo community as a source of distfiles for updating very old installs.
It's all source files.

I don't need to host the ::gentoo repo as it's all in git, back to the beginning of Gentoo adopting CVS in around 2002, so it's possible to make your own snapshot of the repo at any commit you choose.

Genone
Retired Dev


Joined: 14 Mar 2003
Posts: 9620
Location: beyond the rim

Posted: Wed Oct 19, 2022 10:25 am

Basically, you want to set up a local distfiles mirror, so you might be interested in this: https://wiki.gentoo.org/wiki/Project:Infrastructure/Mirrors/Source

NeddySeagoon wrote:
jgaz,

Fetch restrictions apply to files that Gentoo is not permitted to mirror, for whatever reason.
It's not just Gentoo that is not permitted to mirror these files.

jgaz wrote:
Okay, good to know. When was this? Also, why are some files fetch restricted?

Fetch-restricted files simply cannot be downloaded automatically, e.g. because the download URL is generated on-demand after submitting a registration form or some other protection mechanism. That is however a relatively rare situation. Files that simply cannot be redistributed legally are "only" mirror-restricted, which is usually of no concern to users (depending on who will have access to your mirror you may have to account for that from a strictly legal perspective).
Quote:
I appreciate how much bandwidth and disk space #1 will consume. Checking 19,500 hashes is likely to take a loooong time too, no matter how many cores I have.

The hash calculation is not the problem. But to calculate the hash you have to read each file completely, so for this task you'd likely be IO-bound, not CPU-bound (depending on the storage media of course). Of course once the data is in memory you can feed it to multiple hash functions in parallel, but that's going to require some programming if you don't want to rely on filesystem caching.
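
A minimal sketch of that single-read idea without any real programming, using a FIFO and tee so the file is read from disk once and fed to two hash tools at the same time. It assumes b2sum and sha512sum from coreutils are installed; the function name hash_once is made up for illustration. The output uses the same "BLAKE2B ... SHA512 ..." order as the hash fields of a Manifest DIST line.

```shell
#!/bin/sh
# Sketch only: compute BLAKE2B and SHA512 of a file with one pass over it.
hash_once() {
    file=$1
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/fifo" || return 1
    # Background reader: hashes whatever tee writes into the FIFO.
    sha512sum < "$dir/fifo" > "$dir/sha512" &
    # tee duplicates the single read of the file to the FIFO and to b2sum.
    b2=$(tee "$dir/fifo" < "$file" | b2sum | cut -d' ' -f1)
    wait    # make sure the background sha512sum has finished
    sha=$(cut -d' ' -f1 < "$dir/sha512")
    rm -rf "$dir"
    printf 'BLAKE2B %s SHA512 %s\n' "$b2" "$sha"
}

# Usage (not run here; the file name is a placeholder):
#   hash_once /var/cache/distfiles/some-file.tar.xz
```

Whether this beats two separate runs depends on how much of the file the filesystem cache would have kept anyway.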