jgaz n00b
Joined: 14 Feb 2021 Posts: 48
|
Posted: Fri Oct 14, 2022 4:39 pm Post subject: How do I download all packages and cache them on my LAN? |
|
|
Is it possible to tell portage to download all packages for all repositories and cache them locally on my LAN? I know my question sounds a little nuts, but bear with me.
There are a few reasons for this question:
1. It seems like it would be an effective way to audit all Gentoo and/or GURU ebuilds for stale SRC_URI entries.
2. As a bonus, once I have all of the packages I can check for bad hashes too.
3. I assume there is a way to tell a binhost (which I intend to set up) to use the local cache of packages in lieu of the SRC_URI.
I appreciate how much bandwidth and disk space #1 will consume. Checking 19,500 hashes is likely to take a loooong time too, no matter how many cores I have. I assume #3 is possible, but I have no idea how to set it up. For #1 I also assume there is a way to tell portage to skip a package on download failure, log the issue, and continue to the next package. I'm not sure how to set that up either.
So, is this doable? Has anyone else done this? |
|
alamahant Advocate
Joined: 23 Mar 2019 Posts: 3948
|
Posted: Fri Oct 14, 2022 5:14 pm Post subject: |
|
|
All ebuilds are updated in each and every eix-sync
If you are talking about distfiles then
Code: |
eix-sync
# -f is fetch-only: download the distfiles for every package eix lists, build nothing
for i in $(EIX_LIMIT=0 eix --only-names); do emerge -fv "$i"; done
|
should do it.
If you need overlays also you have to add them yourself.
I think it's an exercise in vanity though. _________________
Last edited by alamahant on Fri Oct 14, 2022 5:30 pm; edited 1 time in total |
|
jgaz n00b
Joined: 14 Feb 2021 Posts: 48
|
Posted: Fri Oct 14, 2022 5:18 pm Post subject: |
|
|
Yes, I was talking about the distfiles. Thanks for the info! |
|
mike155 Advocate
Joined: 17 Sep 2010 Posts: 4438 Location: Frankfurt, Germany
|
Posted: Fri Oct 14, 2022 7:20 pm Post subject: |
|
|
It will be faster if you add the emerge option '--nodeps', so that emerge does not calculate dependencies for each package. |
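For what it's worth, here is a minimal sketch that combines the loop above with --nodeps and the skip-and-log behaviour asked about in the first post. It is an illustration under assumptions, not a tested recipe: fetch_all and FETCH_CMD are made-up names (FETCH_CMD is not a Portage variable; it exists only so the loop can be tried without emerge), and the log file is whatever path you pass in.

```shell
# Read package names from stdin (one per line), try to fetch each one, and
# append the name of every package whose fetch failed to a log file.
# FETCH_CMD is a hypothetical override hook, not something Portage knows about.
fetch_all() {
    log=$1
    failed=0
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        # The command is left unquoted on purpose so it splits into words.
        # stdin comes from /dev/null so the fetcher cannot eat the package list.
        if ! ${FETCH_CMD:-emerge --fetchonly --nodeps} "$pkg" < /dev/null; then
            printf '%s\n' "$pkg" >> "$log"
            failed=$((failed + 1))
        fi
    done
    echo "failed: $failed"
}
```

Typical use would be something like `EIX_LIMIT=0 eix --only-names | fetch_all /tmp/fetch-failures.log`. A failed fetch only adds a line to the log, and the loop moves on to the next package.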
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54744 Location: 56N 3W
|
Posted: Fri Oct 14, 2022 7:41 pm Post subject: |
|
|
jgaz,
Some packages are fetch restricted.
You will need to fetch them yourself.
Use mirrorselect to find a fast mirror near you, then wget the bits you need. There is no need to use emerge, as you want a complete distfiles mirror.
Be warned that the last time I looked, it was over 250G, not including fetch-restricted files. _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
jgaz n00b
Joined: 14 Feb 2021 Posts: 48
|
Posted: Fri Oct 14, 2022 7:49 pm Post subject: |
|
|
NeddySeagoon wrote: | jgaz,
Some packages are fetch restricted.
You will need to fetch them yourself.
Use mirrorselect to find a fast mirror near you, then wget the bits you need. There is no need to use emerge, as you want a complete distfiles mirror.
Be warned that last time I looked, it was over 250G, not including fetch restricted files. |
Okay, good to know. When was this? Also, why are some files fetch restricted? |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54744 Location: 56N 3W
|
Posted: Fri Oct 14, 2022 8:05 pm Post subject: |
|
|
jgaz,
Fetch restrictions apply to files that Gentoo is not permitted to mirror, for whatever reason.
It's not just Gentoo that is not permitted to mirror these files.
They may have an EULA or a click-through licence and so on ...
The 250G was a few years ago now. Also, it's a moving target. The Gentoo master mirror follows the ::gentoo repo, which is updated every 30 min.
The master mirror is only available to other mirrors. It's up to the mirror admins how often they update, but it isn't every 30 min.
If you really want to fetch from the SRC_URI, which is considered bad netiquette as it defeats the purpose of having a mirror system, use emerge and set GENTOO_MIRRORS="", so that the mirror system is not used. |
|
jgaz n00b
Joined: 14 Feb 2021 Posts: 48
|
Posted: Fri Oct 14, 2022 8:15 pm Post subject: |
|
|
So Gentoo mirrors contain not just the ebuilds (which I knew) but most distfiles too? If so, then bad URI links are probably known -- and likely automatically bug reported on -- and there is no need for this kind of exercise. |
|
grknight Retired Dev
Joined: 20 Feb 2015 Posts: 1991
|
Posted: Fri Oct 14, 2022 8:26 pm Post subject: |
|
|
jgaz wrote: | So Gentoo mirrors contain not just the ebuilds (which I knew) but most distfiles too? If so, then bad URI links are probably known -- and likely automatically bug reported on -- and there is no need for this kind of exercise. |
One of the main advantages of mirroring distfiles is to avoid bad URI links and have a consistent experience even if the upstream goes away (permanently or not).
We do not recommend people unset GENTOO_MIRRORS because of the possibility of missing/changed files upstream.
If there is a true need for a bug requesting an updated SRC_URI, then one could be filed. A dead upstream link is not the highest concern as long as the file exists on the mirrors. |
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54744 Location: 56N 3W
|
Posted: Fri Oct 14, 2022 8:35 pm Post subject: |
|
|
jgaz,
The mirrors for the ::gentoo repo and the distfiles are separated.
At the time an ebuild is added to the tree, its SRC_URI files are added to the mirror system too.
Ebuilds can be in the tree for years, and a SRC_URI can change or, like Google's did a few years ago, go away.
To my knowledge, there is no check of SRC_URIs to make sure that they are still alive.
Once the required files are on the mirrors, they won't be removed until all the ebuilds that need them are gone too.
It gets interesting when you want the files that an old ebuild needs and its SRC_URI no longer works.
Set out your idea in a bug for gentoo-infra. They will be able to tell you more. |
|
pingtoo Veteran
Joined: 10 Sep 2021 Posts: 1420 Location: Richmond Hill, Canada
|
Posted: Sat Oct 15, 2022 10:38 am Post subject: Re: How do I download all packages and cache them on my LAN? |
|
|
jgaz.
jgaz wrote: | Is it possible to tell portage to download all packages for all repositories and cache them locally on my LAN? I know my question sounds a little nuts, but bear with me.
There are a few reasons for this question:
1. It seems like it would be an effective way to audit all Gentoo and/or GURU ebuilds for stale SRC_URI entries.
2. As a bonus, once I have all of the packages I can check for bad hashes too.
3. I assume there is a way to tell a binhost (which I intend to set up) to use the local cache of packages in lieu of the SRC_URI.
I appreciate how much bandwidth and disk space #1 will consume. Checking 19,500 hashes is likely to take a loooong time too, no matter how many cores I have. I assume #3 is possible, but I have no idea how to set it up. For #1 I also assume there is a way to tell portage to skip a package on download failure, log the issue, and continue to the next package. I'm not sure how to set that up either.
So, is this doable? Has anyone else done this? |
Many have given good ideas about your main question. However, I notice from your point 3 that you may have misunderstood the binhost concept in Gentoo. At least, it is not directly associated with a "local cache of packages in lieu of SRC_URI".
Portage always looks in the directory named by the DISTDIR variable first for a package's source archive. Only when the archive is NOT in $DISTDIR does it fetch the file, from the configured mirrors or from the ebuild's SRC_URI. A *binhost*, by contrast, is a distribution point: it stores your build results in binary format for later reuse or for distribution to other Gentoo systems in your environment.
Please see the Gentoo Binary package guide for more detail. |
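To make the distinction concrete, here is a rough sketch of what a LAN client's /etc/portage/make.conf might contain. DISTDIR, GENTOO_MIRRORS and PORTAGE_BINHOST are real Portage variables, but every host name and path below is a placeholder I made up, so treat it as an illustration of which setting covers sources and which covers binaries rather than a working configuration.

```shell
# /etc/portage/make.conf on a LAN client (all hosts and paths are placeholders)

# Source archives: Portage looks here first. This could be a directory
# shared from the machine that holds the local distfiles cache.
DISTDIR="/mnt/lan/distfiles"

# Source archives: mirrors to try when a file is not in DISTDIR.
# The hypothetical LAN mirror is listed first, a public mirror second.
GENTOO_MIRRORS="http://distfiles.lan.example/gentoo https://distfiles.gentoo.org"

# Binary packages: a separate concern. This is where prebuilt packages
# come from, and it has nothing to do with SRC_URI.
PORTAGE_BINHOST="http://binhost.lan.example/packages"
```

The first two settings only affect where source archives come from. The last one only matters when emerge is asked to use binary packages, for example with --getbinpkg.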
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54744 Location: 56N 3W
|
Posted: Sat Oct 15, 2022 10:57 am Post subject: |
|
|
jgaz,
Picking up on pingtoo's point with some examples. A BINHOST is a tree of ready-to-install binary packages.
The binhost in the link is built on a 32-core ARM64 CPU but installs and runs on smaller systems, in this case 64-bit Raspberry Pis.
That's ARM64 only, as it's all binary files.
A distfiles mirror is like that. That particular distfiles mirror is all the sources I have ever downloaded since the middle of 2006.
There is a smattering of earlier stuff too.
It stops at about Sep-21 as I have not uploaded the rest yet. It's a poor wee VM and there are a lot of files there, so it takes a while to autoindex.
It's fairly well known in the Gentoo community as a source of distfiles for updating very old installs.
It's all source files.
I don't need to host the ::gentoo repo as it's all in git, back to the beginning of Gentoo adopting CVS in around 2002, so it's possible to make your own snapshot of the repo at any commit you choose. |
|
Genone Retired Dev
Joined: 14 Mar 2003 Posts: 9625 Location: beyond the rim
|
Posted: Wed Oct 19, 2022 10:25 am Post subject: |
|
|
Basically, you want to set up a local distfiles mirror, so you might be interested in this: https://wiki.gentoo.org/wiki/Project:Infrastructure/Mirrors/Source
NeddySeagoon wrote: | jgaz,
Fetch restrictions apply to files that Gentoo is not permitted to mirror, for whatever reason.
It's not just Gentoo that is not permitted to mirror these files. |
jgaz wrote: | Okay, good to know. When was this? Also, why are some files fetch restricted? |
Fetch-restricted files simply cannot be downloaded automatically, e.g. because the download URL is generated on-demand after submitting a registration form or some other protection mechanism. That is however a relatively rare situation. Files that simply cannot be redistributed legally are "only" mirror-restricted, which is usually of no concern to users (depending on who will have access to your mirror you may have to account for that from a strictly legal perspective).
Quote: | I appreciate how much bandwidth and disk space #1 will consume. Checking 19,500 hashes is likely to take a loooong time too, no matter how many cores I have. |
The hash calculation is not the problem. But to calculate the hash you have to read each file completely, so for this task you'd likely be IO-bound, not CPU-bound (depending on the storage media of course). Of course once the data is in memory you can feed it to multiple hash functions in parallel, but that's going to require some programming if you don't want to rely on filesystem caching. |
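As a rough illustration of the hash check, here is a minimal sketch that compares the files in a distfiles directory with the SHA512 values recorded in one Manifest. It assumes Manifest lines of the form `DIST <file> <size> <HASH> <value> ...`, that sha512sum and awk are available, and that one hash per file is enough for an audit. verify_manifest is a name I made up; it is not part of Portage.

```shell
# Check every DIST entry of a Manifest against the files in a distfiles directory.
# Usage: verify_manifest <Manifest> <distdir>
# Prints OK/BAD/MISSING per file; returns 1 if anything is bad or missing.
verify_manifest() {
    manifest=$1
    distdir=$2
    status=0
    while read -r tag name size rest || [ -n "$tag" ]; do
        [ "$tag" = "DIST" ] || continue
        # Pick the value that follows the SHA512 keyword on this line, if any.
        want=$(printf '%s\n' "$rest" |
            awk '{for (i = 1; i < NF; i++) if ($i == "SHA512") print $(i + 1)}')
        [ -n "$want" ] || continue
        if [ ! -f "$distdir/$name" ]; then
            echo "MISSING $name"
            status=1
            continue
        fi
        # Each file is read exactly once, which is where the time goes.
        got=$(sha512sum < "$distdir/$name" | awk '{print $1}')
        if [ "$got" = "$want" ]; then
            echo "OK $name"
        else
            echo "BAD $name"
            status=1
        fi
    done < "$manifest"
    return "$status"
}
```

With the usual default locations this would be called as something like `verify_manifest /var/db/repos/gentoo/app-misc/foo/Manifest /var/cache/distfiles`, where app-misc/foo stands in for a real package. Looping over every Manifest in the repo is left out, and as noted above, the run time will mostly depend on how fast the storage can be read.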
|