Gentoo Forums
Netiquette regarding mirrors and huge downloads
GreenNeonWhale
n00b


Joined: 30 Mar 2016
Posts: 64

Posted: Thu Jan 23, 2025 12:10 am    Post subject: Netiquette regarding mirrors and huge downloads

Hi,

I'm a long-time Gentoo user and a big fan of Gentoo.

I would like to have my own personal, locally stored, copy of:
- Gentoo's entire /distfiles directory -- all the source code.
- A subset of the stage3 files.
- A subset of the install .iso images.
I'm seeking to download all of this data and, from time to time, update my locally stored copy.

I know that this is a HUGE download, and could potentially be a burden on whichever mirror I choose. I'm seeking to avoid overburdening and/or selfishly using mirrors. In short, I don't want to be a dick.

I have a 1 Gbit/s fiber connection at my disposal.

My original idea was to find the fastest mirror available to me, and test its max speed to me with a small download. Then, limit my download to a small fraction of that. I found rsync://mirrors.rit.edu/gentoo/ to be the fastest, at around 90MiB/sec. I figured I would limit my download to 2MiB/sec. I started to do that, but then stopped shortly thereafter.
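The rate cap described above can be handed to rsync directly. A minimal sketch, assuming the files live under a /distfiles/ path on that mirror and that /srv/gentoo/distfiles/ is the local target (both are illustrative assumptions, not taken from the post). rsync's --bwlimit option is read as KiB/s when no suffix is given, so 2MiB/sec becomes 2048:

```python
import shlex


def rsync_command(source, dest, limit_mib_per_s):
    """Build an rsync argument list that caps the transfer rate.

    rsync reads --bwlimit in units of 1024 bytes per second when no
    suffix is given, so the MiB/s figure is multiplied by 1024.
    """
    kib_per_s = int(limit_mib_per_s * 1024)
    return [
        "rsync",
        "--archive",               # recurse and preserve times/permissions
        "--partial",               # keep partial files so a rerun can resume
        f"--bwlimit={kib_per_s}",  # the throttle itself
        source,
        dest,
    ]


# Mirror host is from the post; the /distfiles/ suffix and the local
# destination path are assumptions for illustration.
cmd = rsync_command(
    "rsync://mirrors.rit.edu/gentoo/distfiles/",
    "/srv/gentoo/distfiles/",
    2,
)
print(shlex.join(cmd))
```

Printing the command instead of running it makes it easy to check the limit before any traffic hits the mirror.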

So, would the above be a generally acceptable use of a mirror? If not, would a slower speed be okay?
Should I directly contact the mirror admins and check first?
Or, is a download of this magnitude simply too big to do without, well, being a dick?

I'd appreciate any advice and guidance from the Gentoo community, especially from the folks who maintain our servers.

Thank You!
Banana
Moderator


Joined: 21 May 2004
Posts: 1859
Location: Germany

Posted: Thu Jan 23, 2025 7:03 am

I'm no sysop for this topic, but I would suggest you implement some round robin over multiple mirrors. That way the load on any single mirror would be smaller.
Also, some of the data does not update very often, so you may want to implement some kind of update check.
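A rough sketch of the round-robin idea, assuming you already have a list of file names to fetch. Only the first mirror URL appears in this thread; the other two hosts and the file names are made-up placeholders:

```python
from itertools import cycle


def assign_round_robin(files, mirrors):
    """Pair each file with a mirror, rotating through the mirror list."""
    rotation = cycle(mirrors)
    return [(name, next(rotation)) for name in files]


# First URL is the mirror named in the opening post; the example.org
# hosts and the file names below are hypothetical.
mirrors = [
    "rsync://mirrors.rit.edu/gentoo/",
    "rsync://mirror-b.example.org/gentoo/",
    "rsync://mirror-c.example.org/gentoo/",
]
files = ["a.tar.xz", "b.tar.gz", "c.tar.bz2", "d.tar.xz"]

plan = assign_round_robin(files, mirrors)
for name, mirror in plan:
    print(f"{name} <- {mirror}")
```

As for the update check, rsync by default already skips files whose size and modification time match the local copy, so a separate check may only be needed if you fetch over HTTP instead.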
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54795
Location: 56N 3W

Posted: Thu Jan 23, 2025 1:31 pm

GreenNeonWhale,

I'm tempted to ask why ... but 'because I can' is good enough. :)

Last time I asked, a distfiles mirror was over 250G. That was about 5 years ago. It will be bigger now.
Be aware that the mirrors do not carry fetch restricted packages. You can fetch them but should not host them publicly. They are fetch restricted for a reason.
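For a sense of scale, here is the arithmetic for the 250G figure above at the 2MiB/sec cap proposed in the opening post. This is only a sketch: it assumes "250G" means 250 GiB and that the transfer runs at the cap the whole time. The real tree is bigger now, so treat the result as a lower bound:

```python
def transfer_hours(size_gib, rate_mib_per_s):
    """Hours needed to move size_gib GiB at a steady rate_mib_per_s MiB/s."""
    size_mib = size_gib * 1024
    seconds = size_mib / rate_mib_per_s
    return seconds / 3600


# 250 GiB is the five-year-old estimate from this post, read as GiB
# (an assumption); 2 MiB/s is the cap from the opening post.
hours = transfer_hours(250, 2)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
# prints "35.6 hours, about 1.5 days"
```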

Raise a bug on infra stating your intentions and ask if it's OK.
If you don't get a response, go ahead. It's easier to get forgiveness than to get permission. :)

I have a collection of stuff online: olde-distfiles and old Gentoo.
Feel free to mirror anything of interest at 2MiB/sec. That server has a 1Gbit/sec network link and traffic is not metered.

Every now and then I update it/add to it and point users here to it for old distfiles.
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
Wewfus
n00b


Joined: 23 Mar 2024
Posts: 7

Posted: Thu Jan 23, 2025 6:03 pm

Is there any reason why you want to use rsync over git? If it were me, I'd take full advantage of the git mirror for Gentoo on GitHub, since I'm not opposed to using all of Microsoft's bandwidth that I can, considering what they do with the code uploaded there. Just something to consider.

Edit: I forgot to mention that most mirrors will throttle you anyway if you attempt to pull too much too fast. Same goes for syncing. I sync'd too much by mistake last week when playing around with the default install path from the handbook. I think it denied me access for a few hours. Don't really know for sure how long the ban was in place because I went to sleep afterwards.

I played with the git sync method and found it to be much faster than rsync. Git has some other features compared to rsync which are nice when it comes to source code. It isn't as good as rsync for binary data though. Perhaps a combination of git for source code+rsync for .isos and other files might be more to your liking.

I've had to push/pull a lot of data to GitHub as part of my job over the years, and I don't ever recall it throttling me or failing to max out our meager cable internet connection. I too am getting fiber in the next few weeks (2Gbps symmetrical connection), so I'm interested in keeping a local mirror as well. My plan was to initially seed it from the GitHub mirror and use the rsync mirrors as a fallback should it ever be down. A lot of the tree doesn't update that often, so once you have it seeded with the data from GitHub you'd only have to check the other mirrors once every few days and maybe pull down a handful of updated ebuilds.

The github mirror is here by the way: https://github.com/gentoo/gentoo and for GURU: https://github.com/gentoo/guru
John R. Graham
Administrator


Joined: 08 Mar 2005
Posts: 10723
Location: Somewhere over Atlanta, Georgia

Posted: Thu Jan 23, 2025 7:36 pm

OP was asking about distfiles, not repos. And distfiles aren't served with git, for good reason.

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 237

Posted: Tue Jan 28, 2025 5:31 am

NeddySeagoon wrote:
Be aware that the mirrors do not carry fetch restricted packages. You can fetch them but should not host them publicly. They are fetch restricted for a reason.

Does this mean that a peer-to-peer solution to the distfiles problem can never exist?

In the NBD thread I was talking about rsync-over-nbd to download source and hopefully save bandwidth. I see now that can't ever work.

But I was toying around with using single-writer nbd volumes to propagate distfiles. So you'd have /var/cache/distfiles on a writable filesystem on its own block device, where you do your normal distfiles stuff just as before. But because you can publish that block device over the Internet read-only, you can let others easily snag a tarball from your mirror.

That would potentially save Gentoo bandwidth, and definitely enhance the security of a Gentoo user's system.
_________________
We are the block device. The kernel is our client.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54795
Location: 56N 3W

Posted: Tue Jan 28, 2025 10:08 am

nokilli,

There is the universal set of distfiles. That cannot be legally distributed and as far as I know does not exist in one place.
Then there is the subset that are distributed by Gentoo. Gentoo takes care that they can be distributed.
You are free to distribute these too.

Does that help?
_________________
Regards,

NeddySeagoon

nokilli
Apprentice


Joined: 25 Feb 2004
Posts: 237

Posted: Tue Jan 28, 2025 12:00 pm

NeddySeagoon wrote:
nokilli,

There is the universal set of distfiles. That cannot be legally distributed and as far as I know does not exist in one place.
Then there is the subset that are distributed by Gentoo. Gentoo takes care that they can be distributed.
You are free to distribute these too.

Does that help?


I get it now. You're talking about something like oracle-jdk.
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54795
Location: 56N 3W

Posted: Tue Jan 28, 2025 12:23 pm

nokilli,

Certainly the packages in the list produced by
Code:
qgrep RESTRICT | grep mirror
are not on the Gentoo mirrors.
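If you would rather do that check from a script while building a local mirror, something like the following could work. It is a simplified sketch: the regex only handles plain RESTRICT="..." and RESTRICT+="..." assignments on one line, the helper name is made up, and the sample ebuild text is invented. USE-conditional syntax such as `!test? ( test )` would come through as raw tokens, so qgrep remains the more dependable tool:

```python
import re

# Matches simple one-line assignments such as RESTRICT="mirror bindist".
# This is a deliberate simplification of real ebuild syntax.
RESTRICT_RE = re.compile(r'^\s*RESTRICT\+?="([^"]*)"', re.MULTILINE)


def restrict_tokens(ebuild_text):
    """Return the set of words found in simple RESTRICT assignments."""
    tokens = set()
    for value in RESTRICT_RE.findall(ebuild_text):
        tokens.update(value.split())
    return tokens


# Invented ebuild fragment, for illustration only.
sample = '''
EAPI=8
DESCRIPTION="Made-up package for illustration"
RESTRICT="mirror bindist"
'''

tokens = restrict_tokens(sample)
print("mirror" in tokens)  # True: its distfiles would not be on the mirrors
print("fetch" in tokens)   # False: this sample is not fetch restricted
```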
_________________
Regards,

NeddySeagoon

All times are GMT