tuxmainy n00b
Joined: 15 Nov 2016 Posts: 8
|
Posted: Sun May 14, 2023 3:59 pm Post subject: PFL looking for new owner! |
|
|
Hi,
as the subject says: I, the founder and current owner of PFL (https://portagefilelist.de/), am looking for a new owner. You may have noticed the recent lack of maintenance and support of PFL. The reasons are mainly private and shouldn't be discussed here.
So here I am, after ~15 years, asking for anyone who wants to take over PFL. This includes the domain as well. The software can be found on GitHub (https://github.com/portagefilelist). The website code is missing because it contains code which is not licensed by me, so I guess the website has to be rewritten. And to be honest, the website really needs a refresh.
So from my point of view your job would be:
1. install all the server side stuff on your / a server
2. create a new website for doing queries (remember the API used by e-file)
3. take over the domain
4. have fun with maintaining the data and PFL client side stuff
Anyone interested? Otherwise I have to shut down PFL. Sorry.
regards
Daniel
PS: Could an admin pin this post to the top? I think this is an important request. |
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Sun May 14, 2023 4:36 pm Post subject: |
|
|
I'm sorry to hear you have to move on from the project (it sounds like maybe you would have preferred to not).
I've made the post an Announcement, so it will remain at the top until there is a hand off or you've had to shut it down.
Could you be more specific about the work involved? How is the list of files generated? What would that look like for a person hosting this, or paying for compute time somewhere?
You mention a new front end would be needed. Is the old front end not transferable to the new owner until they are able to create a new one?
Not knowing what is involved, I'd personally like to see something like that provided by Gentoo. _________________ Quis separabit? Quo animo? |
|
|
|
tuxmainy n00b
Joined: 15 Nov 2016 Posts: 8
|
Posted: Sun May 14, 2023 7:00 pm Post subject: |
|
|
pjp wrote: | I've made the post an Announcement, so it will remain at the top until there is a hand off or you've had to shut it down. |
Thanks
pjp wrote: | Could you be more specific about the work involved? |
I'll try. Please, everyone, don't hesitate to ask further questions. As I came up with most of this myself, I can't tell which parts need more explanation.
pjp wrote: | How is the list of files generated? |
The files are collected by all the Gentoo users who have PFL emerged. The package adds a Python script which is called periodically by cron. This script collects all packages that are new since the last run, packs them into XML and sends them to portagefilelist.de. On the server, a second script runs periodically (every hour), unpacks the XML and imports it into a PostgreSQL database. Using another database should be fairly easy, but please keep in mind that we are talking about a lot of data. I have switched between multiple databases (and also tried LDAP at some point). But anyway, the website simply queries the database.
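To make the client half of that pipeline more concrete, here is a minimal sketch of how a package's file list could be turned into an XML upload. It is not the actual PFL collect script: it assumes the input is the text of a Portage CONTENTS file (the `dir`/`obj`/`sym` lines found under /var/db/pkg), and the XML element and attribute names (`pfl`, `package`, `file`, `type`) are made up for illustration.

```python
import xml.etree.ElementTree as ET


def parse_contents(text):
    """Return (kind, path) pairs from the text of a Portage CONTENTS file."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, _, rest = line.partition(" ")
        if kind == "dir":
            path = rest                        # "dir <path>"
        elif kind == "obj":
            path = rest.rsplit(" ", 2)[0]      # "obj <path> <md5> <mtime>"
        elif kind == "sym":
            path = rest.split(" -> ", 1)[0]    # "sym <path> -> <target> <mtime>"
        else:
            continue                           # ignore entry types not handled here
        entries.append((kind, path))
    return entries


def build_upload(packages):
    """Build one XML document from (category, name, version, contents_text) tuples.

    The element names are hypothetical, not the real PFL upload format.
    """
    root = ET.Element("pfl")
    for category, name, version, contents in packages:
        pkg = ET.SubElement(root, "package",
                            category=category, name=name, version=version)
        for kind, path in parse_contents(contents):
            ET.SubElement(pkg, "file", type=kind).text = path
    return ET.tostring(root, encoding="unicode")
```

The server-side import would then be the reverse: parse the XML and insert one row per file into the database.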
pjp wrote: | What would that look like for a person hosting this, or paying for compute time somewhere? |
You need to host it on your own. I have rented a dedicated root server from a common provider, so yes, payment for resources would be involved. If someone is more familiar with cloud computing, this might be an option, but as I don't know enough about cloud computing I cannot judge this. Of course I will provide the current database as a SQL dump.
pjp wrote: | You mention a new front end would be needed. Is the old front end not transferable to the new owner until they are able to create a new one? |
The code of the old frontend is not transferable at all because of license reasons. If by "transferable" you mean "switching over the domain", keep in mind that the domain serves two purposes:
1. collecting data from Gentoo users
2. making the data "queryable"
So once you get the domain you should be ready to get the data from point 1. Not providing point 2 will result in some unhappy users but might be ok for one week or so.
Hope this helps.
regards
Daniel |
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Sun May 14, 2023 8:45 pm Post subject: |
|
|
Thanks for clarifying. Hopefully that will be useful information for someone. It's a bit beyond my comfort level and budget (0), otherwise I would consider it.
tuxmainy wrote: | The code of the old frontend is not transferable at all because of license reasons. | That is what I was wondering as it seems you were specifically indicating it was not available. _________________ Quis separabit? Quo animo? |
|
|
|
flexibeast Guru
Joined: 04 Apr 2022 Posts: 474 Location: Naarm/Melbourne, Australia
|
Posted: Mon May 15, 2023 1:10 am Post subject: |
|
|
@tuxmainy,
i just want to say thanks for maintaining PFL, and for so long - i regularly make use of it. Hope someone is able to take it over soon! |
|
|
|
eccerr0r Watchman
Joined: 01 Jul 2004 Posts: 9855 Location: almost Mile High in the USA
|
Posted: Mon May 15, 2023 1:32 am Post subject: |
|
|
Curious of the bandwidth required to maintain the service?
Though I love running services on my computer, I have neither the network bandwidth nor the funds to maintain the DNS name after it expires, sorry :( _________________ Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching? |
|
|
|
spica Guru
Joined: 04 Jun 2021 Posts: 331
|
Posted: Mon May 15, 2023 11:48 am Post subject: |
|
|
What are the monthly costs for servers?
How much disk storage is needed for the database?
If someone can start replicating the web UI now, how much time do they have? |
|
|
|
NeddySeagoon Administrator
Joined: 05 Jul 2003 Posts: 54673 Location: 56N 3W
|
Posted: Mon May 15, 2023 12:50 pm Post subject: |
|
|
spica,
Server Auction is one place to look for a whole server. I have one of those.
It hosts my Raspberry Pi arm64 binhost among other things.
Maybe you don't need a whole server and a VM or web hosting would work? _________________ Regards,
NeddySeagoon
Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail. |
|
|
|
spica Guru
Joined: 04 Jun 2021 Posts: 331
|
Posted: Mon May 15, 2023 2:37 pm Post subject: |
|
|
NeddySeagoon,
No, no. I want to know the current spending of the project, i.e. how much it costs now. If I know the current numbers, I can compare them with my infrastructure and work out whether I can give it shelter or not.
It's a question about the requirements for the service.
I guess a $5 t3.micro should be enough |
|
|
|
ARomaSH n00b
Joined: 10 May 2005 Posts: 4
|
Posted: Mon May 15, 2023 4:42 pm Post subject: Re: PFL looking for new owner! |
|
|
tuxmainy wrote: |
Anyone interested? Otherwise I have to shut down PFL. Sorry.
|
PM-ed _________________ May the force be with you!!!! |
|
|
|
tuxmainy n00b
Joined: 15 Nov 2016 Posts: 8
|
Posted: Mon May 15, 2023 6:39 pm Post subject: |
|
|
Generally on the hardware: I experienced the most trouble with CPU and RAM while inserting new records, because of index regeneration, and with users doing wildcard queries which couldn't be handled by the index (e.g. *foo). The database should have a fair amount of RAM available to handle the indexes. Disk I/O might be a problem, but using standard SSDs should be fine.
Using another DBMS which is built for this amount of data might resolve some performance problems (like using NoSQL?). But my database knowledge is limited to SQL. So...
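As a rough illustration of why a query like *foo is so much more expensive than foo*, the sketch below uses a sorted Python list as a stand-in for an ordered index. It is only an analogy, not how the PFL database is actually set up: a prefix can be located with binary search, a suffix forces a look at every entry, and one common workaround is a second sorted list holding the reversed names.

```python
import bisect


def prefix_search(sorted_names, prefix):
    """Find names starting with `prefix` by binary search, like an index range scan."""
    lo = bisect.bisect_left(sorted_names, prefix)
    # "\uffff" acts as a crude upper bound; good enough for this sketch.
    hi = bisect.bisect_left(sorted_names, prefix + "\uffff")
    return sorted_names[lo:hi]


def suffix_search_scan(names, suffix):
    """Find names ending with `suffix`; sort order does not help, so scan everything."""
    return [n for n in names if n.endswith(suffix)]


def suffix_search_reversed(sorted_reversed, suffix):
    """Answer the same suffix query as a prefix query on reversed names."""
    hits = prefix_search(sorted_reversed, suffix[::-1])
    return sorted(h[::-1] for h in hits)


names = sorted(["bash", "libfoo.so", "foo", "foobar", "zfoo"])
reversed_names = sorted(n[::-1] for n in names)
```

The reversed list doubles the index size, which matters at this data volume, and it still does not help with patterns that have wildcards on both ends.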
spica wrote: | What are the monthly costs for servers? |
Hard to say. It's a single dedicated root server which runs multiple services (not just PFL), so I never cared about the PFL-specific costs. But to give you an idea:
- Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
- 64GB RAM (~16GB is used by the database)
- 2x 4TB SSDs in RAID 1
The server is hosted by a common German hosting provider (Hetzner) and currently costs ~50€/month. But as I said, multiple services are running on it, so the resources are not exclusive to PFL! The domain is a .de domain (no, I also don't understand why I registered a .de domain). So you should check your local domain registrar for prices.
spica wrote: | How much disk storage is needed for the database? |
Current size of PFL database on disk is 235GB.
eccerr0r wrote: | Curious of the bandwidth required to maintain the service? |
~3GB of uploads per month. I don't have numbers for the website itself but I guess that's not that much. Said hoster grants unlimited traffic so I never cared about this.
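For a feel of what ~3GB of uploads per month means as a sustained rate, here is a back-of-the-envelope calculation. It assumes decimal gigabytes and a 30-day month, and it only gives an average; real uploads arrive in bursts, and website traffic is not included.

```python
UPLOAD_BYTES_PER_MONTH = 3 * 10**9       # ~3GB, the figure given in the post
SECONDS_PER_MONTH = 30 * 24 * 60 * 60    # assumption: 30-day month


def average_rate_bits_per_second(bytes_per_month, seconds=SECONDS_PER_MONTH):
    """Average transfer rate in bits per second for a monthly byte volume."""
    return bytes_per_month * 8 / seconds


rate = average_rate_bits_per_second(UPLOAD_BYTES_PER_MONTH)
print(f"{rate / 1000:.1f} kbit/s on average")
```

That works out to roughly 9 kbit/s on average, so the upload side alone is unlikely to be the bottleneck on any ordinary connection.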
spica wrote: | If someone can start replicating the web UI now, how much time do they have? |
I'd like to hand over by the 1st of July. If there is a plan which takes more time, whether I am fine with it depends on the plan. |
|
|
|
szatox Advocate
Joined: 27 Aug 2013 Posts: 3477
|
Posted: Mon May 15, 2023 7:52 pm Post subject: |
|
|
Quote: | Current size of PFL database on disk is 235GB. | Wow, that's a lot more than I thought it would be.
Is it the actual volume of data, or perhaps there were a lot of updates/deletions over the years and it just requires vacuum full? AFAIR postgres does not delete dead records unless explicitly ordered to. |
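One way to get a first hint at the answer is PostgreSQL's `pg_stat_user_tables` statistics view, which reports live and dead row counts per table. The sketch below holds such a query as a string (to be run with any client, e.g. psql) plus a small helper to flag tables with a high share of dead rows. The 20% threshold is an arbitrary assumption, and dead-row counts only show part of the picture: space that was already vacuumed but never returned to the operating system will not show up here.

```python
# Query against PostgreSQL's per-table statistics view.
DEAD_TUPLE_QUERY = """
SELECT relname, n_live_tup, n_dead_tup, pg_total_relation_size(relid) AS bytes
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
"""


def dead_ratio(n_live, n_dead):
    """Share of dead rows among all rows of a table (0.0 for an empty table)."""
    total = n_live + n_dead
    return n_dead / total if total else 0.0


def bloat_candidates(rows, threshold=0.2):
    """Return table names whose dead-row share is at least `threshold`.

    rows: (relname, n_live_tup, n_dead_tup, bytes) tuples as returned by the query.
    """
    return [r[0] for r in rows if dead_ratio(r[1], r[2]) >= threshold]
```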
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1825 Location: Germany
|
|
|
|
tuxmainy n00b
Joined: 15 Nov 2016 Posts: 8
|
Posted: Fri May 19, 2023 7:04 pm Post subject: |
|
|
Banana wrote: | Well I'm from germany, I do have a host and I do know PHP, but the DB size would be a pain.
Is there any historical data in it which could be "archived"? |
Yes, I think so. As said, it's not well maintained currently, archiving included. But I have no clue how much data could be archived. Sorry |
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Fri May 19, 2023 8:46 pm Post subject: |
|
|
It may help to post an announcement on the PFL website itself.
Maybe even an announcement / update to the cli tool? I'm not sure how feasible that is. _________________ Quis separabit? Quo animo? |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1825 Location: Germany
|
Posted: Sat May 20, 2023 8:23 am Post subject: |
|
|
I will have a look. Currently I see the website, which is able to do a search and display the results; Python on the client side which creates an XML file and uploads it; Python on the client which can query; PHP on the server side to accept the uploaded XML; and another PHP file to import those files.
My idea would be to create a fork and some new server stuff and go from there. I will report back here on whether my research/experiment works or not. _________________ Forum Guidelines
PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire |
|
|
|
geki Advocate
Joined: 13 May 2004 Posts: 2387 Location: Germania
|
Posted: Sat May 20, 2023 11:56 am Post subject: |
|
|
pjp wrote: | ..., I'd personally like to see something like that provided by Gentoo. |
tuxmainy wrote: | Using another DBMS which is built for this amount of data might resolve some performance problems (like using NoSQL?). |
I had a look at how I would implement a portage file list. I had a look at using SQL and found the closure table[0] pattern for managing a hierarchy. It seems to work fairly well for a simple approach. But the more I looked around, the more I got to the point that querying file/path metadata is not a good fit for a SQL DB. I may be wrong, though. Do you know better?
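For readers who have not met the closure table pattern, here is a small self-contained sketch using SQLite from Python's standard library. The table and column names are invented for this example and have nothing to do with the real PFL schema. Every node stores one row per ancestor (including itself at depth 0), so "everything below this directory" becomes a single indexed lookup.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a node table and its closure table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE closure (ancestor INTEGER, descendant INTEGER, depth INTEGER,
                              PRIMARY KEY (ancestor, descendant));
    """)
    return db


def add_node(db, name, parent_id=None):
    """Insert a path component and link it to all of its ancestors."""
    node_id = db.execute("INSERT INTO node (name) VALUES (?)", (name,)).lastrowid
    # Every node is its own ancestor at depth 0.
    db.execute("INSERT INTO closure VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # Copy all ancestor links of the parent, one level deeper.
        db.execute(
            "INSERT INTO closure SELECT ancestor, ?, depth + 1 "
            "FROM closure WHERE descendant = ?",
            (node_id, parent_id),
        )
    return node_id


def descendants(db, node_id):
    """Names of all nodes below `node_id`, nearest first."""
    rows = db.execute(
        "SELECT n.name FROM closure c JOIN node n ON n.id = c.descendant "
        "WHERE c.ancestor = ? AND c.depth > 0 ORDER BY c.depth, n.name",
        (node_id,),
    )
    return [r[0] for r in rows]
```

The price is storage: a file nested d directories deep needs d + 1 closure rows, which adds up quickly across millions of paths.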
Instead I looked at what is there already. There is the Portage database, full of metadata for each installed package on the user's system, located at /var/db/pkg. Either parse the data with CLI tools and send a subset of its metadata as a compressed tarball of updated packages, or use Python's portage API as e-file is doing already. On the server, manage a modified, stripped Portage database similar to /var/db/pkg within a tmpfs. Then I remembered locate and updatedb. They have been doing file search for decades, and I came across this Stack Exchange thread[1] with a very interesting comment about just using grep on a big text file listing all files, which is supposed to be way faster. Once I get my Gentoo box running again, I may play around with this approach and see if I get anything that scales resource-wise and in terms of web UI performance. I have no idea, but it will be fun to see where I end up.
[0] https://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
[1] https://unix.stackexchange.com/questions/379725/what-kind-of-database-do-updatedb-and-locate-use#comment675254_379729 _________________ hear hear |
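The "one big text file plus grep" idea from the post above can be sketched in a few lines. This is only an illustration of the approach, not anything geki has published: the index is assumed to hold one `path<TAB>package` line per installed file, and the search does a plain linear pass with shell-style pattern matching on the file name.

```python
import fnmatch


def build_index(packages):
    """Flatten {package: [paths]} into text with one 'path<TAB>package' line per file."""
    lines = []
    for pkg, paths in sorted(packages.items()):
        for path in paths:
            lines.append(f"{path}\t{pkg}")
    return "\n".join(lines)


def search(index_text, pattern):
    """Return (path, package) pairs whose file name matches a shell-style pattern."""
    hits = []
    for line in index_text.splitlines():
        path, _, pkg = line.partition("\t")
        # Match against the last path component only, as a file-name search would.
        if fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], pattern):
            hits.append((path, pkg))
    return hits
```

A linear scan treats foo* and *foo alike, so there is no index to defeat; whether it stays fast enough at the full data size is exactly what would need measuring.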
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Sun May 21, 2023 3:31 pm Post subject: |
|
|
I hadn't considered an implementation, only that with devs building things or through CI, it shouldn't be difficult to add something that tracked which files were installed by a given program. No need for a client tool to submit user data. _________________ Quis separabit? Quo animo? |
|
|
|
Hu Administrator
Joined: 06 Mar 2007 Posts: 22938
|
Posted: Sun May 21, 2023 3:57 pm Post subject: |
|
|
I believe one of the motivations for user submissions was the prevalence of packages where USE=foo controls whether /usr/bin/foo is built at all. For such packages, any one build may not be representative of all the files the package can generate, so the database needs a sampling of different build configurations. Some packages have many USE flags, so building all possible combinations is infeasible (and not all combinations will actually vary the file list). Arguably, an extension on top of the USE system where the ebuild tracks which flags enable/disable building particular files could help there, but presently Gentoo maintainers are not asked to compile that list in any sort of machine-readable form. |
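To illustrate the sampling argument: a package with n independent USE flags has 2**n possible configurations (over a million for 20 flags), so a server can only merge whatever builds users happen to submit. The sketch below is a hypothetical data model, not PFL's: it records under which USE sets each path was seen and reports the flags that were enabled in every build that installed it. That is a correlation over the sample, not proof that a flag controls the file.

```python
from collections import defaultdict


def merge_submissions(submissions):
    """Merge (enabled_use_flags, installed_paths) submissions for ONE package.

    Returns a mapping of path -> set of distinct USE sets under which it was seen.
    """
    seen = defaultdict(set)
    for use_flags, paths in submissions:
        key = frozenset(use_flags)
        for path in paths:
            seen[path].add(key)
    return seen


def flags_always_present(seen, path):
    """USE flags that were enabled in every sampled build that installed `path`."""
    use_sets = seen.get(path)
    if not use_sets:
        return frozenset()
    return frozenset.intersection(*use_sets)
```

With more varied submissions, the set of "always present" flags for a file shrinks towards the flags that actually matter, which is why a large user base helps.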
|
|
|
geki Advocate
Joined: 13 May 2004 Posts: 2387 Location: Germania
|
Posted: Sun May 21, 2023 8:52 pm Post subject: |
|
|
It gets all the more fun when we collect files not only from portage tree packages, but also from user overlay packages. Anyway, it's just something I spend a little of my spare time on for fun. We'll see if my variant of a NoSQL approach gets somewhere. But I think we had better wait for what the other(s) come up with. _________________ hear hear |
|
|
|
pjp Administrator
Joined: 16 Apr 2002 Posts: 20552
|
Posted: Tue May 23, 2023 2:20 am Post subject: |
|
|
Hu wrote: | I believe one of the motivations for user submissions was the prevalence of packages where USE=foo controls whether /usr/bin/foo is built at all. For such packages, any one build may not be representative of all the files the package can generate, so the database needs a sampling of different build configurations. Some packages have many USE flags, so building all possible combinations is infeasible (and not all combinations will actually vary the file list). Arguably, an extension on top of the USE system where the ebuild tracks which flags enable/disable building particular files could help there, but presently Gentoo maintainers are not asked to compile that list in any sort of machine-readable form. | I presumed automation, not manual enumeration. I suppose lacking an upstream manifest, whatever upstream build options could be the expected "creates these files." But I'm not a dev, so it would seem none have thought it worth their time.. _________________ Quis separabit? Quo animo? |
|
|
|
tuxmainy n00b
Joined: 15 Nov 2016 Posts: 8
|
Posted: Thu May 25, 2023 11:10 am Post subject: |
|
|
Hi,
the current dump of the SQL database can be found at https://portagefilelist.de/pfl-files.sql.gz
PFL is used to answer 3 different questions:
- Which file is owned by which package? Sometimes it's not easy to see, even if you don't have to enable a special USE flag.
- Which file is built with which USE flag and architecture? The use case is obvious, I hope.
- How did the files change over time? Sometimes you lose a file after an update and it would be nice to see where it has gone.
Point 2 is exactly where you need a huge user base for collecting, because covering all USE flags and architectures won't be possible for a single person, eh?
The current pfl collect script ignores user repositories, and I really recommend not collecting data from user repositories. This data is not helpful, as it doesn't help anyone to know that the file they are looking for is in a package which they cannot emerge. It also doesn't seem to be a good idea from a data privacy point of view. Imagine you have written a tool called "mirror my favorite p*rnh*b videos" which then gets collected (not the tool itself, but the name) and published on a website. Or even more sensitive data, like usernames or hashes which are encoded into filenames.
And to be honest: it happened in an early version of PFL (not intended, I just never thought about it). So I have seen data from user repositories. I promise you, this data is mostly garbage.
Have a look at the current collect script. I think it's easy to understand (at least I hope so).
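The repository filter recommended above could look roughly like the sketch below. It is not taken from the real collect script: the package records are plain dicts with hypothetical `cpv` and `repository` keys, and the assumption is that only packages whose recorded repository is the main `gentoo` tree are eligible for upload.

```python
ALLOWED_REPOSITORIES = frozenset({"gentoo"})  # assumption: only the main tree is collected


def collectable(packages, allowed=ALLOWED_REPOSITORIES):
    """Return the 'cpv' of every package that comes from an allowed repository.

    packages: iterable of dicts with 'cpv' and 'repository' keys (hypothetical shape).
    Packages with no recorded repository are skipped, to err on the side of privacy.
    """
    return [p["cpv"] for p in packages if p.get("repository") in allowed]
```

Using an allow-list rather than a block-list means an unknown or missing repository name never leaks by accident.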
regards
Daniel |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1825 Location: Germany
|
|
|
|
geki Advocate
Joined: 13 May 2004 Posts: 2387 Location: Germania
|
Posted: Sat May 27, 2023 1:55 pm Post subject: |
|
|
Hi tuxmainy, would you mind sharing a tarball with only the schema dump of the DB and its tables, so I can get a quick look at the layout and such? I would not need the content for my tinkering, and then I would not need to download that much data, a~hem. ... Just to make sure I did not miss something... _________________ hear hear |
|
|
|
Banana Moderator
Joined: 21 May 2004 Posts: 1825 Location: Germany
|
Posted: Sun May 28, 2023 3:43 pm Post subject: |
|
|
I've got a working example (first draft) over here: https://delta-labs.org/portagefilelist/
What it does: you can search for a filename based on my local data collection. Here are some names you can search for: hugo, apache, kernel, mupdf, vim
What it does NOT (yet): show packages, versions. Data endpoint is disabled right now.
I've just implemented the import and the frontend. The client side will be left untouched for now, since I do not see any problems with it so far.
@tuxmainy
How do you plan to proceed from here? Do you want a complete example before switching over?
@geki
If you also have ideas or examples, or whatever, we can join forces. _________________ Forum Guidelines
PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire |
|
|
|
|