Gentoo Forums
PFL looking for new owner!
Gentoo Forums Forum Index » Gentoo Chat
tuxmainy
n00b


Joined: 15 Nov 2016
Posts: 8

Posted: Sun May 14, 2023 3:59 pm    Post subject: PFL looking for new owner!

Hi,
as the subject says: I, the founder and current owner of PFL (https://portagefilelist.de/), am looking for a new owner. You may have noticed the recent lack of maintenance and support of PFL. The reasons are mainly private and shouldn't be discussed here.

So here I am, after ~15 years, asking whether anyone wants to take over PFL, including the domain. The software can be found on GitHub (https://github.com/portagefilelist). The website code is not included because it contains code that is not mine to license, so I guess the website has to be rewritten. And to be honest, the website really needs a refresh :D

So from my point of view your job would be:
1. install all the server-side stuff on your own (or a) server
2. create a new website for doing queries (remember the API used by e-file)
3. take over the domain
4. have fun maintaining the data and the PFL client-side stuff

Anyone interested? Otherwise I have to shut down PFL. Sorry :(

regards
Daniel

PS: Could an admin pin this post to the top? I think this is an important request.
pjp
Administrator


Joined: 16 Apr 2002
Posts: 20524

Posted: Sun May 14, 2023 4:36 pm

I'm sorry to hear you have to move on from the project (it sounds like maybe you would have preferred to not).

I've made the post an Announcement, so it will remain at the top until there is a hand off or you've had to shut it down.


Could you be more specific about the work involved? How is the list of files generated? What would that look like for a person hosting this, or paying for compute time somewhere?

You mention a new front end would be needed. Is the old front end not transferable to the new owner until they are able to create a new one?

Not knowing what is involved, I'd personally like to see something like that provided by Gentoo.
_________________
Quis separabit? Quo animo?
tuxmainy
n00b


Joined: 15 Nov 2016
Posts: 8

Posted: Sun May 14, 2023 7:00 pm

pjp wrote:
I've made the post an Announcement, so it will remain at the top until there is a hand off or you've had to shut it down.

Thanks

pjp wrote:
Could you be more specific about the work involved?

I'll try. Everyone, please don't hesitate to ask further questions. Since I came up with most of this stuff myself, it's hard for me to tell which parts need more explanation.

pjp wrote:
How is the list of files generated?

The files are collected by all the Gentoo users who have PFL emerged. The package installs a Python script that cron calls periodically. This script collects all packages installed since the last run, packs them into XML and sends them to portagefilelist.de. On the server, a second script runs periodically (every hour), unpacks the XML and imports it into a PostgreSQL database. Switching to another database should be fairly easy, but please keep in mind that we are talking about a lot of data. I have switched between multiple databases over the years (I even tried LDAP at some point). Anyway, the website simply queries the database.
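To make the client-side step more concrete, here is a minimal Python sketch of "collect the packages installed since the last run and pack them into XML". It assumes Portage's installed-package database at /var/db/pkg, where each package directory holds a CONTENTS file with `obj` (file) and `sym` (symlink) lines. The XML element names (`pfl`, `package`, `file`), the mtime-based "since last run" check and the function names are hypothetical and not taken from the real PFL script; the upload step is left out.

```python
import os
import xml.etree.ElementTree as ET


def parse_contents(path):
    """Return the file paths listed in a vdb CONTENTS file (obj and sym entries only)."""
    files = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("obj "):
                # obj <path> <md5> <mtime>: the path may contain spaces,
                # so cut the last two fields off from the right.
                files.append(line[4:].rsplit(" ", 2)[0])
            elif line.startswith("sym "):
                # sym <path> -> <target> <mtime>
                files.append(line[4:].split(" -> ", 1)[0])
    return files


def collect_since(vdb_root, last_run):
    """Yield (category, package-version, files) for packages whose CONTENTS changed after last_run (Unix time)."""
    for category in sorted(os.listdir(vdb_root)):
        cat_dir = os.path.join(vdb_root, category)
        if not os.path.isdir(cat_dir):
            continue
        for pkg in sorted(os.listdir(cat_dir)):
            contents = os.path.join(cat_dir, pkg, "CONTENTS")
            if os.path.isfile(contents) and os.path.getmtime(contents) > last_run:
                yield category, pkg, parse_contents(contents)


def build_xml(packages):
    """Serialize the collected packages into a (hypothetical) XML upload document."""
    root = ET.Element("pfl")
    for category, pkg, files in packages:
        pkg_el = ET.SubElement(root, "package", category=category, name=pkg)
        for f in files:
            ET.SubElement(pkg_el, "file").text = f
    return ET.tostring(root, encoding="unicode")
```

Something like `build_xml(collect_since("/var/db/pkg", last_run))` would produce the payload; the collect script in the GitHub repository remains the authority on the actual format.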

pjp wrote:
What would that look like for a person hosting this, or paying for compute time somewhere?

You need to host it yourself. I rent a dedicated root server from a common provider, so yes, you would have to pay for resources. If someone is more familiar with cloud computing, that might be an option, but I don't know enough about cloud computing to judge. Of course I will provide the current database as an SQL dump.

pjp wrote:
You mention a new front end would be needed. Is the old front end not transferable to the new owner until they are able to create a new one?

The code of the old frontend is not transferable at all because of license reasons. If by "transferable" you mean "switching over the domain", keep in mind that the domain serves two purposes:
1. collecting data from Gentoo users
2. making the data "queryable"
So once you take over the domain, you should be ready to receive the data from point 1. Not providing point 2 will result in some unhappy users, but might be OK for a week or so.

Hope this helps.

regards
Daniel
pjp
Administrator


Joined: 16 Apr 2002
Posts: 20524

Posted: Sun May 14, 2023 8:45 pm

Thanks for clarifying. Hopefully that will be useful information for someone. It's a bit beyond my comfort level and budget (0), otherwise I would consider it.

tuxmainy wrote:
The code of the old frontend is not transferable at all because of license reasons.
That is what I was wondering as it seems you were specifically indicating it was not available.
_________________
Quis separabit? Quo animo?
flexibeast
Guru


Joined: 04 Apr 2022
Posts: 473
Location: Naarm/Melbourne, Australia

Posted: Mon May 15, 2023 1:10 am

@tuxmainy,

i just want to say thanks for maintaining PFL, and for so long - i regularly make use of it. Hope someone is able to take it over soon!
eccerr0r
Watchman


Joined: 01 Jul 2004
Posts: 9847
Location: almost Mile High in the USA

Posted: Mon May 15, 2023 1:32 am

Curious about the bandwidth required to maintain the service?

Though I love running services on my computer, I have neither the network bandwidth nor the funds to maintain the DNS name after it expires, sorry :(
_________________
Intel Core i7 2700K/Radeon R7 250/24GB DDR3/256GB SSD
What am I supposed watching?
spica
Guru


Joined: 04 Jun 2021
Posts: 331

Posted: Mon May 15, 2023 11:48 am

What are the monthly server costs?
How much disk storage is needed for the database?
If someone starts replicating the web UI now, how much time do they have?
NeddySeagoon
Administrator


Joined: 05 Jul 2003
Posts: 54644
Location: 56N 3W

Posted: Mon May 15, 2023 12:50 pm

spica,

Server Auction is one place to look for a whole server. I have one of those.
It hosts my Raspberry Pi arm64 binhost among other things.

Maybe you don't need a whole server and a VM or web hosting would work?
_________________
Regards,

NeddySeagoon

Computer users fall into two groups:-
those that do backups
those that have never had a hard drive fail.
spica
Guru


Joined: 04 Jun 2021
Posts: 331

Posted: Mon May 15, 2023 2:37 pm

NeddySeagoon,
No, no. I want to know the project's current spending, i.e. how much it costs now. If I know the current numbers, I can compare them with my infrastructure and figure out whether I can give it a home or not.
It's a question about the requirements of the service.
I guess a $5 t3.micro should be enough.
ARomaSH
n00b


Joined: 10 May 2005
Posts: 4

Posted: Mon May 15, 2023 4:42 pm    Post subject: Re: PFL looking for new owner!

tuxmainy wrote:
Anyone interested? Otherwise I have to shut down PFL. Sorry :(

PM-ed
_________________
May the force be with you!!!!
tuxmainy
n00b


Joined: 15 Nov 2016
Posts: 8

Posted: Mon May 15, 2023 6:39 pm

On the hardware in general: most of my trouble was with CPU and RAM, caused by index regeneration while inserting new records and by users running wildcard queries that the index couldn't handle (e.g. *foo). The database should have a fair amount of RAM available for the indexes. Disk I/O might be a problem, but standard SSDs should be fine.
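Why *foo hurts: a sorted (B-tree-style) index can answer foo* by binary search, because all matches sit next to each other, but *foo forces a scan of every entry. One common workaround, not necessarily anything PFL uses, is a second index over the reversed strings, which turns a suffix search into a prefix search. In PostgreSQL this is typically an expression index on reverse(column), or the pg_trgm extension with a GIN index, which also covers *foo* infix searches. The toy in-memory class below (a hypothetical name) only illustrates the idea with Python's bisect module.

```python
import bisect


class SuffixPrefixIndex:
    """Toy index: sorted names for 'foo*' lookups, sorted reversed names for '*foo' lookups."""

    def __init__(self, names):
        self.forward = sorted(names)
        self.backward = sorted(n[::-1] for n in names)

    @staticmethod
    def _prefix_range(keys, prefix):
        # All keys starting with `prefix` form one contiguous run in sorted order,
        # so binary search finds its start and we only walk over the matches.
        lo = bisect.bisect_left(keys, prefix)
        hi = lo
        while hi < len(keys) and keys[hi].startswith(prefix):
            hi += 1
        return keys[lo:hi]

    def starts_with(self, prefix):
        """Answer a 'prefix*' query."""
        return self._prefix_range(self.forward, prefix)

    def ends_with(self, suffix):
        """Answer a '*suffix' query via the reversed-key index."""
        hits = self._prefix_range(self.backward, suffix[::-1])
        return sorted(k[::-1] for k in hits)
```

Infix queries (*foo*) are not helped by either sorted list, which is where trigram indexes come in.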

Using another DBMS built for this amount of data (NoSQL, maybe?) might resolve some performance problems. But my database knowledge is limited to SQL. So...

spica wrote:
What are the monthly server costs?

Hard to say. It's a single dedicated root server that runs multiple services (not just PFL), so I never tracked the PFL-specific costs. But to give you an idea:

  • Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
  • 64GB RAM (~16GB is used by the database)
  • 2 × 4TB SSDs in RAID 1


The server is hosted by a common German hoster (Hetzner) and currently costs ~50€/month. But as I said, it runs multiple services, so the resources are not dedicated to PFL! The domain is a .de domain (no, I also don't understand why I registered a .de domain :roll: ), so you should check your local domain registrar for prices.

spica wrote:
How much disk storage is needed for the database?

Current size of PFL database on disk is 235GB.

eccerr0r wrote:
Curious about the bandwidth required to maintain the service?

~3GB of uploads per month. I don't have numbers for the website itself, but I guess it's not that much. The hoster grants unlimited traffic, so I never paid attention to this.
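To put the upload figure into perspective, a quick back-of-the-envelope calculation of the average rate. It assumes a 30-day month and GB = 10^9 bytes, and it only covers the uploads; real traffic arrives in bursts when the cron jobs fire, and the website traffic is not included.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def average_rate(bytes_per_month):
    """Return the average rate as (bytes per second, kilobits per second)."""
    bytes_per_s = bytes_per_month / SECONDS_PER_MONTH
    return bytes_per_s, bytes_per_s * 8 / 1000


bytes_s, kbit_s = average_rate(3 * 10**9)  # ~3 GB of uploads per month
print(f"{bytes_s:.0f} B/s ~ {kbit_s:.1f} kbit/s")  # prints: 1157 B/s ~ 9.3 kbit/s
```

So on average the collection side needs only about 9 kbit/s of inbound bandwidth.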

spica wrote:
If someone starts replicating the web UI now, how much time do they have?

I'd like to hand it over by the 1st of July. If there is a plan that takes more time, whether I'm fine with it depends on the plan ;)
szatox
Advocate


Joined: 27 Aug 2013
Posts: 3477

Posted: Mon May 15, 2023 7:52 pm

Quote:
Current size of PFL database on disk is 235GB.
Wow, that's a lot more than I thought it would be.
Is it the actual volume of data, or were there perhaps a lot of updates/deletions over the years, so that it just requires a VACUUM FULL? AFAIR postgres does not give the space of dead records back to the OS unless explicitly told to.
Banana
Moderator


Joined: 21 May 2004
Posts: 1803
Location: Germany

Posted: Tue May 16, 2023 6:14 am

Well, I'm from Germany, I do have a host and I do know PHP, but the DB size would be a pain.

Is there historical data in it which could be "archived"?
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire
tuxmainy
n00b


Joined: 15 Nov 2016
Posts: 8

Posted: Fri May 19, 2023 7:04 pm

Banana wrote:
Well, I'm from Germany, I do have a host and I do know PHP, but the DB size would be a pain.

Is there historical data in it which could be "archived"?


Yes, I think so. As I said, it's currently not well maintained, archiving included. But I have no idea how much data could be archived. Sorry
pjp
Administrator


Joined: 16 Apr 2002
Posts: 20524

Posted: Fri May 19, 2023 8:46 pm

It may help to post an announcement on the PFL website itself.

Maybe even an announcement / update to the cli tool? I'm not sure how feasible that is.
_________________
Quis separabit? Quo animo?
Banana
Moderator


Joined: 21 May 2004
Posts: 1803
Location: Germany

Posted: Sat May 20, 2023 8:23 am

I will have a look. Currently I see: the website, which can run a search and display the results; Python on the client side, which creates an XML file and uploads it; Python on the client, which can query; PHP on the server side to accept the uploaded XML; and another PHP file to import those files.

My idea would be to create a fork and some new server stuff and go from there. I will report back here on whether my research/experiment works or not.
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire
geki
Advocate


Joined: 13 May 2004
Posts: 2387
Location: Germania

Posted: Sat May 20, 2023 11:56 am

pjp wrote:
..., I'd personally like to see something like that provided by Gentoo.

tuxmainy wrote:
Using another DBMS built for this amount of data (NoSQL, maybe?) might resolve some performance problems.

I looked at how I would implement a portage file list myself. With SQL, I found a closure table[0] for managing the hierarchy; it seems to work fairly well for a simple approach. But the more I looked around, the more I came to the conclusion that querying file/path metadata is not a good fit for an SQL DB. I may be wrong, though. Do you know better?
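For anyone unfamiliar with the closure-table pattern from [0]: besides a table of nodes, you store one row for every (ancestor, descendant) pair together with its depth, so "everything below /usr" becomes a single non-recursive query. Below is a minimal sketch using Python's built-in sqlite3 module; the table layout and the function names are illustrative assumptions, not geki's or PFL's schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- one row per (ancestor, descendant) pair, including each node paired with itself at depth 0
CREATE TABLE closure (
    ancestor   INTEGER NOT NULL REFERENCES node(id),
    descendant INTEGER NOT NULL REFERENCES node(id),
    depth      INTEGER NOT NULL,
    PRIMARY KEY (ancestor, descendant)
);
"""


def build_tree(conn, paths):
    """Insert every path component as a node and fill the closure table; returns {full path: node id}."""
    conn.executescript(SCHEMA)
    ids = {}
    for path in paths:
        parts = path.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            full = "/" + "/".join(parts[:i])
            if full in ids:
                continue  # directory already inserted by an earlier path
            nid = conn.execute("INSERT INTO node (name) VALUES (?)", (parts[i - 1],)).lastrowid
            ids[full] = nid
            conn.execute("INSERT INTO closure VALUES (?, ?, 0)", (nid, nid))
            if i > 1:
                parent = ids["/" + "/".join(parts[:i - 1])]
                # Every ancestor of the parent (the parent included) is also an
                # ancestor of the new node, one level further away.
                conn.execute(
                    "INSERT INTO closure (ancestor, descendant, depth) "
                    "SELECT ancestor, ?, depth + 1 FROM closure WHERE descendant = ?",
                    (nid, parent),
                )
    return ids


def subtree(conn, ids, path, max_depth=None):
    """Names of all nodes below `path`, optionally limited to max_depth levels, in one query."""
    sql = ("SELECT n.name FROM closure c JOIN node n ON n.id = c.descendant "
           "WHERE c.ancestor = ? AND c.depth > 0")
    params = [ids[path]]
    if max_depth is not None:
        sql += " AND c.depth <= ?"
        params.append(max_depth)
    return [row[0] for row in conn.execute(sql + " ORDER BY n.name", params)]
```

The price is storage: a file at depth d needs d + 1 closure rows, which adds up quickly at PFL's scale and may be part of why a plain SQL layout feels like a poor fit.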

Instead, I looked at what is already there. There is the Portage database at /var/db/pkg, full of metadata for each package installed on the user's system. The client could either parse that data with CLI tools and send a subset of the metadata for updated packages as a compressed tarball, or use Python's portage API, as e-file is already doing. The server would then manage a modified, stripped-down Portage database similar to /var/db/pkg within a tmpfs. Then I remembered locate and updatedb, which have been doing file search for decades, and came across this Stack Exchange thread[1] with a very interesting comment suggesting simply running grep over one big text file listing all files, which is supposed to be way faster. Once I get my Gentoo box running again, I may play around with this approach and see whether I get anything scalable in terms of resources and WebUI performance. I have no idea, but it will be fun to see where I end up. :D
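The "grep over one big text file" idea from [1] can be sketched in a few lines: keep one "path<TAB>package" line per file in a single string (or a file in tmpfs) and answer a query with one fast substring scan, expanding each hit to its surrounding line. This is only an illustration in the spirit of that comment, with hypothetical function names and a made-up line format; real locate databases use their own compressed format.

```python
def build_flat_index(entries):
    """Serialize (path, package) pairs as one text blob, one 'path<TAB>package' line each."""
    return "".join(f"{path}\t{pkg}\n" for path, pkg in entries)


def search(blob, needle):
    """Return (path, package) pairs whose path contains `needle` (plain substring, like grep -F)."""
    if not needle:
        return []
    results = []
    start = 0  # always points at the beginning of a line
    while True:
        hit = blob.find(needle, start)  # C-level substring scan over the whole blob
        if hit == -1:
            break
        line_start = blob.rfind("\n", 0, hit) + 1
        line_end = blob.find("\n", hit)
        if line_end == -1:
            line_end = len(blob)
        path, _, pkg = blob[line_start:line_end].partition("\t")
        if needle in path:  # skip lines where the match was only in the package column
            results.append((path, pkg))
        start = line_end + 1  # continue with the next line
    return results
```

Unlike an index, this handles *foo and *foo* queries at the same cost as foo*: every query is one linear pass, so performance depends on scan speed and on the blob fitting in RAM.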

[0] https://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
[1] https://unix.stackexchange.com/questions/379725/what-kind-of-database-do-updatedb-and-locate-use#comment675254_379729
_________________
hear hear
pjp
Administrator


Joined: 16 Apr 2002
Posts: 20524

Posted: Sun May 21, 2023 3:31 pm

geki wrote:
...
I hadn't considered an implementation, only that since devs build things (or CI does), it shouldn't be difficult to add something that tracks which files are installed by a given program. No need for a client tool to submit user data.
_________________
Quis separabit? Quo animo?
Hu
Administrator


Joined: 06 Mar 2007
Posts: 22877

Posted: Sun May 21, 2023 3:57 pm

I believe one of the motivations for user submissions was the prevalence of packages where USE=foo controls whether /usr/bin/foo is built at all. For such packages, any one build may not be representative of all the files the package can generate, so the database needs a sampling of different build configurations. Some packages have many USE flags, so building all possible combinations is infeasible (and not all combinations will actually vary the file list). Arguably, an extension on top of the USE system where the ebuild tracks which flags enable/disable building particular files could help there, but presently Gentoo maintainers are not asked to compile that list in any sort of machine-readable form.
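To make the sampling argument concrete: with n independent USE flags there are 2^n build configurations (ten flags already give 1,024), so exhaustive builds are out of reach, but many user reports can still narrow things down. A minimal heuristic sketch, assuming a hypothetical input of (enabled flags, installed files) pairs per package version and arch: for each file, keep only the flags that were enabled in every submission containing it.

```python
def candidate_flags(submissions):
    """
    submissions: iterable of (use_flags, files) pairs for ONE package version and arch,
    both given as sets of strings (hypothetical input format).
    Returns {file: set of USE flags enabled in every submission that contained the file}.
    """
    candidates = {}
    for use_flags, files in submissions:
        for f in files:
            if f in candidates:
                candidates[f] &= use_flags  # each new report can only narrow the set
            else:
                candidates[f] = set(use_flags)
    return candidates
```

With a single report enabling both gtk and X, a file like /usr/bin/foo-gtk would still be attributed to both flags; only more varied submissions shrink it to gtk. The heuristic also cannot express conditions such as "either flag", which is where machine-readable ebuild metadata would do better.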
geki
Advocate


Joined: 13 May 2004
Posts: 2387
Location: Germania

Posted: Sun May 21, 2023 8:52 pm

It gets all the more fun when we collect files not only from portage tree packages, but also from user overlay packages. Anyway, it's just something I spend a little of my spare time on for fun. We'll see if my variant of a NoSQL approach gets anywhere. But I think we'd better wait and see what the other(s) come up with. :)
_________________
hear hear
pjp
Administrator


Joined: 16 Apr 2002
Posts: 20524

Posted: Tue May 23, 2023 2:20 am

Hu wrote:
I believe one of the motivations for user submissions was the prevalence of packages where USE=foo controls whether /usr/bin/foo is built at all. For such packages, any one build may not be representative of all the files the package can generate, so the database needs a sampling of different build configurations. Some packages have many USE flags, so building all possible combinations is infeasible (and not all combinations will actually vary the file list). Arguably, an extension on top of the USE system where the ebuild tracks which flags enable/disable building particular files could help there, but presently Gentoo maintainers are not asked to compile that list in any sort of machine-readable form.
I presumed automation, not manual enumeration. I suppose that, lacking an upstream manifest, whatever upstream's build options produce could be taken as the expected "creates these files." But I'm not a dev, and it would seem none of the devs have thought it worth their time..
_________________
Quis separabit? Quo animo?
tuxmainy
n00b


Joined: 15 Nov 2016
Posts: 8

Posted: Thu May 25, 2023 11:10 am

Hi,
the current dump of the SQL database can be found at https://portagefilelist.de/pfl-files.sql.gz

PFL is used to answer 3 different questions:

  • Which file is owned by which package? Sometimes that's not easy to see, even when no special USE flag is involved
  • Which file is built with which USE flag and architecture? The use case is obvious, I hope :)
  • How did the files change over time? Sometimes you lose a file after an update and it would be nice to see where it went.
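For the third question, a chronological history of file lists per package is enough to show when a file appeared or disappeared. A minimal sketch with hypothetical function names and a hypothetical `(version, files)` history format, not PFL's actual schema; finding a file that moved to a different package would additionally need a lookup across all packages.

```python
def diff_file_lists(old_files, new_files):
    """Return (removed, added) file paths between two versions, each sorted."""
    old, new = set(old_files), set(new_files)
    return sorted(old - new), sorted(new - old)


def file_history(history, path):
    """
    history: chronological list of (version, files) for one package (hypothetical format).
    Returns the versions in which `path` appeared or disappeared.
    """
    events = []
    present = False
    for version, files in history:
        now = path in files
        if now != present:
            events.append((version, "added" if now else "removed"))
            present = now
    return events
```

Running `diff_file_lists` on two consecutive versions shows what was lost and what was added in the same step, which is often enough to spot a renamed or relocated file.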


Point 2 is exactly where you need a huge user base for collecting, because covering all USE flags and architectures won't be possible for a single person, eh?

The current PFL collect script ignores user repositories, and I really recommend not collecting data from user repositories. This data is not helpful: it doesn't help someone to know that the file they are looking for is in a package they cannot emerge. It also doesn't seem to be a good idea from a data privacy point of view. Imagine you have written a tool called "mirror my favorite p*rnh*b videos" which then gets collected (not the tool itself, but its name) and published on a website. Or even more sensitive data, like usernames or hashes encoded into file names.
And to be honest: it happened in an early version of PFL (not intended, I just never thought about it). So I have seen data from user repositories. I promise you, this data is mostly garbage anyway ;)
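On how a collector can tell repositories apart: Portage normally records the repository a package was installed from in a plain-text file named `repository` inside each /var/db/pkg/<category>/<package> directory. The sketch below filters on that file; the function name and the allow-list containing only `gentoo` are assumptions for illustration, not necessarily how the PFL collect script does it.

```python
import os


def from_allowed_repo(pkg_dir, allowed=("gentoo",)):
    """
    Return True if the installed package in pkg_dir (e.g. /var/db/pkg/app-editors/vim-9.0.1)
    was installed from one of the allowed repositories, according to its 'repository' file.
    """
    try:
        with open(os.path.join(pkg_dir, "repository"), encoding="utf-8") as fh:
            repo = fh.read().strip()
    except FileNotFoundError:
        return False  # unknown origin: err on the side of not uploading anything
    return repo in allowed
```

Defaulting to "don't upload" when the origin is unknown matches the privacy concern above: a missing or unexpected entry never leaks overlay file names.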

Have a look at the current collect script. I think it's easy to understand (at least I hope so :D )


regards
Daniel
Banana
Moderator


Joined: 21 May 2004
Posts: 1803
Location: Germany

Posted: Sat May 27, 2023 7:27 am

I've "rewritten" some of the stuff to get an idea of how each part works. I'll have a working example in the next couple of days.
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire
geki
Advocate


Joined: 13 May 2004
Posts: 2387
Location: Germania

Posted: Sat May 27, 2023 1:55 pm

tuxmainy wrote:
Hi,
the current dump of the SQL database can be found at https://portagefilelist.de/pfl-files.sql.gz

Hi tuxmainy, would you mind sharing a tarball with only the DB and table schema dump, so I can get a quick look at the layout and such? I would not need the content for my tinkering, and then I would not need to download that much data, a~hem. :D ... Just to make sure I did not miss something...
_________________
hear hear
Banana
Moderator


Joined: 21 May 2004
Posts: 1803
Location: Germany

Posted: Sun May 28, 2023 3:43 pm

I've got a working example (first draft) over here: https://delta-labs.org/portagefilelist/

What it does: you can search for a filename based on my local data collection. Here are some names you can search for: hugo, apache, kernel, mupdf, vim

What it does NOT do (yet): show packages and versions. The data endpoint is disabled right now.
I've just implemented the import and the frontend. The client side will stay untouched for now, since I do not see any problems with it so far.

@tuxmainy
How do you plan the further steps? Do you want a complete example before switching over?

@geki
If you have also ideas or examples, or whatever, we can join forces.
_________________
Forum Guidelines

PFL - Portage file list - find which package a file or command belongs to.
My delta-labs.org snippets do expire
All times are GMT
Page 1 of 3