juantxorena Apprentice
Joined: 19 Mar 2006 Posts: 201 Location: The Shire
Posted: Sat May 16, 2009 10:01 am Post subject: |
|
|
Just a reminder to everybody: the forum is still useless.
Also, I have made a surprising discovery. In the Google search ban thread, which is now locked (so I can't say this there), somebody said that the reason for blocking the Google bots was that they stressed the Bugzilla database, or something like that, so the forums banned them. Now I have found that while Google search is still blocked for the forums, the Bugzilla database is searchable with Google (search for anything in Google using "site:bugs.gentoo.org", and compare the results with those of searching "site:forums.gentoo.org"). This is even more stupid than the whole forum-suckiness thing.
Who is responsible for this nonsense? _________________ I cannot write English very well. Please, correct any mistake so that I can improve.
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
Posted: Sun May 17, 2009 6:35 am Post subject: |
|
|
Split from "new search stopwords list".
juantxorena wrote: | Also, I have made a surprising discovery. In the Google search ban thread, which is now locked (so I can't say this there), somebody said that the reason for blocking the Google bots was that they stressed the Bugzilla database, or something like that, so the forums banned them. | Two things need to be clarified in that summary. First, something using that user agent identity was part of an incident with bugs.gentoo.org, and possibly other Gentoo sites; whether it was actually Google or not has never been made clear to me. Second, blocking GoogleBot from the forums was done neither via the forums administration interface nor by any member of the forums team. It is not a ban; it is blocking of a kind that the forums have no provision to remove.
juantxorena wrote: | Now I have found that while Google search is still blocked for the forums, the Bugzilla database is searchable with Google (search for anything in Google using "site:bugs.gentoo.org", and compare the results with those of searching "site:forums.gentoo.org"). This is even more stupid than the whole forum-suckiness thing. | Set a browser to claim to be GoogleBot, then, making sure that the address you are browsing from does not map to a legitimate address for a real GoogleBot, try browsing bugs.gentoo.org and forums.gentoo.org. Avoid posting until the laughter and swearing have subsided.
juantxorena wrote: | Who is responsible for this nonsense? | That was addressed in the previous topic.
In short, this is a known problem and it will be addressed properly as soon as it can be, no sooner.
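desultory's experiment works because a User-Agent header is just a string the client chooses. A common way for a server to tell a genuine Googlebot from an impostor is the check Google documents for its crawler: reverse-resolve the connecting IP, require a hostname under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. This is only an illustrative sketch; nothing in the thread says the Gentoo front end does this, and the resolver parameters exist solely so the logic can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Reverse-DNS domains Google documents for its crawlers. The leading dot
# stops look-alikes such as "evilgooglebot.com" from matching.
GOOGLE_CRAWLER_DOMAINS = (".googlebot.com", ".google.com")

HostLookup = Callable[[str], Tuple[str, List[str], List[str]]]


def hostname_is_google(hostname: str) -> bool:
    """True if a reverse-DNS hostname falls under a Google crawler domain."""
    host = hostname.rstrip(".").lower()
    return host.endswith(GOOGLE_CRAWLER_DOMAINS)


def is_real_googlebot(
    ip: str,
    reverse_lookup: HostLookup = socket.gethostbyaddr,
    forward_lookup: HostLookup = socket.gethostbyname_ex,
) -> bool:
    """Reverse-then-forward DNS check; the User-Agent header alone proves nothing."""
    try:
        hostname, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not hostname_is_google(hostname):
        return False
    try:
        _name, _aliases, addresses = forward_lookup(hostname)
    except OSError:
        return False
    # A spoofer can control its own PTR record, but not Google's A records,
    # so the forward lookup must map back to the connecting address.
    return ip in addresses
```

Called as `is_real_googlebot(client_ip)` with the default resolvers, this performs two live DNS lookups per address, so a busy front end would likely cache the verdict per IP.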
|
|
muhsinzubeir l33t
Joined: 29 Sep 2007 Posts: 948 Location: /home/muhsin
Posted: Mon May 18, 2009 10:30 am Post subject: |
|
|
I thought maybe this forum search should be replaced with Google Custom Search... does anyone think it might be a good idea? _________________ ~x86
p5k-se
Intel Core 2 Duo
Nvidia GT200
http://www.zanbytes.com |
|
|
gentoo-dev Apprentice
Joined: 24 Jan 2006 Posts: 172
Posted: Mon May 18, 2009 10:53 am Post subject: |
|
|
muhsinzubeir wrote: | I thought maybe this forum search should be replaced with Google Custom Search... does anyone think it might be a good idea? | That would only work if Google were allowed to index the content of forums.g.o in the first place.
Google has been banned for no real reason. Allowing it back is a five-minute fix, but no, Gentoo devs would rather lock the thread than actually help. https://forums.gentoo.org/viewtopic-t-711943.html
|
|
lordcris Apprentice
Joined: 09 Jul 2002 Posts: 248
Posted: Fri May 22, 2009 11:51 am Post subject: |
|
|
C'mon man!!!!!
Enable Googlebot indexing.
The forums are dying.
I've noticed a HUGE decline in users over the last months.
Why are you doing this to my favorite distribution?
You should all be ashamed of yourselves.
This shit has been going on for more than a year...
|
|
pilla Bodhisattva
Joined: 07 Aug 2002 Posts: 7731 Location: Underworld
Posted: Fri May 22, 2009 12:24 pm Post subject: |
|
|
lordcris wrote: | C'mon man!!!!!
Enable Googlebot indexing.
The forums are dying.
I've noticed a HUGE decline in users over the last months.
Why are you doing this to my favorite distribution?
You should all be ashamed of yourselves.
This shit has been going on for more than a year... |
Forums are dying? Do you have numbers to back up your "HUGE decline"? Cut the FUD. _________________ "I'm just very selective about the reality I choose to accept." -- Calvin |
|
|
M Guru
Joined: 12 Dec 2006 Posts: 432
Posted: Fri May 22, 2009 1:06 pm Post subject: |
|
|
Maybe they are not dying, but they will die if this continues. How can someone even find out that Gentoo has forums (or had them)? From the Gentoo home page maybe, if the home page doesn't scare them away. Numbers? FUD? People here expect a better answer; the only number we need is an estimate of the days (or maybe minutes or seconds) needed to solve this. I don't get it: people everywhere are doing their best so that Google can index their sites better, but no, Gentoo people decided they don't need Google. What happened to that Squid proxy? Can someone just edit that robots.txt file, please?
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
Posted: Fri May 22, 2009 3:12 pm Post subject: |
|
|
lordcris wrote: | C'mon man!!!!!
Enable Googlebot indexing.
The forums are dying.
I've noticed a HUGE decline in users over the last months.
Why are you doing this to my favorite distribution?
You should all be ashamed of yourselves.
This shit has been going on for more than a year... |
++
M wrote: | Maybe they are not dying, but they will die if this continues. How can someone even find out that Gentoo has forums (or had them)? From the Gentoo home page maybe, if the home page doesn't scare them away. Numbers? FUD? People here expect a better answer; the only number we need is an estimate of the days (or maybe minutes or seconds) needed to solve this. I don't get it: people everywhere are doing their best so that Google can index their sites better, but no, Gentoo people decided they don't need Google. What happened to that Squid proxy? Can someone just edit that robots.txt file, please? |
++
The problem is that Gentoo-related problems don't show up in Google that often anymore, and to a large extent this is caused by fgo not showing up...
Also, what is the purpose of making it significantly harder to search for, or find, solutions to one's problems?
Letting a search engine do full-text indexing is the way to go, or at least improve the search function so that it doesn't cut 95% of all search words.
thanks _________________ https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa
Hardcore Gentoo Linux user since 2004 |
|
|
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5939
Posted: Fri May 22, 2009 11:52 pm Post subject: |
|
|
pilla wrote: | lordcris wrote: | C'mon man!!!!!
Enable Googlebot indexing.
The forums are dying.
I've noticed a HUGE decline in users over the last months.
Why are you doing this to my favorite distribution?
You should all be ashamed of yourselves.
This shit has been going on for more than a year... |
Forums are dying? Do you have numbers to back up your "HUGE decline"? Cut the FUD. |
While I don't have any concrete proof, I would certainly bet that fgo usage has gone down since we disappeared from search engines. If you want to call that FUD, so be it. _________________
Neddyseagoon wrote: | The problem with leaving is that you can only do it once and it reduces your influence. |
banned from #gentoo since sept 2017 |
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
|
|
|
|
d2_racing Bodhisattva
Joined: 25 Apr 2005 Posts: 13047 Location: Ste-Foy,Canada
Posted: Sat May 23, 2009 1:08 am Post subject: |
|
|
If they solve the problem, it will be good for everybody |
|
|
hitachi Guru
Joined: 20 Feb 2006 Posts: 478 Location: Freiburg / Deutschland
Posted: Thu Jun 04, 2009 8:55 am Post subject: |
|
|
It looks like searches such as "whatever site:forums.gentoo.org" are working again. I tested it with Google and Bing. Can anyone confirm that?
|
|
think4urs11 Bodhisattva
Joined: 25 Jun 2003 Posts: 6659 Location: above the cloud
Posted: Thu Jun 04, 2009 10:24 am Post subject: |
|
|
hitachi wrote: | Can anyone confirm that? |
(Unofficial answer) It seems we have been indexed by Google again since yesterday. _________________ Nothing is secure / Security is always a trade-off with usability / Do not assume anything / Trust no-one, nothing / Paranoia is your friend / Think for yourself
|
|
M Guru
Joined: 12 Dec 2006 Posts: 432
Posted: Thu Jun 04, 2009 10:37 am Post subject: |
|
|
Whoo, nice, we are first for the term "forums" again; that Googlebot is really fast.
|
|
Akkara Bodhisattva
Joined: 28 Mar 2006 Posts: 6702 Location: &akkara
Posted: Thu Jun 04, 2009 11:24 am Post subject: |
|
|
A note of thanks to everyone who had a hand in re-enabling Googlebot: thanks!
M wrote: | Whoo, nice, we are first for the term "forums" again; that Googlebot is really fast. |
Let's hope it is not *too* fast, lest it overload the forums system and get blocked again.
Hmm... I wonder if there's a way to rate-limit to specific destinations. |
|
|
bunder Bodhisattva
Joined: 10 Apr 2004 Posts: 5939
Posted: Thu Jun 04, 2009 12:01 pm Post subject: |
|
|
Actually, for the record... KoT noticed it the day we turned it back on (which was actually May 29th)... I was hoping for an official comment from the remaining staff, so I asked him to redact his kudos. Long story short, we thought we might have to turn it off again... and for all I know, we still might. *shrugs* _________________
Neddyseagoon wrote: | The problem with leaving is that you can only do it once and it reduces your influence. |
banned from #gentoo since sept 2017 |
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
Posted: Fri Jun 05, 2009 4:25 am Post subject: |
|
|
hitachi wrote: | Can anyone confirm that? | I can, officially if needs be. Along with the other common search engine spiders, Googlebot has been allowed back in for very slightly over a week at this point.
This was deliberately not announced publicly, as the index of the forums that Google is using includes only a small fraction of the actual content present in the forums. The intention was to make such an announcement once some additional measures were in place to give Google and other search engines a more comprehensive view of the contents of the forums, so that they could provide more meaningful search results.
As it turns out, there was an actual problem with respect to Google and other spiders indexing the forum; having been allowed to run searches, they did so to the point of consuming all available memory on the front end. Such behavior is no longer allowed, and any bots doing so with sufficient frequency to potentially cause problems will be blocked, either from searching or outright. So far this seems to be working well enough in terms of resource usage, though if necessary further restrictions will be put in place or current restrictions may simply be more strongly enforced.
Naturally, this is all subject to resource availability and as such subject to change. |
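desultory describes the policy only in outline: bots that search often enough to cause problems are blocked, first from searching and then outright. One minimal way to express that two-step escalation is a per-client sliding window. In the sketch below, the window length, both thresholds, and the choice of client key (an IP address or User-Agent string) are all hypothetical; the thread does not say what the forums actually use.

```python
from collections import defaultdict, deque
from enum import Enum
from typing import DefaultDict, Deque


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK_SEARCH = "block_search"  # refuse search requests only
    BLOCK_ALL = "block_all"        # refuse every request from this client


class SearchThrottle:
    """Per-client sliding-window counter for search requests (hypothetical policy)."""

    def __init__(self, window_seconds: float = 60.0,
                 search_limit: int = 10, hard_limit: int = 50) -> None:
        self.window = window_seconds
        self.search_limit = search_limit
        self.hard_limit = hard_limit
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record_search(self, client: str, now: float) -> Verdict:
        hits = self._hits[client]
        # Forget searches that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        # Refused attempts still count, so a client that keeps hammering
        # escalates from losing search to being blocked outright.
        hits.append(now)
        if len(hits) > self.hard_limit:
            return Verdict.BLOCK_ALL
        if len(hits) > self.search_limit:
            return Verdict.BLOCK_SEARCH
        return Verdict.ALLOW
```

The caller passes `now` explicitly (for example from `time.monotonic()`), which keeps the class deterministic and easy to test. State lives in one process's memory, so a front end spread across several processes or machines would need a shared counter store instead.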
|
|
neysx Retired Dev
Joined: 27 Jan 2003 Posts: 795
Posted: Fri Jun 05, 2009 5:55 am Post subject: |
|
|
desultory wrote: | hitachi wrote: | Can anyone confirm that? | As it turns out, there was an actual problem with respect to Google and other spiders indexing the forum; having been allowed to run searches, they did so to the point of consuming all available memory on the front end. Such behavior is no longer allowed, and any bots doing so with sufficient frequency to potentially cause problems will be blocked, either from searching or outright. So far this seems to be working well enough in terms of resource usage, though if necessary further restrictions will be put in place or current restrictions may simply be more strongly enforced.
Naturally, this is all subject to resource availability and as such subject to change. | Tweak robots.txt. It's trivial to do and has been explained ad nauseam in threads about this issue, but looking at https://forums.gentoo.org/robots.txt Code: | User-agent: *
Disallow: /cgi-bin/
Disallow: /search.php
Disallow: /admin/
Disallow: /memberlist.php
Disallow: /groupcp.php
Disallow: /statistics.php
Disallow: /profile.php
Disallow: /privmsg.php
Disallow: /login.php | nothing's been done so far to limit bots (all are allowed) or limit the number of hits they are allowed to make. Before you switch it off again, I suggest editing robots.txt first... |
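neysx's suggestion can be checked offline with Python's standard `urllib.robotparser`. The sketch parses the rules quoted above, plus one hypothetical `Crawl-delay` line standing in for "limit the number of hits", and asks what a compliant crawler would conclude. Two caveats: robots.txt is purely advisory, and `Crawl-delay` is a non-standard extension that some crawlers, Googlebot among them, do not honour.

```python
from urllib.robotparser import RobotFileParser

# Rules quoted from https://forums.gentoo.org/robots.txt in this thread, plus a
# HYPOTHETICAL Crawl-delay line (seconds between requests) for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /search.php
Disallow: /admin/
Disallow: /memberlist.php
Disallow: /groupcp.php
Disallow: /statistics.php
Disallow: /profile.php
Disallow: /privmsg.php
Disallow: /login.php
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
base = "https://forums.gentoo.org"
for path in ("/search.php?search_id=newposts", "/viewtopic-t-711943.html"):
    verdict = "allowed" if rules.can_fetch("Googlebot", base + path) else "disallowed"
    print(f"{path}: {verdict}")
# Only crawlers that implement the Crawl-delay extension would act on this value.
print("Crawl-delay:", rules.crawl_delay("Googlebot"))
```

Search URLs come out disallowed and topic pages allowed, so a compliant spider following a link to a search result should already stop at the first `Disallow` match.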
|
|
Akkara Bodhisattva
Joined: 28 Mar 2006 Posts: 6702 Location: &akkara
Posted: Fri Jun 05, 2009 6:13 am Post subject: |
|
|
[Context: I don't know anything about web serving, hosting, etc.]
desultory wrote: | [...] having been allowed to run searches they did so to the point of consuming all available memory on the front end. [...] |
Do you mean to say that, besides following links and indexing the static pages, they were trying various search terms in the "search" box? And if so, any idea why? If they spidered all the static pages, what more would 'search' return that would warrant them trying it? (And with what keywords even? How does a bot pick and choose?)
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
Posted: Fri Jun 05, 2009 7:02 am Post subject: |
|
|
neysx wrote: | Tweak robots.txt. | What do you think the first thing done to limit well behaved spiders was?
Somehow, I noticed.
neysx wrote: | nothing's been done so far to limit bots (all are allowed) or limit the number of hits they are allowed to make. | That being rather the point of the whole exercise, to be as permissive as resources allow.
neysx wrote: | Before you switch it off again, I suggest editing robots.txt first... | Neither was I personally nor was any other member of the forum staff involved in blocking Google in the first place. Given that it took approximately a year to get this far, what leads you to infer that it would be blocked again lightly?
Akkara wrote: | Do you mean to say that, besides following links and indexing the static pages, they were trying various search terms in the "search" box? | Not exactly; they would just follow links to search results. It is not a matter of deliberately searching for things; they were simply following links that they had encountered elsewhere. That it triggered a search on this site was entirely inconsequential so far as the spiders were concerned.
Akkara wrote: | If they spidered all the static pages, what more would 'search' return that would warrant them trying it? | In practice nothing is gained by third-party search engines running searches on the forums, because they are supposed to index everything the search function would return results for anyway, which is why disallowing searches by spiders is such an acceptable solution.
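Disallowing searches by spiders can also be enforced on the server rather than left to robots.txt, which only well-behaved crawlers read. A minimal sketch, assuming a hypothetical list of crawler User-Agent substrings and a front end that can answer with a status code before the search query runs. Because the header is trivially spoofed, a real deployment would likely pair this with an address check such as reverse DNS.

```python
# HYPOTHETICAL list of lowercase User-Agent substrings for well-known crawlers;
# a real deployment would maintain this list from its own access logs.
CRAWLER_TOKENS = ("googlebot", "bingbot", "msnbot", "slurp", "yandex", "baiduspider")


def looks_like_crawler(user_agent: str) -> bool:
    """Case-insensitive substring match against the crawler tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def search_response_status(path: str, user_agent: str) -> int:
    """Return 403 when a crawler requests the search page, else 200.

    Only the gate is modelled here; serving the page itself is out of scope.
    """
    if path.startswith("/search.php") and looks_like_crawler(user_agent):
        return 403
    return 200
```

Refusing early means a spider that follows a stray link to a search result costs one cheap response instead of a full search, while topic pages stay open to it.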
|
|
kernelOfTruth Watchman
Joined: 20 Dec 2005 Posts: 6111 Location: Vienna, Austria; Germany; hello world :)
Posted: Sat Jun 06, 2009 12:45 pm Post subject: |
|
|
desultory wrote: |
Akkara wrote: | Do you mean to say that, besides following links and indexing the static pages, they were trying various search terms in the "search" box? | Not exactly; they would just follow links to search results. It is not a matter of deliberately searching for things; they were simply following links that they had encountered elsewhere. That it triggered a search on this site was entirely inconsequential so far as the spiders were concerned. |

Would it be a big deal, then, to disable the forum's own search functionality (to reduce load) and leave everything to the search engines?
Or even go so far as to use Google's search engine exclusively and lock the others out?
That way the load would be fairly acceptable, I suppose.
Thanks, btw, VERY VERY MUCH for enabling search indexing by search engines again _________________ https://github.com/kernelOfTruth/ZFS-for-SystemRescueCD/tree/ZFS-for-SysRescCD-4.9.0
https://github.com/kernelOfTruth/pulseaudio-equalizer-ladspa
Hardcore Gentoo Linux user since 2004 |
|
|
muhsinzubeir l33t
Joined: 29 Sep 2007 Posts: 948 Location: /home/muhsin
Posted: Sat Jun 06, 2009 1:16 pm Post subject: |
|
|
sweet... _________________ ~x86
p5k-se
Intel Core 2 Duo
Nvidia GT200
http://www.zanbytes.com |
|
|
desultory Bodhisattva
Joined: 04 Nov 2005 Posts: 9410
Posted: Sun Jun 07, 2009 8:02 am Post subject: |
|
|
kernelOfTruth wrote: | Would it be a big deal, then, to disable the forum's own search functionality (to reduce load) and leave everything to the search engines? | While it would be possible to do, it is highly unlikely that the forums would be allowed to go without an integrated search engine.
Reasons include, but are not limited to:
- The demonstrated ease with which external search engines can be blocked and the difficulty and most especially the delay involved in restoring their access.
- The current poor coverage of known external indexes of the forums.
- Embedding advertising in core site functions is almost certainly a nonstarter, equally so getting funding to pay thousands of dollars per year to avoid them. Not that I have contacted Google regarding embedding their search engine in the forums, nor do I intend to.
- Closely integrated or fully internal search engines are not subject to the intrinsic lag involved with using an external spider fed search engine.
- Other measures to improve site search are being explored.
Just to reiterate: the normal volume of searches run by users does not seem to be a problem; only repeated heavy use of the search functions by spiders is.
kernelOfTruth wrote: | Or even go so far as to use Google's search engine exclusively and lock the others out? | There are no plans to block any well-behaved spiders, in part because, as it is, any well-behaved spider should impose little more load than a few regular users could be expected to generate.
kernelOfTruth wrote: | That way the load would be fairly acceptable, I suppose. | As it is, the load seems to be adequately sustainable; points of concern should be resolvable by analysis of the logs.
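The log analysis desultory mentions could start with something as simple as counting search requests per User-Agent. A sketch, assuming the front end writes the widely used Apache/nginx "combined" access-log format; the forums' actual log format and file locations are not stated in the thread.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMED "combined" log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def search_hits_by_agent(lines: Iterable[str]) -> Counter:
    """Count /search.php requests per User-Agent, skipping unparseable lines."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match and match.group("path").startswith("/search.php"):
            counts[match.group("agent")] += 1
    return counts
```

For example, `search_hits_by_agent(open(log_path)).most_common(10)`, with `log_path` being whatever file the server writes, would list the ten agents issuing the most searches. Grouping by `match.group("host")` instead would show per-address load.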
|
|
Dont Panic Guru
Joined: 20 Jun 2007 Posts: 322 Location: SouthEast U.S.A.
|
|
|
|
pilla Bodhisattva
Joined: 07 Aug 2002 Posts: 7731 Location: Underworld
Posted: Wed Jun 10, 2009 10:59 am Post subject: |
|
|
Yes, we can tell by the number of spammers we have been receiving. _________________ "I'm just very selective about the reality I choose to accept." -- Calvin
|
|
|