rac Bodhisattva
Joined: 30 May 2002 Posts: 6553 Location: Japanifornia
|
Posted: Wed Sep 15, 2004 10:14 pm Post subject: new search stopwords list |
|
|
We've analyzed the most commonly occurring words on the forums, and made some additions to the stopword list. Attempting to search using any of these words won't return any posts, and if you combine a stopword with other legitimate terms, the stopword just gets ignored.
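As a rough illustration of that behaviour, here is a minimal Python sketch. It is not phpBB's actual code: the `STOPWORDS` subset and the `usable_terms` helper are hypothetical names made up for the example.

```python
# Hypothetical sketch of stopword filtering; not phpBB's real implementation.
STOPWORDS = {"the", "and", "a", "have", "new"}  # small subset of the list

def usable_terms(query):
    """Return the query words that survive stopword filtering."""
    words = query.lower().split()
    return [w for w in words if w not in STOPWORDS]

print(usable_terms("the and"))           # only stopwords: nothing left to search
print(usable_terms("kernel and panic"))  # the stopword is silently dropped
```

A query made up entirely of stopwords leaves an empty term list, which is why such a search finds no posts.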
Here's the current list, including both upstream phpBB's entries and ours:
AFAIK
I
IIRC
Ive
LOL
ROTF
ROTFLMAO
YMMV
a
aber
able
about
above
access
actually
add
after
again
ago
all
almost
along
alot
already
also
always
am
amp
an
and
and
another
answer
any
anybody
anybodys
anyone
anything
anyway
anywhere
are
arent
around
as
ask
askd
at
auch
auf
available
back
bad
be
because
been
before
being
believe
best
better
between
big
bit
both
box
btw
bug
build
but
but
by
can
cannot
cant
card
case
change
che
check
code
come
command
compile
compiled
compiling
computer
con
configuration
correct
could
couldnt
course
create
das
day
days
days
default
den
der
desktop
did
didnt
die
different
do
does
doesnt
doing
done
dont
down
drive
each
edit
either
else
emerged
end
enough
errors
etc
even
ever
every
everybody
everybodys
everyone
everything
exactly
example
failed
far
few
file
files
find
fine
first
fix
fixed
following
for
for
forum
forums
found
from
function
gentoo
get
getting
give
go
going
gone
good
got
gotten
great
guess
had
hard
hardware
has
have
have
havent
having
help
her
here
hers
him
his
home
hope
how
however
hows
href
ich
idea
ideas
if
ill
in
info
ini
install
installation
installed
installing
instead
into
is
isnt
issue
ist
it
its
ive
just
keep
know
large
last
latest
least
less
let
lib
like
liked
line
link
linux
list
little
load
local
log
lol
long
look
looked
looking
looking
looks
lot
machine
made
mal
man
many
may
maybe
me
mean
message
might
mit
mode
more
most
much
must
mustnt
my
name
near
need
net
network
never
new
news
next
nice
nicht
no
non
none
not
nothing
now
of
off
often
old
on
once
one
only
oops
open
option
options
or
org
other
our
ours
out
output
over
own
package
packages
page
part
pas
people
per
play
please
point
possible
post
pretty
probably
problem
problems
program
put
que
question
questioned
questions
quite
quot
quote
rather
read
really
reason
recent
remember
right
run
said
same
saw
say
says
screen
script
see
seem
seems
sees
server
set
setting
settings
setup
she
should
since
sites
small
so
software
solution
some
someone
something
sometime
somewhere
soon
sorry
source
start
started
still
stuff
such
support
sure
take
tell
than
thank
thanks
that
thatd
thats
the
the
their
theirs
them
then
there
theres
these
they
theyd
theyll
theyre
thing
things
think
this
this
those
though
thought
thread
through
thus
time
times
to
too
tried
true
try
trying
two
type
und
under
until
untrue
up
update
upon
use
used
user
users
using
usr
version
very
via
want
was
way
we
well
went
were
werent
what
whats
when
where
which
while
who
whom
whose
why
wide
will
wink
with
with
within
without
wont
work
worked
working
works
world
worse
worst
would
wrong
wrote
www
yes
yet
you
you
youd
youll
your
youre
yours _________________ For every higher wall, there is a taller ladder |
|
|
|
klieber Bodhisattva
Joined: 17 Apr 2002 Posts: 3657 Location: San Francisco, CA
|
Posted: Thu Sep 16, 2004 12:11 am Post subject: |
|
|
To follow up on rac's post, the reason we did this was to reduce the size of our search database in mysql. It was overwhelming the database server and causing the slowdowns that people have been experiencing recently.
--kurt _________________ The problem with political jokes is that they get elected |
|
|
|
kamagurka Veteran
Joined: 25 Jan 2004 Posts: 1026 Location: /germany/munich
|
Posted: Mon Sep 20, 2004 6:05 pm Post subject: |
|
|
Would it be possible to have the search throw an informative error when searching for stopwords instead of just saying "no posts found"? _________________ If you loved me, you'd all kill yourselves today.
--Spider Jerusalem, the Word |
|
|
|
rac Bodhisattva
Joined: 30 May 2002 Posts: 6553 Location: Japanifornia
|
Posted: Mon Sep 20, 2004 6:26 pm Post subject: |
|
|
It might be possible to change the message to something like "none of your search terms were usable" in the case where you enter only stopwords. Telling you that some of your terms were used, but not others, would be considerably harder. _________________ For every higher wall, there is a taller ladder |
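The simpler of the two cases, where every term entered is a stopword, could look roughly like this. It is only a sketch under stated assumptions: `search_message` and the `STOPWORDS` subset are hypothetical and are not taken from phpBB.

```python
# Hypothetical sketch; the message text is the one floated in the post above.
STOPWORDS = {"you", "have", "new", "in"}  # small subset of the list

def search_message(query):
    """Return a notice when every term is a stopword, else None."""
    terms = query.lower().split()
    if terms and all(t in STOPWORDS for t in terms):
        return "None of your search terms were usable."
    return None  # at least one usable term: run the search as normal
```

Reporting which individual terms were dropped from a mixed query is deliberately left out, since that is the harder case.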
|
|
|
Gentree Watchman
Joined: 01 Jul 2003 Posts: 5350 Location: France, Old Europe
|
Posted: Sat Oct 23, 2004 7:49 pm Post subject: |
|
|
klieber wrote: | To follow up on rac's post, the reason we did this was to reduce the size of our search database in mysql. It was overwhelming the database server and causing the slowdowns that people have been experiencing recently.
--kurt |
You may like to consider how much the lack of an effective search tool is burdening the database.
People can't find what's there, so they make a new post, and there's a new thread of 10 or 20 posts.
Before too long this will become unmanageable and the forum will break.
Without the forum, Gentoo would be of limited use.
I have made concrete suggestions in other posts today.
HTH _________________ Linux, because I'd rather own a free OS than steal one that's not worth paying for.
Gentoo because I'm a masochist
AthlonXP-M on A7N8X. Portage ~x86 |
|
|
|
Deathwing00 Bodhisattva
Joined: 13 Jun 2003 Posts: 4087 Location: Dresden, Germany
|
Posted: Sun Oct 24, 2004 1:27 am Post subject: |
|
|
I made this one sticky... I think it's important to know what words are filtered. |
|
|
|
c45207 n00b
Joined: 08 Mar 2004 Posts: 70
|
Posted: Thu Jan 27, 2005 3:25 am Post subject: |
|
|
Is there any way to override this? For example, today I wanted to find "You have new mail in". However, only mail is a searchable word, so I got lots of useless posts. |
|
|
|
ian! Bodhisattva
Joined: 25 Feb 2003 Posts: 3829 Location: Essen, Germany
|
Posted: Thu Jan 27, 2005 7:06 am Post subject: |
|
|
c45207 wrote: | Is there any way to override this? |
No. _________________ "To have a successful open source project, you need to be at least somewhat successful at getting along with people." -- Daniel Robbins |
|
|
|
Wicked Wesley n00b
Joined: 20 May 2004 Posts: 70 Location: Here
|
Posted: Fri Jan 28, 2005 4:50 pm Post subject: |
|
|
Just to let you know, the word but is in there twice!
Have a nice day! _________________ The Jester!
Linux user 357122! |
|
|
|
knefas l33t
Joined: 21 Dec 2003 Posts: 828
|
Posted: Fri Jan 28, 2005 5:25 pm Post subject: |
|
|
Ohh...also two days, have and this |
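Repeated entries like these can be spotted mechanically. The following is a small Python sketch, assuming the list has been loaded into a plain list of strings; the `duplicated` helper and the `sample` excerpt are made up for illustration.

```python
from collections import Counter

def duplicated(words):
    """Return the words that appear more than once, sorted alphabetically."""
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n > 1)

# Short excerpt in the spirit of the list, not the full list.
sample = ["but", "but", "by", "days", "days", "have", "have", "this", "this"]
print(duplicated(sample))  # ['but', 'days', 'have', 'this']
```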
|
|
|
masseya Bodhisattva
Joined: 17 Apr 2002 Posts: 2602 Location: Baltimore, MD
|
Posted: Fri Jan 28, 2005 10:49 pm Post subject: |
|
|
Those are particularly insidious words that absolutely have to be stopped so we put the second entry in the stopwords list sort of as a way to add injury to insult for the many weeks of futile searching those words have caused. _________________ if i never try anything, i never learn anything..
if i never take a risk, i stay where i am.. |
|
|
|
Anior Guru
Joined: 17 Apr 2003 Posts: 317 Location: European Union (Stockholm / Sweden)
|
|
|
|
SubAtomic Apprentice
Joined: 20 Dec 2003 Posts: 255 Location: Hobart, TAS, Australia
|
Posted: Thu Feb 10, 2005 3:24 am Post subject: |
|
|
What about RTFM and rtfm, IMHO and imho?
Would a "Suggest words to add to the stopwords list" thread topic (possibly in the Feedback section) be of use? I'm thinking of something similar to the report spammers thread. _________________ "The real romance is out ahead and yet to come. The computer revolution hasn't started yet. Don't be misled by the enormous flow of money into bad defacto standards for unsophisticated buyers using poor adaptations of incomplete ideas." -- Alan Kay |
|
|
|
cokey Advocate
Joined: 23 Apr 2004 Posts: 3355
|
Posted: Thu Mar 24, 2005 12:07 pm Post subject: |
|
|
I think "compile" and "error(s)" should be taken out, after all this is gentoo not SuSE _________________ https://otw20.com/ OTW20 The new place for off the wall chat |
|
|
|
masseya Bodhisattva
Joined: 17 Apr 2002 Posts: 2602 Location: Baltimore, MD
|
Posted: Thu Mar 24, 2005 11:02 pm Post subject: |
|
|
cokehabit wrote: | I think "compile" and "error(s)" should be taken out, after all this is gentoo not SuSE |
The reason these words are on the list is that they appear too commonly to be of use in identifying a particular thread. There are so many posts with the words 'compile' or 'error' that neither is a useful descriptor. If I were trying to describe myself to you so you could pick me out of a crowd at an amusement park, I would want to avoid a description such as "medium height with blue jeans, sneakers and a T-shirt" because it wouldn't really tell you anything that would set me apart from virtually everyone else. This is essentially the kind of description you get when searching for the words 'compile' and 'error'. _________________ if i never try anything, i never learn anything..
if i never take a risk, i stay where i am.. |
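One way to make "too common to be useful" concrete is to count how many posts each word appears in. The sketch below is only an illustration: the `common_words` helper, the 0.5 threshold and the three sample posts are all assumptions, and nothing here reflects how the forum's list was actually produced.

```python
def common_words(posts, threshold=0.5):
    """Return words that occur in more than `threshold` of the posts."""
    doc_freq = {}
    for post in posts:
        for word in set(post.lower().split()):  # count each word once per post
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {w for w, n in doc_freq.items() if n / len(posts) > threshold}

# Made-up sample posts, for illustration only.
posts = [
    "compile error in glibc",
    "compile error in xorg",
    "compile fails on kde",
]
print(sorted(common_words(posts)))  # ['compile', 'error', 'in']
```

Words such as "glibc" or "kde" appear in only one post each, so they stay searchable and narrow the results.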
|
|
|
kallamej Administrator
Joined: 27 Jun 2003 Posts: 4983 Location: Gothenburg, Sweden
|
Posted: Thu Mar 24, 2005 11:15 pm Post subject: |
|
|
Heh, error is not in the list, actually. _________________ Please read our FAQ Forum, it answers many of your questions.
irc: #gentoo-forums on irc.libera.chat |
|
|
|
cokey Advocate
Joined: 23 Apr 2004 Posts: 3355
|
Posted: Fri Mar 25, 2005 7:52 am Post subject: |
|
|
kallamej wrote: | Heh, error is not in the list, actually. |
"errors" is, so I put it in bracket(s). _________________ https://otw20.com/ OTW20 The new place for off the wall chat |
|
|
|
masseya Bodhisattva
Joined: 17 Apr 2002 Posts: 2602 Location: Baltimore, MD
|
Posted: Fri Mar 25, 2005 6:56 pm Post subject: |
|
|
kallamej wrote: | Heh, error is not in the list, actually. |
lol.. We should, like, add that and stuff. _________________ if i never try anything, i never learn anything..
if i never take a risk, i stay where i am.. |
|
|
|
cokey Advocate
Joined: 23 Apr 2004 Posts: 3355
|
Posted: Fri Mar 25, 2005 7:05 pm Post subject: |
|
|
Is there any way to make the Gentoo forums searchable through Google, like Wikipedia is? Perhaps someone could speak to them? That would sort out the search database while offering Google free advertising every time someone searches through Gentoo. _________________ https://otw20.com/ OTW20 The new place for off the wall chat |
|
|
|
kallamej Administrator
Joined: 27 Jun 2003 Posts: 4983 Location: Gothenburg, Sweden
|
Posted: Fri Mar 25, 2005 7:54 pm Post subject: |
|
|
Yes, the forums are google searchable, but there are only about 30K pages indexed. It's increasing quite nicely since the urls got html-ised, though. _________________ Please read our FAQ Forum, it answers many of your questions.
irc: #gentoo-forums on irc.libera.chat |
|
|
|
Satori80 Tux's lil' helper
Joined: 24 Feb 2004 Posts: 137
|
Posted: Sat Apr 09, 2005 11:19 am Post subject: |
|
|
Why don't you guys try to get a consensus? I for one would rather have a slow useful search database than a quick irrelevant one. |
|
|
|
curtis119 Bodhisattva
Joined: 10 Mar 2003 Posts: 2160 Location: Toledo, Ohio,USA, North America, Earth, SOL System, Milky Way, The Universe, The Cosmos, and Beyond.
|
Posted: Sat Apr 09, 2005 11:30 am Post subject: |
|
|
Satori80 wrote: | Why don't you guys try to get a consensus? I for one would rather have a slow useful search database than a quick irrelevant one. |
The stop words list is attempting to do both: a quick and relevant search. It's gotten so much better since rac and ian! started actively doing this. I search constantly and have noticed a significant difference in quality of results. _________________ Gentoo: it's like wiping your ass with silk. |
|
|
|
Satori80 Tux's lil' helper
Joined: 24 Feb 2004 Posts: 137
|
Posted: Sat Apr 09, 2005 11:34 am Post subject: |
|
|
masseya wrote: | cokehabit wrote: | I think "compile" and "error(s)" should be taken out, after all this is gentoo not SuSE | The reason these words are on the list is that they are too commonly appearing to actually be of use in identifying a particular thread. There are so many posts with the words 'compile' or 'error' that it's not a useful descriptor. |
It isn't the words in and of themselves that make them useful or not. It's the use of the words in combination with other specific words, for instance those generated in an error message. If the search finds all the terms in the error message, you can quickly find the subject of your concern. Without the right words at your disposal, you'll have to fish around through irrelevant topics to try and find what you need to get your system back on its feet. I've found myself in this second situation more than once in the past few days, without resolution to my issue. Now I know why. It isn't because the issue isn't in the forums, it's because it can't be found due to a flaky search. And frankly, I'm pissed about it.
There is a reason error messages are generated in the first place. If you can't make the forums able to find specific input then why bother devoting the resources to keep them online? I always used the forums as a troubleshooting tool in the past. Apparently, I can no longer do that. Too bad for me, huh? |
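The point about words working in combination can be shown with a toy inverted index. This is a sketch under assumptions, not the forum's search code: `build_index`, `search_all` and the three sample posts are hypothetical.

```python
def build_index(posts):
    """Map each word to the set of post ids that contain it."""
    index = {}
    for post_id, text in posts.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(post_id)
    return index

def search_all(index, terms):
    """Return ids of posts containing every term (an AND search)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

# Made-up sample posts, for illustration only.
posts = {
    1: "glibc compile error",
    2: "glibc locale question",
    3: "xorg compile error",
}
index = build_index(posts)
print(search_all(index, ["glibc"]))                      # posts 1 and 2
print(search_all(index, ["glibc", "compile", "error"]))  # only post 1
```

In this toy data, dropping "compile" and "error" from the query widens the result from one post to two, which is the effect being complained about.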
|
|
|
Satori80 Tux's lil' helper
Joined: 24 Feb 2004 Posts: 137
|
Posted: Sat Apr 09, 2005 11:54 am Post subject: |
|
|
Look, I'm sorry if that last post came off as crass. I wasn't trying to insult anybody, and I didn't mean it as directing my frustration on any one person in particular.
But the sentiment is valid. I mean, look at that list. "Man" is in the list? If I have an issue with the "man" program, I can't directly look for a resolution to my issue in these forums? Come on, guys, give us a fighting chance. |
|
|
|
cokey Advocate
Joined: 23 Apr 2004 Posts: 3355
|
Posted: Sat Apr 09, 2005 12:11 pm Post subject: |
|
|
curtis119 wrote: | Satori80 wrote: | Why don't you guys try to get a consensus? I for one would rather have a slow useful search database than a quick irrelevant one. |
The stop words list is attempting to do both: a quick and relevant search. It's gotten so much better since rac and ian! started actively doing this. I search constantly and have noticed a significant difference in quality of results. |
I've noticed the opposite. I continually miss threads, or have no threads come up at all where I would expect at least a few. VERY infuriating if you cannot get ONE SINGLE THREAD up. It just makes it look broken. _________________ https://otw20.com/ OTW20 The new place for off the wall chat |
|
|
|
|