Gentoo Forums
Has anyone got bayesian filtering work w. SpamAssassin 2.50?
jahve
n00b


Joined: 25 Aug 2002
Posts: 44
Location: Uppsala, Sweden

Posted: Wed Feb 26, 2003 9:36 am    Post subject: Has anyone got bayesian filtering work w. SpamAssassin 2.50?

The version of SpamAssassin in the portage tree (2.44) leaks way too much spam to my Inbox.

Here is a sample of the headers from a spam mail that got through the SpamAssassin 2.50 filters but not the Mozilla 1.3b Bayesian filter.
Code:
X-Spam-Status: No, hits=3.2 required=5.0
   tests=HTML_80_90,HTML_IMAGE_ONLY_02,HTML_MESSAGE,REMOVE_PAGE,
         WEIRD_PORT
   version=2.50
X-Spam-Level: ***
X-Spam-Checker-Version: SpamAssassin 2.50 (1.173-2003-02-20-exp)

As you can see, SpamAssassin has not tagged this mail (or any of my other mail, spam or non-spam) with a "BAYES_XX" rule as it should.
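To check a whole folder of saved mail for missing Bayes hits, the X-Spam-Status value can be parsed mechanically. This is only a minimal Python sketch, assuming the header layout shown above (a comma-separated rule list after `tests=`, possibly folded over several lines); the function names are made up for illustration and are not part of SpamAssassin.

```python
import re

def fired_tests(status_value):
    """Return the rule names listed after tests= in an X-Spam-Status value."""
    # Folded headers put whitespace after some commas; join the list back up.
    unfolded = re.sub(r",\s+", ",", status_value)
    match = re.search(r"tests=(\S*)", unfolded)
    if match is None:
        return []
    return [name for name in match.group(1).split(",") if name]

def has_bayes_hit(status_value):
    """True if any fired rule name starts with BAYES_."""
    return any(name.startswith("BAYES_") for name in fired_tests(status_value))

# The header value from the mail that slipped through.
sample = """No, hits=3.2 required=5.0
   tests=HTML_80_90,HTML_IMAGE_ONLY_02,HTML_MESSAGE,REMOVE_PAGE,
         WEIRD_PORT
   version=2.50"""

print(fired_tests(sample))
print(has_bayes_hit(sample))
```

If `has_bayes_hit` is false for every message, the Bayes rules are not firing at all, which is a different problem from the filter simply scoring badly.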

Not even a very spammy mail is marked with the bayes-rule tag.
Code:
X-Spam-Flag: YES
X-Spam-Status: Yes, hits=13.7 required=5.0
   tests=BANG_EXERCISE,CLICK_BELOW,FWD_MSG,HGH,HTML_70_80,
         HTML_FONT_BIG,HTML_FONT_COLOR_BLUE,HTML_FONT_COLOR_GREEN,
         HTML_FONT_FACE_ODD,HTML_LINK_CLICK_HERE,HTML_MESSAGE,
         INCREASE_SEX,RAZOR2_CHECK,WHILE_YOU_SLEEP
   version=2.50
X-Spam-Level: *************
X-Spam-Checker-Version: SpamAssassin 2.50 1.173-2003-02-20-exp


Mozilla often catches the spam that SA lets through, without significant errors. However, I'm not using Mozilla at all times, and it would be nice to have this spam trap working effectively once and for all.

I have installed SA the dirty way, by simply changing the name of the ebuild and placing it in my local portage tree (it isn't included in the official portage tree yet).
A cron job runs sa-learn --spam on my spam box and sa-learn --ham on my inbox once a day. This seems to work, since it creates a couple of files with names beginning with "bayes" in ~/.spamassassin/.

Where should I go from here?
mglauche
Retired Dev


Joined: 25 Apr 2002
Posts: 564
Location: Germany

Posted: Wed Feb 26, 2003 3:05 pm

I also have the SA 2.50 CVS version installed. Scanning takes quite some time (up to 1 sec in SA, due to the Bayesian filter I think), but I never see any BAYES_* tags in my emails either :(

As I understand it, SA feeds very high and very low scoring messages into the Bayesian filter, so I didn't set up any cron job or the like. (The amavis/SA home dir also has the SA db files created, and something got written to them.)
jahve
n00b


Joined: 25 Aug 2002
Posts: 44
Location: Uppsala, Sweden

Posted: Wed Feb 26, 2003 4:03 pm

mglauche wrote:
As I understand it, SA feeds very high and very low scoring messages into the Bayesian filter, so I didn't set up any cron job or the like. (The amavis/SA home dir also has the SA db files created, and something got written to them.)

This might be of some interest, I found it in the sa-learn manual:
Quote:
Autolearning is enabled by default
If you don't have a corpus of mail saved to learn, you can let SpamAssassin automatically learn the mail that you receive. If you are autolearning from scratch, the amount of mail you receive will determine how long until the BAYES_* rules are activated.

It seems like SA waits before invoking bayesian filtering by default. Also, this behaviour is not documented in the Mail::SpamAssassin::Conf-manual, or I haven't found it yet. :) I have already trained SA with good and bad mail so I don't want to wait for it to start using the bayes filters.

Perhaps bayesian filtering can be enabled directly by setting auto_learn = 0. Can someone confirm this?
jahve
n00b


Joined: 25 Aug 2002
Posts: 44
Location: Uppsala, Sweden

Posted: Thu Feb 27, 2003 10:57 am

This might be the answer: I simply don't have enough good mails in the database.

Output from a SA-session in debug-mode:
Code:
debug: debug: Only 32 ham(s) in Bayes DB < 200

Seems like I have to train it with at least 200 good mails. Answering one's own questions is always fun :) (at least you learn a lot from it).
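For anyone scripting this check, the shortfall can be read straight off that debug line. A rough Python sketch, assuming the wording is exactly as in the output above; only the ham message appears in this thread, so that is all it matches, and the function name is invented for the example.

```python
import re

def hams_still_needed(debug_line):
    """Return how many more ham mails are needed, or None if the line does not match."""
    match = re.search(r"Only (\d+) ham\(s\) in Bayes DB < (\d+)", debug_line)
    if match is None:
        return None
    learned = int(match.group(1))
    minimum = int(match.group(2))
    # Never report a negative shortfall.
    return max(minimum - learned, 0)

print(hams_still_needed("debug: debug: Only 32 ham(s) in Bayes DB < 200"))
```

With the line above this reports 168, i.e. 200 minus the 32 ham mails already learned.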

BTW - SA 2.50 is now in the portage tree.
mglauche
Retired Dev


Joined: 25 Apr 2002
Posts: 564
Location: Germany

Posted: Thu Feb 27, 2003 11:28 am

I'm trying SA 2.5 final from portage now :)

btw .. sa works very well with amavisd-new

http://www.ijs.si/software/amavisd/

yeah ! it works now :D :D :D

Feb 27 12:27:20 webserver amavis[15053]: (15053-01) spam_scan: hits=-6.6 tests=BAYES_01
mglauche
Retired Dev


Joined: 25 Apr 2002
Posts: 564
Location: Germany

Posted: Thu Feb 27, 2003 11:32 am

The only gripe I have is that it is a bit slow ;)

Code:

Feb 27 12:30:36 webserver amavis[15054]: (15054-02) TIMING [total 1339 ms] - SMTP LHLO: 1 (0%), SMTP pre-MAIL: 0 (0%), SMTP pre-DATA-flush: 2 (0%), SMTP DATA: 39 (3%), body hash: 1 (0%), mime_decode: 10 (1%), get-file-type: 20 (1%), decompose_part: 1 (0%), parts: 0 (0%), AV-scan-1: 4 (0%), AV-scan-2: 95 (7%), SA msg read: 1 (0%), SA parse: 1 (0%), SA check: 1101 (82%), fwd-connect: 5 (0%), fwd-mail-from: 2 (0%), fwd-rcpt-to: 1 (0%), write-header: 3 (0%), fwd-data: 0 (0%), fwd-rundown: 45 (3%), unlink-1-files: 4 (0%), rundown: 0 (0%)


That is on a dual P3-S at 1200 MHz ...
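To see at a glance where the time goes, the TIMING line can be split into its stages. A small Python sketch, assuming the `name: ms (percent%)` layout of the log line above; the helper names are hypothetical and nothing here comes from amavisd-new itself.

```python
import re

def parse_timing(log_line):
    """Map each stage name in an amavis TIMING line to its milliseconds."""
    # Keep only the stage list that follows "[total N ms] - ".
    _, _, stages = log_line.partition("] - ")
    timings = {}
    for name, ms in re.findall(r"([^,]+?): (\d+) \(\d+%\)", stages):
        timings[name.strip()] = int(ms)
    return timings

def slowest_stage(log_line):
    """Return (stage, ms) for the most expensive stage, or None if nothing parsed."""
    timings = parse_timing(log_line)
    if not timings:
        return None
    stage = max(timings, key=timings.get)
    return stage, timings[stage]
```

Fed the log line above, `slowest_stage` picks out `SA check` at 1101 ms, which matches the 82% shown in the log.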
telex4
l33t


Joined: 21 Sep 2002
Posts: 704
Location: Reading, UK

Posted: Thu Oct 16, 2003 11:35 pm

To revive an old thread...

I have about 1,000 spam messages (about 800 of which SpamAssassin has caught itself) and around 5,000 ham messages. Yet either SpamAssassin's Bayesian filters are rubbish, or they're not working at all, because I consistently see a message come through as ham while an almost identical one is tagged as spam, when both are spam.

I run sa-learn on the two mboxes and it learns from 0 messages!
jahve
n00b


Joined: 25 Aug 2002
Posts: 44
Location: Uppsala, Sweden

Posted: Sat Oct 18, 2003 6:34 pm

telex4 wrote:
To revive an old thread...

I have about 1,000 spam messages (about 800 of which SpamAssassin has caught itself) and around 5,000 ham messages. Yet either SpamAssassin's Bayesian filters are rubbish, or they're not working at all, because I consistently see a message come through as ham while an almost identical one is tagged as spam, when both are spam.

I run sa-learn on the two mboxes and it learns from 0 messages!


Try running spamassassin in debug mode and grepping for bayes; maybe you'll see what's not working.

You could also try rebuilding the database by deleting the bayes db files in ~/.spamassassin and running sa-learn again.
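Before rebuilding anything, it may be worth ruling out that the mbox files themselves are the problem. This is just a sanity-check sketch in Python using the standard `mailbox` module; it only confirms that a file parses as mbox and how many messages it holds, and says nothing about what sa-learn does with it. The function name and the file paths in the note below are made up.

```python
import mailbox

def count_mbox_messages(path):
    """Return the number of messages Python's mbox parser finds in the file."""
    # create=False makes a missing file an error instead of creating an empty mailbox.
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

Calling it as `count_mbox_messages("spam.mbox")` and `count_mbox_messages("ham.mbox")` should give roughly the 1,000 and 5,000 mentioned above; a count of 0 or 1 would suggest the files are not in the format you think they are.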
All times are GMT