Gentoo Forums
bogofilter+razor filters for kmail spam filtering
Documentation, Tips & Tricks
alkan
Guru
Joined: 06 Aug 2004
Posts: 385
Location: kasimlar yaylasi

Posted: Thu Dec 02, 2004 5:43 pm    Post subject: bogofilter+razor filters for kmail spam filtering

I searched the forum and the internet for a way to integrate Razor into KMail. It looks like the only way is to use an MTA with Razor to deliver the mail to a local mailbox and then read it from KMail. Since I don't want to run a mail server, I found a workaround that lets KMail use Razor directly.

Because Razor checks a central database over the internet, it is slow, so I added bogofilter in front of it. The KMail filters check bogofilter first; only if bogofilter does not flag the message as spam do they try Razor.

Another feature is that Razor is used to train bogofilter, so over time you depend less and less on Razor. Starting from an untrained database, bogofilter began detecting spam within a few hours.

First, install bogofilter, razor and maildrop (maildrop provides the reformail tool used by the script below):
Code:

emerge bogofilter razor maildrop


Configure Razor:
Code:

razor-admin -create
razor-admin -register



The small script used to pipe the mail through razor-check was written in a hurry, so I welcome any improvements.
Save it as /usr/local/bin/razor_check.
It asks Razor about the message and adds an X-Razor header with the result to the email.
Code:

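#!/bin/bash
# /usr/local/bin/razor_check
# Reads a message on stdin, asks Razor whether it is known spam and writes
# the message to stdout with an "X-Razor: yes" or "X-Razor: no" header.

# Private temporary file (instead of a fixed /tmp/razor-check), so two
# messages filtered at the same time cannot overwrite each other.
tmp=$(mktemp /tmp/razor-check.XXXXXX) || exit 1
trap 'rm -f "$tmp"' EXIT

cat > "$tmp"
# razor-check exits 0 for known spam; its own output is kept out of the mail
if razor-check < "$tmp" > /dev/null; then
        reformail -a'X-Razor: yes' < "$tmp"
else
        reformail -a'X-Razor: no' < "$tmp"
fi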


Make it executable:
Code:

chmod +x /usr/local/bin/razor_check

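To see by hand what the header-based KMail criteria below will match, you can pull a single header out of a filtered message. This is a sketch assuming bash and awk; `header_value` is a hypothetical helper written for this example (not part of bogofilter, Razor or maildrop). It looks only at the header block and does not handle folded (multi-line) headers.

```shell
# header_value NAME: print the value of the first "NAME:" header of the
# message on stdin (case-insensitive name). Stops at the blank line that
# ends the headers, so text in the body is never matched.
header_value() {
    awk -v name="$1" '
        /^$/ { exit }                                   # end of headers
        tolower($0) ~ ("^" tolower(name) ":") {
            sub(/^[^:]*:[ \t]*/, "")                    # drop "Name:" and spaces
            print
            exit
        }
    '
}
```

For example, `/usr/local/bin/razor_check < message.eml | header_value X-Razor` should print yes or no for a saved message message.eml.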

Now create these KMail filters, in exactly the following order:
-----------------------
Name: Bogofilter Check
Filtering Criteria: size <= 256000 bytes
Filter Actions: Pipe Through bogofilter -p -e -u
Uncheck "stop processing here"
----------------------------
Name: Bogofilter Spam handling
Filtering Criteria: X-Bogosity contains yes
Filter Actions: File into Folder spam, Mark as Spam, Pipe Through razor-report
----------------------------
Name: Razor Check
Filtering Criteria: size <= 256000 bytes
Filter Actions: Pipe Through /usr/local/bin/razor_check
Uncheck "stop processing here"
----------------------------
Name: Bogofilter Razor Spam handling
Filtering Criteria: X-Razor contains yes
Filter Actions: File into Folder spam, Execute Command bogofilter -N -s, Mark as Spam
----------------------------
Name: Classify as Spam
Filter Actions: Mark as Spam, Execute Command bogofilter -N -s, Pipe Through razor-report, File into Folder spam
Uncheck "to incoming messages", "on manual filtering"
Check "Add this filter to the Apply Filter Menu"
----------------------------
Name: Classify as Ham
Filter Actions: Mark as Ham, Execute Command bogofilter -S -n, Pipe Through razor-revoke
Uncheck "to incoming messages", "on manual filtering"
Check "Add this filter to the Apply Filter Menu"
----------------------------

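In short, these filters check the bogofilter database first; if bogofilter flags a message as spam, the message is reported to the Razor database. Otherwise the Razor database is checked and its verdict is used to train bogofilter: since -u has already registered the message as ham, bogofilter -N -s unregisters it as ham and registers it as spam. Messages found to be spam are moved into the spam folder. The last two filters are applied by hand from the Apply Filter menu to correct misclassifications.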
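The automatic part of the chain can be modelled as a small decision function. This is only a bash sketch of the filter logic, assuming the filters run in the order listed and that the two spam-handling filters keep KMail's default "stop processing here"; `route_message` is a hypothetical name, and it prints what would happen instead of calling bogofilter or Razor.

```shell
# route_message BOGOSITY RAZOR_STATUS
# BOGOSITY: value of the X-Bogosity header added by "bogofilter -p -e -u".
# RAZOR_STATUS: exit status of razor-check (0 = known spam), only used
# when bogofilter did not flag the message.
route_message() {
    local bogosity=$1 razor_status=$2
    case $bogosity in
        [Yy]es*)
            # bogofilter says spam: file it and report it to Razor
            echo "spam: razor-report" ;;
        *)
            if [ "$razor_status" -eq 0 ]; then
                # Razor knows it: move the message from ham to spam in bogofilter
                echo "spam: bogofilter -N -s"
            else
                echo "inbox"
            fi ;;
    esac
}
```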
rocketchef
n00b
Joined: 08 Feb 2004
Posts: 24

Posted: Sun Dec 05, 2004 4:51 pm

Nice howto, but what if bogofilter considers a mail to be ham? Shouldn't it be delivered directly instead of being checked with Razor?
My approach makes use of bogofilter's ability to classify mail into three states: ham, unsure and spam. Only if bogofilter is unsure about a mail does it let Razor check for known spam. Additionally, my solution adds a whitelist to reduce processing time even further. I did not manage to configure procmail the way I wanted, so I wrote a little script which takes the mail from fetchmail and delivers it appropriately.

Code:

#!/usr/bin/perl -w
# Reads one message (from fetchmail) and delivers it into a Maildir:
# GMX newsletters -> newsletter, whitelisted senders -> inbox, otherwise
# bogofilter (tri-state) decides and Razor is asked only if it is unsure.
use strict;
use File::Temp qw(tempfile);

my $mailroot = '/home/niels/Mail';
my @mail     = <>;
# time + PID + random number to keep names unique
my $filename = time() . ".$$." . int(rand(100000)) . '.msg';

# "or" (not "||") so that die actually fires when open fails
open(my $log, '>>', "$mailroot/mail.log") or die "cannot open log: $!";

# value of the first From: line
my $from = '';
foreach (@mail) {
    if (/^From: (.*)/) { $from = $1; last }
}

# Maildir delivery: write into tmp/, then rename into new/ so that
# KMail never picks up a half-written message
sub deliver {
    my ($folder, $reason) = @_;
    my $tmp = "$mailroot/$folder/tmp/$filename";
    open(my $fh, '>', $tmp) or die "cannot write $tmp: $!";
    print $fh @mail;
    close($fh) or die "cannot close $tmp: $!";
    rename($tmp, "$mailroot/$folder/new/$filename")
        or die "cannot move $tmp to $folder/new: $!";
    print $log "From: $from\n\t$reason, delivering to $folder/new/$filename\n";
}

if ($from =~ /GMX/) {    # get rid of GMX "info" spamming
    deliver('newsletter', 'GMX Newsletter');
    exit 0;
}

# whitelist: one regular expression per line; blank lines are skipped
# because an empty pattern would reuse the last successful match
if (open(my $wl, '<', "$mailroot/white.lst")) {
    while (my $re = <$wl>) {
        chomp $re;
        next if $re eq '';
        if ($from =~ /$re/) {
            deliver('inbox', 'User in Whitelist');
            exit 0;
        }
    }
    close($wl);
}
else {
    warn "cannot read white.lst: $!";
}

# bogofilter and razor-check read the message from a file; File::Temp
# creates it safely and removes it when the script exits
my ($tmpfh, $tmpname) = tempfile(UNLINK => 1);
print $tmpfh @mail;
close($tmpfh);

# bogofilter exit status: 0 = spam, 1 = ham, 2 = unsure, 3 = error
my $bogo = system('/usr/bin/bogofilter', '-I', $tmpname) >> 8;

if ($bogo == 0) {
    deliver('spam', 'Bogofilter considers this mail as spam');
}
elsif ($bogo == 1) {
    deliver('inbox', 'Bogofilter considers this mail as ham');
}
elsif ($bogo == 2) {
    # razor-check exits 0 if the message is known spam
    if ((system('/usr/bin/razor-check', $tmpname) >> 8) == 0) {
        deliver('spam', 'Razor considers this mail as spam');
        system('/usr/bin/bogofilter', '-s', '-I', $tmpname);   # train on error
    }
    else {
        deliver('inbox', 'neither bogofilter nor razor are sure whether this mail is spam');
    }
}
else {
    deliver('inbox', 'Warning, something really ugly happened');
    exit 1;
}
close($log);
exit 0;


a little ugly, but it's not perl's fault! :)

HTH,
rocketchef
alkan
Guru
Joined: 06 Aug 2004
Posts: 385
Location: kasimlar yaylasi

Posted: Sun Dec 05, 2004 6:15 pm

Thanks for sharing your configuration. It will help me improve mine further.

I will eventually extend it so that it does not double-check with Razor when bogofilter marks a mail as ham. Initially I wasn't sure how well bogofilter would perform. It's been 3 days since bogofilter training started; currently bogofilter detects 1/3 of the spam, while Razor catches the remaining 2/3 (even though bogofilter says some of those are ham). But bogofilter's rate is improving fast. Until I am satisfied with bogofilter, I will keep double-checking with Razor, since it trains bogofilter.
rocketchef
n00b
Joined: 08 Feb 2004
Posts: 24

Posted: Mon Dec 06, 2004 9:52 am

Well, you're using two-state classification with bogofilter, aren't you? I'd suggest tri-state, so bogofilter can decide whether a mail is ham, spam, or unsure. You could stop processing a mail once bogofilter has made a definite decision about it, and go on checking with Razor only if bogofilter is unsure.
I don't think bogofilter will mark a spam mail as ham this way; it will simply be unsure about it, even with a very small corpus of spam tokens. If bogofilter is unsure about a mail and Razor considers it spam, you can automatically retrain bogofilter. This way you are doing a kind of "train on error" instead of "train on everything", which is supposed to give better classification results without bloating your token database.
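The tri-state, train-on-error flow can be sketched as a decision function. This assumes bogofilter is configured for tri-state output, so its exit status is 0 for spam, 1 for ham, 2 for unsure and 3 for an error (the codes the Perl script above tests); `tristate_route` is a hypothetical bash helper that only prints the decision instead of running bogofilter or Razor.

```shell
# tristate_route BOGO_STATUS [RAZOR_STATUS]
# BOGO_STATUS: bogofilter exit status (0 spam, 1 ham, 2 unsure, 3 error).
# RAZOR_STATUS: razor-check exit status (0 = known spam); consulted only
# when bogofilter is unsure, and treated as "not spam" if missing.
tristate_route() {
    case $1 in
        0) echo "spam" ;;
        1) echo "inbox" ;;
        2)
            if [ "${2:-1}" -eq 0 ]; then
                # Razor caught what bogofilter was unsure about:
                # train on error by registering the message as spam
                echo "spam, train: bogofilter -s"
            else
                echo "inbox"
            fi ;;
        *) echo "inbox, error" ;;
    esac
}
```

Only the unsure-and-Razor-spam case touches the bogofilter database, which is what keeps the token database small compared with training on every message.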

I am sure you will be satisfied with your combined bogofilter/Razor setup. bogofilter does a great job for me, both in speed and in precision, and Razor has improved greatly in versions after 2.6.
alkan
Guru
Joined: 06 Aug 2004
Posts: 385
Location: kasimlar yaylasi

Posted: Mon Dec 06, 2004 6:25 pm

OK, I get your point; I don't want to bloat my bogofilter database.

Do you know how to get an "X-Bogosity: Unsure" header? bogofilter always returns "X-Bogosity: Yes" or "X-Bogosity: No". (Sure, I could use bogofilter's exit code, but I'd rather use bogofilter's passthrough feature directly so I can use the header in KMail filters.)
rocketchef
n00b
Joined: 08 Feb 2004
Posts: 24

Posted: Tue Dec 07, 2004 6:39 am

I think it should be sufficient to define ham_cutoff and spam_cutoff in your ~/.bogofilter.cf; a spamicity that falls between the two cutoffs is then reported as Unsure.
If that does not work, review the sample bogofilter configuration in your doc directory. It is well commented and helpful.
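A minimal bash sketch of that change, appending both cutoffs to ~/.bogofilter.cf unless a ham_cutoff line is already present. The values 0.10 and 0.95 are example assumptions, not recommendations; tune them against your own mail.

```shell
# Append example tri-state cutoffs to bogofilter's per-user config.
# Assumed example values: scores between them will be reported as Unsure.
cf="$HOME/.bogofilter.cf"
if ! grep -q '^ham_cutoff=' "$cf" 2>/dev/null; then
    printf '%s\n' 'ham_cutoff=0.10' 'spam_cutoff=0.95' >> "$cf"
fi
```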
alkan
Guru
Joined: 06 Aug 2004
Posts: 385
Location: kasimlar yaylasi

Posted: Tue Dec 07, 2004 6:21 pm

Thanks, that did the trick.
All times are GMT