Gentoo Forums
Spamassassin - Bayes help needed. [SOLVED]
wetkitty
n00b

Joined: 26 Sep 2003
Posts: 16
Location: Baker City, OR

Posted: Thu Apr 07, 2005 5:57 pm    Post subject: Spamassassin - Bayes help needed. [SOLVED]

First, the numbers:
SA version 2.63
Mail server installed using Sabrex's how-to:
https://forums.gentoo.org/viewtopic-t-171499-highlight-spamassassin+qmail.html
local.cf (relevant parts, anyway):
Code:
# Text to prepend to subject if rewrite_subject is used
subject_tag *****SPAM*****
report_header 1
# Encapsulate spam in an attachment
report_safe 1
add_header all Status _YESNO_, hits=_HITS_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
# Use terse version of the spam report
use_terse_report 0

# Enable the Bayes system
use_bayes               1
bayes_min_ham_num 200
bayes_min_spam_num 200
bayes_use_hapaxes 1
# Enable Bayes auto-learning
auto_learn              1
auto_learn_threshold_nonspam    1.0
auto_learn_threshold_spam       7.0
bayes_path      /root/.spamassassin/bayes


spamassassin -D --lint outputs the following regarding bayes:
Code:

debug: using "/usr/share/spamassassin" for default rules dir
debug: using "/etc/mail/spamassassin" for site rules dir
debug: using "/root/.spamassassin" for user state dir
debug: using "/root/.spamassassin/user_prefs" for user prefs file
debug: bayes: 23462 tie-ing to DB file R/O /root/.spamassassin/bayes_toks
debug: bayes: 23462 tie-ing to DB file R/O /root/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 3 chosen.
debug: Initialising learner
debug: Loading languages file...
debug: Language possibly: en,sco
debug: is Net::DNS::Resolver available? yes
debug: trying (3) amazon.de...
debug: looking up MX for 'amazon.de'
debug: MX for 'amazon.de' exists? 1
debug: MX lookup of amazon.de succeeded => Dns available (set dns_available to hardcode)
debug: is DNS available? 1
debug: all '*From' addrs: ignore@compiling.spamassassin.taint.org
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=2.077
debug: bayes corpus size: nspam = 2195, nham = 2557
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *F = "U*ignore D*compiling.spamassassin.taint.org D*spamassassin.taint.org D*taint.org D*org"
debug: tokenize: header tokens for *m = " 1112896121 lint_rules "
debug: bayes token 'somewhat' => 0.0919180934020199
debug: bayes: score = 0.0919180934020198
debug: bayes: 23462 untie-ing
debug: bayes: 23462 untie-ing db_toks
debug: bayes: 23462 untie-ing db_seen

and
Code:

debug: running meta tests; score so far=4.984
debug: is spam? score=3.46 required=5.5 tests=BAYES_01,DATE_MISSING,DCC_CHECK,NO_REAL_NAME


Notice that Bayes is used here: the BAYES_01 rule fired and shifted the score. This is how I would like it to work for real, but compare the headers from mail processed by spamd using the same config:
Code:

X-Spam-Status: Yes, hits=33.9 required=5.5
X-Spam-Level: +++++++++++++++++++++++++++++++++
X-Spam-Report: SA TESTS
     1.1 SARE_HEAD_HDR_XSPAM Message headers used which identify spam
     2.5 MANGLED_SOMA BODY: mangled Soma
     0.6 J_CHICKENPOX_32 BODY: 3alpha-pock-2alpha
     2.3 MANGLED_PHRMCY BODY: mangled pharmacy
     2.3 MANGLED_AFFORD BODY: mangled affordable
     0.1 SAVE_UP_TO BODY: Save Up To
     2.5 MANGLED_CIALIS BODY: mangled Cialis
     2.3 MANGLED_ONLINE BODY: mangled online
     2.5 MANGLED_AMBIEN BODY: mangled ambien
     2.5 MANGLED_XANAX BODY: mangled xanax
     0.6 J_CHICKENPOX_36 BODY: 3alpha-pock-6alpha
     0.6 J_CHICKENPOX_12 BODY: 1alpha-pock-2alpha
     1.8 RAZOR2_CF_RANGE_51_100 BODY: Razor2 gives confidence between 51 and 100
     [cf: 100]
     1.1 MIME_BASE64_TEXT RAW: Message text disguised using base64 encoding
     0.9 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
     1.8 DCC_CHECK Listed in DCC (http://rhyolite.com/anti-spam/dcc/)
     0.1 RCVD_IN_NJABL RBL: Received via a relay in dnsbl.njabl.org
     [80.134.75.72 listed in dnsbl.njabl.org]
     1.5 DRUGS_ERECTILE_OBFU Obfuscated reference to an erectile drug
     1.0 DRUGS_ERECTILE Refers to an erectile drug
     1.0 DRUGS_ANXIETY_OBFU Obfuscated reference to an anxiety control drug
     0.0 DRUGS_SLEEP Refers to a sleep aid drug
     0.0 DRUGS_ANXIETY Refers to an anxiety control drug
     0.0 DRUGS_MUSCLE Refers to a muscle relaxant
     2.2 SARE_MULT_RATW_02 Spammer sign in headers
     1.0 DRUGS_ANXIETY_EREC Refers to both an erectile and an anxiety drug
     0.5 DRUGS_SLEEP_EREC Refers to both an erectile and a sleep aid drug
     1.0 DRUGS_MANYKINDS Refers to at least four kinds of drugs


No reference to bayes - ever.

So my question is:
Is Bayes being used by spamd? If so, where and how? If not, what needs to be done to get it working like the test run?
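
A quick way to look for this symptom across saved messages might be the sketch below. It only searches a header or report string for rule names of the form BAYES_nn, like the BAYES_01 seen in the lint run. The function name bayes_rules_in and the two sample strings are made up for illustration; the strings are shortened from the output quoted in this post.

```python
import re

# Rule names such as BAYES_00 or BAYES_01 show that the Bayes classifier scored the message.
BAYES_RULE = re.compile(r"\bBAYES_\d+\b")

def bayes_rules_in(header_text):
    """Return every BAYES_nn rule name found in a header or report string."""
    return BAYES_RULE.findall(header_text)

# Shortened from the lint output and the spamd report quoted in this post.
lint_line = "tests=BAYES_01,DATE_MISSING,DCC_CHECK,NO_REAL_NAME"
spamd_report = "1.8 DCC_CHECK Listed in DCC\n0.9 RAZOR2_CHECK Listed in Razor2"

print(bayes_rules_in(lint_line))     # ['BAYES_01']
print(bayes_rules_in(spamd_report))  # []
```

An empty list for every message that went through spamd would match what is described here: Bayes scoring in the lint run, but never in delivered mail.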
_________________
2x Sony VAIO FX-215's w/Stage1 installs


Last edited by wetkitty on Thu Apr 14, 2005 10:31 pm; edited 1 time in total
wetkitty
n00b

Joined: 26 Sep 2003
Posts: 16
Location: Baker City, OR

Posted: Wed Apr 13, 2005 1:04 am    Post subject: Bump

Just a bump to see if there are any SpamAssassin gurus out there today.
_________________
2x Sony VAIO FX-215's w/Stage1 installs
giant
Tux's lil' helper

Joined: 01 Aug 2002
Posts: 107

Posted: Wed Apr 13, 2005 8:49 pm

Hmm, how long has your server been running?
What kind of mail traffic are we talking about?
Are you using some sort of autolearn for missed spam mails?

I don't see anything wrong in your config ...

If Bayes is working you should see something like this:

Code:

 0.0 HTML_MESSAGE           BODY: HTML included in message
 3.0 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
 0.2 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
-0.2 AWL                    AWL: From: address is in the auto white-list


Are you sure that you start spamd with the right /etc/conf.d/spamd settings?
In my case I have a special user set up where I store the Bayes dbs. With a new setup I am testing MySQL storage.

Are the owner and permissions on your Bayes path correct? If spamd cannot write to those files it won't work ...

Just a couple thoughts ....
wetkitty
n00b

Joined: 26 Sep 2003
Posts: 16
Location: Baker City, OR

Posted: Thu Apr 14, 2005 7:16 pm

Code:
Hmm, how long has your server been running?

Uptime? Can't seem to get more than six months; it always ends up getting shut down to move to a different rack or some such thing.
This particular mail config is at least a year old. After originally setting it up, SpamAssassin appeared to be working great. It wasn't until I started looking to improve the performance that I noticed the missing Bayes.

Code:
What kind of mail traffic are we talking about?

20,000 messages a month - expecting it to double in a few months.

Code:
Are you using some sort of autolearn for missed spam mails?

A trusted friend who is (was) receiving lots of spam is using Thunderbird along with my IMAP server. Thunderbird drops spam into its junk folder (plus any he manually tags). A cron job runs sa-learn against that junk folder and restarts spamd.
In addition, autowhitelist and autolearn are enabled in the configs. (I can see autowhitelist working properly.)

Code:
If Bayes is working you should see something like this:

Code:

 0.0 HTML_MESSAGE           BODY: HTML included in message
 3.0 HTML_IMAGE_ONLY_08     BODY: HTML: images with 400-800 bytes of words
 0.2 MIME_HTML_ONLY         BODY: Message only has text/html MIME parts
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
 0.0 MIME_QP_LONG_LINE      RAW: Quoted-printable line longer than 76 chars
-0.2 AWL                    AWL: From: address is in the auto white-list

That is exactly what I'm missing.


Code:
Are you sure that you start spamd with the right /etc/conf.d/spamd settings?


Just one option there:
Code:
# Config file for /etc/init.d/spamd

# Some options:
#
# -a for auto-white-list
# -c to create a per user configuration file
# -L if you want to suppress DNS lookup
# -u USER to run as a user other than root
#
# for more help look in man spamd

SPAMD_OPTS="-a"


Code:
In my case I have a special user set up where I store the Bayes dbs. With a new setup I am testing MySQL storage.

Are the owner and permissions on your Bayes path correct? If spamd cannot write to those files it won't work ...

Well, it uses the dbs when running 'spamassassin -D --lint', and it also scores using Bayes there. So there is something different between running 'spamassassin -D --lint' and '/etc/init.d/spamd start'. After reading your post I'm going to start looking for a user or permission difference between the two.

Any other thoughts will be appreciated - I'll post back with any success (or failure)

Thanks
_________________
2x Sony VAIO FX-215's w/Stage1 installs
Ateo
Advocate

Joined: 02 Jun 2003
Posts: 2022
Location: Vegas Baby!

Posted: Thu Apr 14, 2005 8:07 pm

Are the parameters [for local.cf] bayes_auto_learn_threshold_nonspam and/or bayes_auto_learn_threshold_spam an option in SA 2.63? If so, you probably want to set those.
wetkitty
n00b

Joined: 26 Sep 2003
Posts: 16
Location: Baker City, OR

Posted: Thu Apr 14, 2005 10:29 pm    Post subject: Solved

Code:
Are the parameters [for local.cf] bayes_auto_learn_threshold_nonspam and/or bayes_auto_learn_threshold_spam an option in SA 2.63? If so, you probably want to set those.

Yes, those are set to 1 and 7 respectively.

I did find a solution though! And it is related to permissions. I added debugging (-D) to the /etc/conf.d/spamd and after reviewing the logs found that it was unable to read /root/.spamassassin.

It would seem that running 'spamassassin -D --lint' as root works because it always runs as root, whereas the actual running spamd drops to a lower-privileged user after starting. So changing the directory permissions to
Code:
drwxrwx---   2 root qscand   176 Apr 14 14:55 .spamassassin

solved everything.

I would be curious to know if there are any security issues with having those permissions though?
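
The fix above follows from the standard Unix owner/group/other check. Here is a minimal sketch of that check, ignoring ACLs and capabilities. The function name can_read_dir, the uid 1001 and the gid 210 for qscand are made-up values for illustration, and the 0700 "before" mode is only an assumption, since the original permissions are not recorded here.

```python
import stat

def can_read_dir(mode, owner_uid, owner_gid, uid, gids):
    """Classic Unix check: select owner, group or other bits, then require r and x."""
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        need = stat.S_IRUSR | stat.S_IXUSR
    elif owner_gid in gids:
        need = stat.S_IRGRP | stat.S_IXGRP
    else:
        need = stat.S_IROTH | stat.S_IXOTH
    return (mode & need) == need

QSCAND_GID = 210  # hypothetical gid for the qscand group

# drwxrwx--- root:qscand, seen by a non-root user who is in group qscand
print(can_read_dir(0o770, 0, QSCAND_GID, 1001, {QSCAND_GID}))  # True
# assumed earlier mode drwx------: the same user is locked out
print(can_read_dir(0o700, 0, QSCAND_GID, 1001, {QSCAND_GID}))  # False
# root, as used by 'spamassassin -D --lint', always gets in
print(can_read_dir(0o700, 0, QSCAND_GID, 0, set()))            # True
```

The same kind of check also has to pass on each parent directory (at least the execute bit), /root included.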
_________________
2x Sony VAIO FX-215's w/Stage1 installs
FastTurtle
Guru

Joined: 03 Sep 2002
Posts: 500
Location: Flakey Shake & Bake Caliornia, USA

Posted: Thu Apr 14, 2005 11:20 pm

I'm thinking that you may want to look into giving SA ownership of the files instead of changing perms, as that should be safer.
_________________
AsRock B550 Phantom Gaming 4
128GB 3200 Mhz memory
1TB NVME as the boot disk
4x 4TB Sata - 2x 2TB Sata SSD - 4x 450GB SaS - 3x 900GB SaS - 72GB SaS for Gentoo system disk
LSI 9300-16i in HBA mode for all spinning disks
Radeon 6800 (Non XT) for GPU
Ateo
Advocate

Joined: 02 Jun 2003
Posts: 2022
Location: Vegas Baby!

Posted: Thu Apr 14, 2005 11:49 pm    Post subject: Re: Solved

wetkitty wrote:
Code:
Are the parameters [for local.cf] bayes_auto_learn_threshold_nonspam and/or bayes_auto_learn_threshold_spam an option in SA 2.63? If so, you probably want to set those.

Yes, those are set to 1 and 7 respectively.

I did find a solution though! And it is related to permissions. I added debugging (-D) to the /etc/conf.d/spamd and after reviewing the logs found that it was unable to read /root/.spamassassin.

It would seem that running 'spamassassin -D --lint' as root works because it always runs as root, whereas the actual running spamd drops to a lower-privileged user after starting. So changing the directory permissions to
Code:
drwxrwx---   2 root qscand   176 Apr 14 14:55 .spamassassin

solved everything.

I would be curious to know if there are any security issues with having those permissions though?


What were the permissions beforehand? Giving write permissions to the group is something I, personally, try to avoid.
giant
Tux's lil' helper

Joined: 01 Aug 2002
Posts: 107

Posted: Fri Apr 15, 2005 7:51 am

Glad it worked :-)

Just to wrap this up. This is my conf:


Code:

cat /etc/conf.d/spamd
# Config file for /etc/init.d/spamd

SPAMD_OPTS="-x -u spamd  -H /home/spamd"


This disables per-user config, runs spamd as user spamd, and stores everything under /home/spamd.

Which then looks like this:

Code:

spamd # ls -l /home/spamd/
total 9288
-rw-------  1 spamd spamd   38304 Apr 15 09:57 bayes_journal
-rw-------  1 spamd spamd 5210112 Apr 15 09:57 bayes_seen
-rw-------  1 spamd spamd 5210112 Apr 15 09:57 bayes_toks


This is the production server. On my test server I am testing a setup using MySQL. Or rather, a combination of amavisd with Maia Mailguard.
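
Since the user spamd runs as decides whose permissions matter, here is a small sketch that pulls the -u argument out of a SPAMD_OPTS string like the one above. The helper name spamd_user is made up for illustration. It only handles the short -u form and returns None when no -u is given, as in the "-a" config earlier in this thread.

```python
import shlex

def spamd_user(opts):
    """Return the argument following -u in a SPAMD_OPTS string, or None if absent."""
    args = shlex.split(opts)
    for i, arg in enumerate(args):
        if arg == "-u" and i + 1 < len(args):
            return args[i + 1]
    return None  # no -u given: spamd is not pinned to one user

print(spamd_user("-x -u spamd  -H /home/spamd"))  # spamd
print(spamd_user("-a"))                           # None
```

With a dedicated user, the Bayes files can stay at -rw------- as in the listing above, because the daemon owns them.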