SpamAssassin
SpamAssassin is a mail filter to identify spam. It is an intelligent email filter which uses a diverse range of tests to identify unsolicited bulk email, more commonly known as Spam. These tests are applied to email headers and content to classify email using advanced statistical methods. In addition, SpamAssassin has a modular architecture that allows other technologies to be quickly wielded against spam and is designed for easy integration into virtually any email system.
This page is designed to give you an overview of how QmailToaster goes about configuring SpamAssassin.
Configuration and Rules
The SpamAssassin-Toaster uses the following configuration files:
- /etc/mail/spamassassin/local.cf
- /etc/mail/spamassassin/v310.pre
- /etc/mail/spamassassin/v312.pre
- /usr/share/spamassassin/*.cf
The local.cf file contains basic settings, such as the score a message must reach before it is considered spam, how the subject line should be rewritten once that score is reached (for example, by adding ***SPAM*** to the subject), and whether Bayes scoring should be used. The settings in this file apply to all users on your system.
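The three settings described above map onto the required_score, rewrite_header and use_bayes options. The following is a minimal sketch, not the Toaster's shipped configuration: it writes an illustrative local.cf to a scratch directory and lists its active settings. The values, the scratch directory and the show_settings helper are assumptions made for the example.

```shell
# Sketch: write an illustrative local.cf to a scratch directory and list
# its active settings. Values are examples, not QmailToaster defaults.
SA_DEMO_DIR=$(mktemp -d)
cat > "$SA_DEMO_DIR/local.cf" <<'EOF'
# Score a message must reach before it is tagged as spam
required_score 5.0
# Text added to the Subject of tagged messages
rewrite_header Subject ***SPAM***
# Enable the Bayesian classifier
use_bayes 1
EOF

# Print every line that is neither blank nor a comment.
# $1 = configuration file to inspect
show_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

show_settings "$SA_DEMO_DIR/local.cf"
```

To inspect the live file, point the helper at it: show_settings /etc/mail/spamassassin/local.cf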
The two .pre files tell SpamAssassin which plugins to load for applying different tests. Each entry is in the format:
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
You can find a list of available plugins on CPAN. Installing a plugin using CPAN goes like this:
# cpan
# install Mail::SpamAssassin::Plugin::URIDNSBL
# quit
Here's how to find out what Perl modules you have. If you are using the latest version of SpamAssassin-Toaster, everything you need should already be installed.
The /usr/share/spamassassin/*.cf files are custom rule sets designed for catching spam using your installed modules. How each of them will add (or subtract) points from the mail's spam score is set by 50_scores.cf. If you are, for instance, a pharmaceutical retailer you probably want to lower the scores for the various drugs cf files.
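Before changing any scores it helps to see what is currently assigned. The sketch below lists the score lines for rules whose names match a pattern; the rule_scores helper is a hypothetical name, and the default path is the 50_scores.cf mentioned above.

```shell
# List the score lines for rules whose names contain a pattern.
# $1 = pattern to look for in the rule name (e.g. DRUGS)
# $2 = scores file (defaults to the 50_scores.cf path used above)
rule_scores() {
    grep -E "^[[:space:]]*score[[:space:]]+[A-Za-z0-9_]*$1" \
        "${2:-/usr/share/spamassassin/50_scores.cf}"
}
```

For example, rule_scores DRUGS prints every score line for the drug-related rules. To lower one of them site-wide, a line such as "score DRUGS_ERECTILE 0.5" in local.cf is usually safer than editing the shipped file, since files under /usr/share/spamassassin may be replaced on upgrade. The rule name and value here are only illustrative and may differ in your version.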
Some of the files will only be used if the appropriate module is loaded, for instance 25_uribl.cf will only run if you have added
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
to one of your .pre files.
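To check whether a plugin is actually enabled, you can search the .pre files for an uncommented loadplugin line. This is a minimal sketch; plugin_enabled is a hypothetical helper name, and the default directory is the one listed at the top of this page.

```shell
# Succeed if the named plugin is loaded by any .pre file in a directory.
# Commented-out loadplugin lines do not count.
# $1 = short plugin name (e.g. URIDNSBL)
# $2 = configuration directory (defaults to /etc/mail/spamassassin)
plugin_enabled() {
    dir=${2:-/etc/mail/spamassassin}
    grep -Eqs "^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::$1([[:space:]]|\$)" \
        "$dir"/*.pre
}
```

Usage: plugin_enabled URIDNSBL && echo "URIDNSBL is loaded"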
You can find many alternative rule sets at Rules Emporium, and you might want to join a SpamAssassin mailing list to keep yourself up to date on the fight against spam while you are at it.
Whenever you change the SpamAssassin configuration, whether by adding rules (a new .cf file in /usr/share/spamassassin), by adding a module to a .pre file so that new rules are applied, or by editing any other configuration file, you must check that the syntax is OK:
# spamassassin -D --lint
If you see any errors, correct them before you restart the spamd service! The most likely problem is missing Perl modules. Add them using CPAN as shown above.
After you make any changes you need to restart the SpamAssassin service. You can do this using Jake's spamd script or by doing:
# qmailctl stop
# qmailctl start
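The lint check and the restart can be tied together so that a broken configuration never reaches the running service. This is a sketch only: safe_restart is a hypothetical function name, and it relies on spamassassin --lint exiting with a non-zero status when it finds errors.

```shell
# Restart the services only if the configuration passes the lint check.
safe_restart() {
    if spamassassin --lint; then
        # Configuration is clean: restart as described above.
        qmailctl stop && qmailctl start
    else
        echo "spamassassin --lint reported errors; not restarting" >&2
        return 1
    fi
}
```

Run safe_restart as root after editing any .cf or .pre file. When it refuses to restart, rerun spamassassin -D --lint to see the details.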
Bayesian Statistical Scoring
SpamAssassin can score messages based on the words they contain, because certain words are more likely to turn up in spam and others are more likely to show up in ham.
For this to be effective you need to train SpamAssassin with a collection of spam messages and a collection of ham messages. One way to do this is to set up two email accounts on your server, such as spam@yourqmailtoaster.com and notspam@yourqmailtoaster.com. Forward all your spam to one and your non-spam mail to the other. You may not want to forward all of your real mail, but the more ham SpamAssassin has for comparison, the better. Encourage your users to forward spam to the spam address and any false positives to the not-spam address. You might want to implement Squirrelmail Spam Buttons to make this easier.
Now create a script that looks like this:
#!/bin/bash
# Spam Assassin Bayes Training

DOMAIN=yourdomain
SPAM=your-spam-address
HAM=your-ham-address

# Learn spam! Stop if a directory is missing so nothing else gets learned by mistake.
cd /home/vpopmail/domains/$DOMAIN/$SPAM/Maildir/cur || exit 1
/usr/bin/sa-learn --spam ./*
rm -rf /home/vpopmail/domains/$DOMAIN/$SPAM/Maildir/cur/*

cd /home/vpopmail/domains/$DOMAIN/$SPAM/Maildir/new || exit 1
/usr/bin/sa-learn --spam ./*
rm -rf /home/vpopmail/domains/$DOMAIN/$SPAM/Maildir/new/*

# Learn ham!
cd /home/vpopmail/domains/$DOMAIN/$HAM/Maildir/cur || exit 1
/usr/bin/sa-learn --ham ./*
rm -rf /home/vpopmail/domains/$DOMAIN/$HAM/Maildir/cur/*

cd /home/vpopmail/domains/$DOMAIN/$HAM/Maildir/new || exit 1
/usr/bin/sa-learn --ham ./*
rm -rf /home/vpopmail/domains/$DOMAIN/$HAM/Maildir/new/*
Test it and use cron to run the script daily.
NOTE: just to belabor the point a bit more, this script deletes all the mail in the ham and spam directories. Do not just run this on your own inbox!
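To see whether training is making progress, you can look at the message counters in the Bayes database. With default settings SpamAssassin does not apply Bayes scores until it has learned at least 200 spam and 200 ham messages. The sketch below pulls the two counters out of "sa-learn --dump magic" output. bayes_counts is a hypothetical helper, and it assumes the usual dump layout in which the value is the third column.

```shell
# Read "sa-learn --dump magic" output on stdin and print how many spam
# and ham messages the Bayes database has learned so far.
bayes_counts() {
    awk '/non-token data: nspam$/ { print "nspam=" $3 }
         /non-token data: nham$/  { print "nham=" $3 }'
}
```

Usage, run against the database of the vpopmail user: sudo -H -u vpopmail sa-learn --dump magic | bayes_counts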
Bounce messages to feed SA Bayes
Jack Vickers' info per 2 Aug 2007:
Having users forward messages to an account like that is "a bad thing to do" according to the guys on the SpamAssassin mailing list. You need to bounce the messages to those addresses, not forward them. When a message is forwarded, programs like Outlook rewrite the headers, so Bayes thinks the spam was sent by the user who forwarded it.
- How to bounce/redirect mail
- How to redirect/bounce mail for sa-learn
Further Info
- SimScan is used by QMailToaster to run incoming mail through ClamAV and SpamAssassin. It is configured by the settings in /var/qmail/control/simcontrol. See Simscan for more details.
- The SpamAssassin daemon is started by the /var/qmail/supervise/spamd/run script. See man spamd for other options you can set there.
- SpamAssassin can be set up to check the body of messages against Spam URI Realtime Blocklists. See SURBL for more details.
- You can also check incoming mail against Realtime Black Lists before the mail even reaches SpamAssassin. See RBLs for more details.
How to reset Spam Assassin Bayes Training
You can use either of the following commands:
# su vpopmail -c 'sa-learn --clear'
or, if the vpopmail user does not have a valid shell:
# sudo -H -u vpopmail sa-learn --clear
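Clearing the database throws away all training, so it can be worth saving a copy first. sa-learn provides --backup, which writes the database to standard output, and --restore, which reads it back from a file. The following is a sketch under stated assumptions: backup_and_clear_bayes is a hypothetical function, the SA_LEARN variable is an illustrative override used only to try the function against a stand-in command, and the function is meant to be run as the vpopmail user.

```shell
# Back up the Bayes database, then clear it only if the backup succeeded.
# Run as the vpopmail user so the right database is used.
# SA_LEARN (hypothetical) overrides the sa-learn command for dry testing.
# $1 = file to write the backup to
backup_and_clear_bayes() {
    sa_learn=${SA_LEARN:-sa-learn}
    if "$sa_learn" --backup > "$1"; then
        "$sa_learn" --clear
    else
        echo "backup failed; Bayes database left untouched" >&2
        return 1
    fi
}
```

If the reset turns out to be a mistake, the saved copy can be loaded again with sa-learn --restore followed by the backup file name.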
User Submitted Scripts
Multi-domain sa-learn ham/spam script
Enhancement of the "Spam Assassin Bayes Training" script above. But use at your own risk :)
#!/bin/bash
##
## Spam Assassin Bayes Training
##
testrun=0 ## 0 = learn and delete messages; set to 1 for a dry run that only counts them

cd /home/vpopmail/domains/ || exit 1
for i in *
do
  echo -en "DOMAIN:\t$i\t"
  pre=$(echo $i | /bin/sed s/'\.'//g)
  spampre="$pre-spam" ## test.com ==> testcom-spam@test.com
  hampre="$pre-ham" ## test.com ==> testcom-ham@test.com

  ##
  ## Process SPAM for the current domain
  ##
  echo -en "\tS: "
  if [ -d $i/$spampre ]
  then
    spamcount=0
    cd $i; cd $spampre; cd Maildir; cd cur
    for spam in $(/usr/bin/find ./ -type f)
    do
      let spamcount=$spamcount+1
      if [ $testrun -eq 0 ]
      then
        /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --spam $spam 1>/dev/null
        rm -f $spam
      fi
    done
    cd ..
    cd new
    for spam in $(/usr/bin/find ./ -type f)
    do
      let spamcount=$spamcount+1
      if [ $testrun -eq 0 ]
      then
        /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --spam $spam 1>/dev/null
        rm -f $spam
      fi
    done
    cd ..; cd ..; cd ..; cd ..
    echo -en $spamcount
  else
    echo -en "NA"
  fi

  ##
  ## Process HAM for the current domain
  ##
  echo -en "\tH: "
  if [ -d $i/$hampre ]
  then
    hamcount=0
    cd $i; cd $hampre; cd Maildir; cd cur
    for ham in $(/usr/bin/find ./ -type f)
    do
      let hamcount=$hamcount+1
      if [ $testrun -eq 0 ]
      then
        /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --ham $ham 1>/dev/null
        rm -f $ham
      fi
    done
    cd ..; cd new
    for ham in $(/usr/bin/find ./ -type f)
    do
      let hamcount=$hamcount+1
      if [ $testrun -eq 0 ]
      then
        /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --ham $ham 1>/dev/null
        rm -f $ham
      fi
    done
    cd ..; cd ..; cd ..; cd ..
    echo $hamcount
  else
    echo "NA"
  fi
done

##
## Update the Bayes DB
##
if [ $testrun -eq 0 ]
then
  /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn --sync
  /usr/bin/sudo -u vpopmail -H /usr/bin/sa-learn -u vpopmail --force-expire 1>/dev/null
fi