in reply to A Beginner's Guide to Using Mail::Audit and Mail::SpamAssassin
I would like to add a couple of things:
- Mail::Audit 2.0 is broken. Sooner or later, your inbox will become corrupted. 1.11 is stable and has given me no problems; two friends and I have all had to downgrade from 2.0 to 1.11 to solve the inbox corruption problem. You can find Mail-Audit-1.11.tar.gz here (directory) or here (tarball).
- The tutorial doesn't cover installing the Razor clients. These are necessary if you wish to make use of the Vipul's Razor database. This is the coolest part of SpamAssassin, IMHO. An MD5 checksum of the mail is compared against a database of known spam. If it matches, it's automatically tossed. More importantly, as you get spam, you can cause it to be added to the database, which means other people never have to see it. The Razor::Clients package is not on CPAN, but is available here. SpamAssassin automatically makes use of the clients if they are installed; if they are not, it silently skips the check.
- It is worth noting that when you are writing filters, once $item->accept() is called, the program ends. No further tests are run. The documentation says this, but it's not obvious at first glance. So the subs in the example never return, which looks a little funky until you know this.
- You can use the .procmailrc file, or you can use the .forward file with the format | ~/mailscanner.pl. Note that on certain systems, such as Red Hat, sendmail runs programs under smrsh, the sendmail restricted shell. To make this work, you have to put a symlink in /etc/smrsh to 'mailscanner.pl', or whatever you called your client. If you get a lot of mail, the .forward route avoids the small amount of additional overhead of spooling up procmail, only to have it pass the mail on.
- This is a Perl script. As such, when you make a change, you HAVE to run 'perl -c mailscanner.pl' before walking away. If the script croaks, the MTA will send a reply to the originator of the email saying that the mail was undeliverable. When I was using procmail, a borked recipe was annoying, but not a problem. With SpamAssassin, it's much more important to get it right.
- It's important to put spam in a folder, and not drop it completely. SpamAssassin isn't perfect, nor will your rules be. Mine are tuned pretty well, and rarely let real spam through, but sometimes they kick out good messages because someone set a priority flag in Outlook and had a few caps in the title. I get mail from a guy in Romania for product support on a C compiler that causes the problem. Frequently, I run tail -f ~/.audit_log in a window somewhere, and keep an eye on what's being rejected as spam. As I see mail from people that I know I'll hear from again, I adjust the script or, easier, tune the .spamassassin.cf whitelist and blacklist (this file gets created automatically in your home directory the first time SpamAssassin is run).
- There is an unspoken implication that the Vipul's Razor database will catch viruses. This may be the case for some, but it passed a Sircam-laden message right on through. I scan the headers for the standard 'Snow White - The Real Story!' and a couple of others. Don't count on SpamAssassin to protect you. Add your own countermeasures, and use standard anti-viral techniques, especially if you're going to be POP3/IMAP'ing the mail down to a Windows box.
- procmail has a facility to check if the mail is of a certain size. This is something that's lacking in this package. Each line of the message is an array entry. If you want to know how long the message is, you have to iterate over the entire array, summing the lengths. This ought to be something the package provides as a method. I'm not sure what the implications of binary messages, attachments, etc. are, so unlike in my procmail recipes, I don't check for messages of certain sizes.
- After SpamAssassin defangs mail (or rewrites the headers with the word SPAM everywhere), it is not at all clear whether a message modified this way can or should be submitted to the Vipul's Razor database. I have found no clear answer on this, although I have not pursued it aggressively. My personal policy is to forward only raw, un-rewritten mails to Vipul's Razor, to make sure the MD5 checksum is for something people will actually get, and not a post-processed version. If someone knows the real answer, I'd like to know.
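For the size check mentioned above, a minimal sketch of the summing loop might look like the following. Note that body_size is a made-up helper name, not something Mail::Audit provides, and the sample array is stand-in data. I'm assuming that $item->body hands back a reference to an array of lines, as the Mail::Audit docs describe; check that against the version you have installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::Audit): sum the length of
# every line in an array reference of message lines.
sub body_size {
    my ($lines) = @_;
    my $total = 0;
    $total += length $_ for @$lines;
    return $total;
}

# Stand-in data. In a real filter this would be something like
# body_size($item->body), assuming body() returns an array ref of lines.
my @body = ("Hello,\n", "this is a test.\n");
printf "%d bytes\n", body_size(\@body);
```

This only counts the body as it arrives, so an encoded attachment is measured in its encoded form, and the headers are not included. For a rough "is this mail huge?" test, that is probably close enough.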
I think that's all the major points of running this. It's a great system, and it has seriously cut back on the crap I see.