http://www.perlmonks.org?node_id=190965


in reply to Bayesian Filtering for Spam

Hi, I'm one of the SpamAssassin developers.

Yes, Bayesian filtering has been tested, and it does in fact work reasonably well, but it does not generalise the way SpamAssassin's ruleset does: it always requires training on the user's own corpus. SpamAssassin's main market is gateway scanning, so we can't just ship a Bayesian classifier and expect it to "just work" the way the current ruleset does. It has to be trained on one individual's type of email.
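To make the training point concrete, here is a minimal Python sketch of a simplified naive Bayes token scorer. It is not SpamAssassin's code, not the MessageLabs plugin mentioned below, and not necessarily Paul's exact method; the class name `NaiveBayesSpamFilter`, the tokenizer regex, the add-one smoothing, and the two sample users are all illustrative assumptions. What it shows is that the same message gets a very different score depending on whose mail the filter was trained on.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase runs of letters, digits, '$', apostrophes and hyphens.
TOKEN_RE = re.compile(r"[a-z0-9$'-]+")


def tokenize(text):
    """Return the set of distinct tokens in a message."""
    return set(TOKEN_RE.findall(text.lower()))


class NaiveBayesSpamFilter:
    """Sketch only: learns per-token counts from ONE user's labelled mail."""

    def __init__(self):
        self.spam_counts = Counter()  # messages containing each token, spam side
        self.ham_counts = Counter()   # messages containing each token, ham side
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        if self.n_spam == 0 or self.n_ham == 0:
            raise ValueError("train on at least one spam and one ham message first")
        total = self.n_spam + self.n_ham
        log_spam = math.log(self.n_spam / total)  # class priors
        log_ham = math.log(self.n_ham / total)
        for tok in tokenize(text):
            # Tokens never seen in training carry no evidence, so skip them.
            if tok not in self.spam_counts and tok not in self.ham_counts:
                continue
            # Add-one (Laplace) smoothed P(token present | class).
            log_spam += math.log((self.spam_counts[tok] + 1) / (self.n_spam + 2))
            log_ham += math.log((self.ham_counts[tok] + 1) / (self.n_ham + 2))
        # Turn the log-odds into a probability without overflowing math.exp.
        log_odds = log_spam - log_ham
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)


# Two hypothetical users whose mail looks very different.
alice = NaiveBayesSpamFilter()
alice.train("cheap viagra buy now", is_spam=True)
alice.train("perl module release notes", is_spam=False)

bob = NaiveBayesSpamFilter()
bob.train("perl conference tickets on sale", is_spam=True)
bob.train("reorder: buy viagra now for the pharmacy", is_spam=False)

msg = "buy viagra now"
print(round(alice.spam_probability(msg), 3))  # 0.889
print(round(bob.spam_probability(msg), 3))    # 0.111
```

Because every probability comes from one user's counts, a gateway that has to serve many users with one shared model loses exactly that per-user signal, which is the limitation described above.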

Also, I think Paul gets very different spam from what we see in the project. I have a Bayesian classifier plugin for SpamAssassin; it's part of MessageLabs' proprietary extensions to SpamAssassin. The bonus is that we can tune it for our customers because we're an ISP. However, even with that tuning, we're not seeing anywhere near the accuracy that Paul is seeing, simply because our users have vastly different email corpora from Paul's.

This system probably works great for geeks, though.

Another thing to note: I believe this is the training system that the new Mail.app in Apple's Jaguar release is using. It seems to be working reasonably well for me so far, too.