in reply to Re: Re: From a SpamAssassin developer in thread Bayesian Filtering for Spam
Don't be too surprised that Paul's solution's not a good general-purpose one. His data set's probably quite small, with good locality, and odds are he made sure to skew his results to his data. It's not that his methods are bad for his needs, just that his needs are rather different than most people's.
Re: Re: Re: Re: From a SpamAssassin developer
by Matts (Deacon) on Aug 19, 2002 at 16:24 UTC
I'm not surprised. Not even slightly - see my original post.
The biggest thing about statistical analysis is that you simply cannot test it on the training data set. I get 100% accuracy when I do that, and it's not surprising. I'm speculating that's what PG did, but I could be wrong. There's also the fact that training often overfits. None of this is news to anyone versed in machine learning (which I'm starting to be ;-)
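To make the train-versus-test point concrete, here is a minimal sketch in Python. It is not SpamAssassin's code and not PG's method: it is a toy naive Bayes scorer with add-one smoothing, and the messages, the function names (`train`, `classify`, `accuracy`) and the train/held-out split are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (text, label) pairs, invented for this sketch.
TRAIN = [
    ("buy cheap pills now", "spam"),
    ("cheap loans click now", "spam"),
    ("win money click here", "spam"),
    ("meeting notes attached here", "ham"),
    ("lunch at noon tomorrow", "ham"),
    ("perl module patch attached", "ham"),
]

# Messages the model never saw during training.
HELD_OUT = [
    ("free pills offer", "spam"),
    ("meeting tomorrow for free money", "spam"),
    ("cheap lunch tomorrow", "ham"),
    ("buy the book here", "ham"),
]

def train(messages):
    """Count tokens per class; return (token counts, doc counts, vocab size)."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for text, label in messages:
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts["spam"]) | set(counts["ham"])
    return counts, docs, len(vocab)

def classify(model, text):
    """Return 'spam' if the smoothed log-odds favour spam, else 'ham'."""
    counts, docs, vocab_size = model
    # Start from the class prior, then add per-token log-likelihood ratios.
    score = math.log(docs["spam"]) - math.log(docs["ham"])
    for label, sign in (("spam", 1), ("ham", -1)):
        total = sum(counts[label].values()) + vocab_size
        for tok in text.lower().split():
            # Add-one smoothing so unseen tokens do not zero out the score.
            score += sign * math.log((counts[label][tok] + 1) / total)
    return "spam" if score > 0 else "ham"

def accuracy(model, messages):
    """Fraction of messages whose predicted label matches the true label."""
    hits = sum(classify(model, text) == label for text, label in messages)
    return hits / len(messages)

model = train(TRAIN)
print("accuracy on training data:", accuracy(model, TRAIN))
print("accuracy on held-out data:", accuracy(model, HELD_OUT))
```

On this toy data the model gets all 6 training messages right but only 2 of the 4 held-out ones. The exact numbers mean nothing; the gap between the two is the point, and it only shows up when you score on mail the filter has not already seen.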
Matt.