PerlMonks  

Re: Re: Re: From a SpamAssassin developer

by Elian (Parson)
on Aug 19, 2002 at 07:11 UTC


in reply to Re: Re: From a SpamAssassin developer
in thread Bayesian Filtering for Spam

Don't be too surprised that Paul's solution isn't a good general-purpose one. His data set is probably quite small, with good locality, and odds are he made sure to skew his results to his data. It's not that his methods are bad for his needs, just that his needs are rather different from most people's.
Replies are listed 'Best First'.
Re: Re: Re: Re: From a SpamAssassin developer
by Matts (Deacon) on Aug 19, 2002 at 16:24 UTC
    I'm not surprised. Not even slightly - see my original post.

    The biggest thing about statistical analysis is that you simply cannot test it on the training data set. I get 100% accuracy when I do that, and that's not surprising. I'm speculating that's what PG did, but I could be wrong. There's also the fact that the training often overfits. None of this is news to anyone versed in machine learning (which I'm starting to be ;-)

    Matt.
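
The point about not testing on the training set can be illustrated with a small sketch. This is not Matt's or Paul Graham's code: it is a minimal naive Bayes word-count classifier with add-one smoothing, and the messages, the `TRAINING` and `HELD_OUT` lists, and all function names are made up for illustration. It scores the same model twice, once on the data it was trained on and once on messages it has never seen.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")

# Hypothetical toy corpus; a real filter would train on thousands of messages.
TRAINING = [
    ("buy cheap pills now", "spam"),
    ("cheap meds online", "spam"),
    ("win money now", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
    ("project status report", "ham"),
]

# Messages the model never saw during training.
HELD_OUT = [
    ("cheap pills online", "spam"),
    ("status meeting tomorrow", "ham"),
    ("money for lunch now", "ham"),
    ("cheap lunch at noon", "spam"),
]


def train(messages):
    """Count tokens and documents per class from (text, label) pairs."""
    token_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        token_counts[label].update(text.lower().split())
    vocab = set().union(*(token_counts[label] for label in LABELS))
    return token_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability score."""
    token_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in LABELS:
        total_tokens = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing so unseen pairs are not zero.
            smoothed = token_counts[label][token] + 1
            score += math.log(smoothed / (total_tokens + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, messages):
    """Fraction of messages whose predicted label matches the true label."""
    hits = sum(classify(model, text) == label for text, label in messages)
    return hits / len(messages)


model = train(TRAINING)
train_acc = accuracy(model, TRAINING)
held_out_acc = accuracy(model, HELD_OUT)
print(f"training-set accuracy: {train_acc:.2f}")
print(f"held-out accuracy:     {held_out_acc:.2f}")
```

On this toy data the model scores perfectly on its own training messages and gets only half of the held-out ones right. The numbers themselves mean nothing beyond this made-up corpus; the gap between the two is the thing to look for when evaluating a filter.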
