PerlMonks  

Spam filter my mbox

by toadi (Chaplain)
on Aug 31, 2004 at 14:17 UTC (#387194=perlquestion)
toadi has asked for the wisdom of the Perl Monks concerning the following question:

hello,

I have SpamAssassin running on my mail spool. It catches some spam, but over the past year more than 5,000 spam messages have slipped through. They are no longer in the mail spool but are stored in mbox format on that server. I want to move the spam from the inbox to the spam file (both are in mbox format).

I've seen some useful modules, but for now they make for a lengthy solution to a script I'm only going to run once.
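For a one-off job the mbox format is simple enough to handle without modules: each message starts with a line beginning with `From ` (the envelope separator), and most mbox writers escape body lines that would start that way as `>From `. A minimal core-Perl sketch of the splitting step follows, assuming that classic layout; the script name in the comment is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split the full text of an mbox file into a list of messages.
# Assumes classic mbox layout: every message starts with a "From " envelope
# line, and "From " at the start of a body line has been escaped to ">From ".
sub split_mbox {
    my ($text) = @_;
    my @messages;
    for my $line (split /^/m, $text) {
        # A "From " line opens a new message; any text before the first one
        # (which a valid mbox should not have) is kept as its own chunk.
        push @messages, '' if $line =~ /^From / || !@messages;
        $messages[-1] .= $line;
    }
    return @messages;
}

# Hypothetical command-line use: perl count_mbox.pl INBOX
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> } // '';    # slurp the whole file
    close $fh;
    my @messages = split_mbox($text);
    printf "%d messages in %s\n", scalar @messages, $file;
}
```

Counting the messages this way is a cheap sanity check before moving anything: the total should match what your mail client reports for the folder.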



--
My opinions may have changed,
but not the fact that I am right

Re: Spam filter my mbox
by xorl (Deacon) on Aug 31, 2004 at 14:47 UTC
    Are the spams marked as spam? If so, how? Can't you use your email program to find the tag and select all tagged messages? You probably don't even need Perl for this. If you do need Perl, I'd open the mbox file, separate the messages, scan each one, append the tagged messages to the other file, and delete them from the mbox file.
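    The steps above (split the mbox, check each message for the tag, append tagged messages to the spam file, rewrite the inbox without them) can be sketched in core Perl. This is a minimal sketch, not a robust mail tool: it assumes the spam carries a header tag and uses SpamAssassin's `X-Spam-Flag: YES` header as a stand-in pattern (change `$SPAM_RE` to whatever your tag is), reads the whole inbox into memory, and does no mbox locking. The script and sub names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in tag: SpamAssassin adds "X-Spam-Flag: YES" to mail it flags.
# This is an assumption; adjust it to whatever tag your spam carries.
my $SPAM_RE = qr/^X-Spam-Flag:[ \t]*YES/mi;

# Split mbox text into messages on "From " separator lines.
sub split_mbox {
    my ($text) = @_;
    my @messages;
    for my $line (split /^/m, $text) {
        push @messages, '' if $line =~ /^From / || !@messages;
        $messages[-1] .= $line;
    }
    return @messages;
}

# Return (kept, spam) array refs; only the header block is searched,
# so a body that merely mentions the tag is not treated as spam.
sub partition_messages {
    my ($re, @messages) = @_;
    my (@keep, @spam);
    for my $msg (@messages) {
        my ($headers) = split /\n\n/, $msg, 2;
        if ($headers =~ $re) { push @spam, $msg } else { push @keep, $msg }
    }
    return (\@keep, \@spam);
}

sub read_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!\n";
    local $/;    # slurp mode
    my $text = <$fh> // '';
    close $fh;
    return $text;
}

# Hypothetical usage: perl move_spam.pl INBOX SPAMBOX
if (@ARGV == 2) {
    my ($inbox, $spambox) = @ARGV;
    my ($keep, $spam) =
        partition_messages($SPAM_RE, split_mbox(read_file($inbox)));

    # Append the tagged messages to the spam mbox.
    open my $out, '>>', $spambox or die "Cannot append to $spambox: $!\n";
    print {$out} @$spam;
    close $out or die "Error writing $spambox: $!\n";

    # Write the remaining messages to a temp file, then rename it over the inbox.
    my $tmp = "$inbox.tmp.$$";
    open my $new, '>', $tmp or die "Cannot write $tmp: $!\n";
    print {$new} @$keep;
    close $new or die "Error writing $tmp: $!\n";
    rename $tmp, $inbox or die "Cannot rename $tmp to $inbox: $!\n";

    printf "moved %d, kept %d\n", scalar @$spam, scalar @$keep;
}
```

    Run it with mail delivery to the inbox stopped and with backups of both files. Writing to a temporary file and renaming it means an interrupted run leaves the original inbox untouched, although the renamed file gets fresh permissions rather than the original's.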

    If they're not already marked, good luck; I hope someone else around here can help, 'cause I can't come close to an automated way of identifying spam.

      Like I said, they were missed by my spam filter, so they are not marked. But I found another solution: open the mbox in Thunderbird and sort them :)


      --
      My opinions may have changed,
      but not the fact that I am right

        Once you've separated what is spam from what's not and have two new mboxes, you'd be better off running sa-learn from the SpamAssassin distribution over both of them. You'll see that SpamAssassin makes far fewer misclassifications.

        From the sa-learn man page:

        NAME
            sa-learn - train SpamAssassin's Bayesian classifier

        SYNOPSIS
            sa-learn [options] [file]...
            [...]
            Options:
             --ham                 Learn messages as ham (non-spam)
             --spam                Learn messages as spam
             [...]
             --mbox                Input sources are in mbox format
             --showdots            Show progress using dots
             --no-rebuild          Skip building databases after scan
             [...]

        DESCRIPTION
            Given a typical selection of your incoming mail classified as spam or ham (non-spam), this tool will feed each mail to SpamAssassin, allowing it to 'learn' what signs are likely to mean spam, and which are likely to mean ham.

            Simply run this command once for each of your mail folders, and it will ''learn'' from the mail therein.
            [...]
            SpamAssassin remembers which mail messages it's learnt already, and will not re-learn those messages again, unless you use the --forget option. Messages learnt as spam will have SpamAssassin markup removed, on the fly.
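        With the mail sorted into a spam mbox and a ham mbox, training is one sa-learn run per file. A small Perl wrapper is sketched below; it uses only the options quoted from the man page above (`--spam`, `--ham`, `--mbox`, `--showdots`). The script name and argument order are hypothetical, and it assumes `sa-learn` is on the PATH and that you run it as the user whose SpamAssassin setup should be trained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an sa-learn command line for one mbox file.
# $kind must be 'spam' or 'ham'; the options are those quoted from the
# sa-learn man page (--spam/--ham, --mbox, --showdots).
sub sa_learn_cmd {
    my ($kind, $mbox) = @_;
    die "kind must be 'spam' or 'ham'\n"
        unless $kind eq 'spam' || $kind eq 'ham';
    return ('sa-learn', "--$kind", '--mbox', '--showdots', $mbox);
}

# Hypothetical usage: perl train.pl SPAM_MBOX HAM_MBOX
# (assumes sa-learn is installed and on the PATH)
if (@ARGV == 2) {
    my ($spam_mbox, $ham_mbox) = @ARGV;
    for my $job (['spam', $spam_mbox], ['ham', $ham_mbox]) {
        my @cmd = sa_learn_cmd(@$job);
        print "Running: @cmd\n";
        # List form of system() runs sa-learn directly, without a shell.
        system(@cmd) == 0
            or die "sa-learn failed for $job->[1]: exit status $?\n";
    }
}
```

        Passing the command as a list keeps file names with spaces or shell metacharacters intact. Because SpamAssassin remembers what it has already learnt, rerunning the script on the same files should not count those messages twice.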

        Ciao!
        --bronto


        The very nature of Perl to be like natural language--inconsistent and full of dwim and special cases--makes it impossible to know it all without simply memorizing the documentation (which is not complete or totally correct anyway).
        --John M. Dlugosz
