AssFace has asked for the wisdom of the Perl Monks concerning the following question:

I have recently hacked at Exchange 2000/2003 to make SpamAssassin work with it via an EventSink, obviously on a Win32 platform.

Now that I have it working, I want to run some stats on the amount of spam that each user is getting.

The code I have written to run the stats all works... sort of. I narrowed the problem down to the Mail::Internet code. I looked around on PerlMonks as well as on O'Reilly (http://perl.oreilly.com/news/perladmin_0700.html) and found various usage examples.

When I run the example code below, I get no response at all: no errors, but no output either. If I add print statements to see whether they are reached, they are, but nothing related to Mail::Internet is printed. And yes, there are definitely a lot of files in that directory for it to read, and the script does see them.

I am on Win2K SP3, running ActiveState Perl (v5.6.1). If I run PPM and query for Mail, I get:
Email-Find (0.09) Find RFC 822 email addresses in plain text
Email-Valid (0.14) Check validity of Internet email addresses
MailTools (1.58) Various Mail related modules


Are there known issues with MailTools under Win32, or am I doing something obviously wrong in the code below?

use strict;
use Mail::Internet;
use Mail::Header;

#variable declaration
my $strSpamDir = 'D:/spam/SPAM_FINAL';
my $file;
my $mail;

opendir SPAM_DIR, $strSpamDir or die "could not opendir $strSpamDir:$!\n";
foreach $file (readdir SPAM_DIR){
    if($file ne '.' && $file ne '..'){
        open(MY_FILE, "$strSpamDir/$file") or die "could not open the file $strSpamDir/$file:$!\n";
        $mail = Mail::Internet->new(\*MY_FILE);
        $mail->print_header();
        close(MY_FILE) or die "could not close $strSpamDir/$file:$!\n";
    }
}
closedir SPAM_DIR;


-------------------------------------------------------------------
There are some odd things afoot now, in the Villa Straylight.

Re: Mail::Internet on Win32?
by BrowserUk (Pope) on Jun 27, 2003 at 15:54 UTC

    Try using Data::Dumper on the return from new().

    print Dumper $mail; (after use Data::Dumper;) to see if there is anything useful in there.

    I took a quick look at the source, and it would appear that new() returns a valid object regardless of whether it managed to read anything. In fact, from my cursory glance, I couldn't see much in the way of error handling or reporting at all, but it may be that this is buried further down in the hierarchy of modules.

    You'd think that you would get some indication of whether it failed to read from the supplied filehandle. I couldn't see anything mentioned about what happens in the event of failure, nor how to test for it.
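    Since Mail::Internet does not appear to signal a parse failure, one option is to sanity-check the raw text yourself before handing it over. The sketch below uses only core Perl; split_message is a hypothetical helper (not part of MailTools) that splits a message at the first empty line and returns undef for the header when the block does not look like RFC 822 "Name: value" fields. It is a simplified check, not a full RFC 822 parser.

```perl
use strict;
use warnings;

# A field line is a name of printable ASCII characters (no space, no colon)
# followed by a colon, e.g. "Subject: hello".
my $FIELD = qr/^[\x21-\x39\x3B-\x7E]+:/;

# Hypothetical helper, not part of MailTools: split raw message text at the
# first empty line. Returns ($header, $body) when the header block looks like
# RFC 822 fields, or (undef, $text) when it does not, which is the situation
# in which Mail::Internet quietly puts everything into the body.
sub split_message {
    my ($text) = @_;
    return (undef, $text) unless defined $text && length $text;

    # Accept both "\n" and "\r\n" line endings.
    my ($head, $body) = split /\r?\n\r?\n/, $text, 2;
    $body = '' unless defined $body;

    my @lines = split /\r?\n/, $head;
    return (undef, $text) unless @lines && $lines[0] =~ $FIELD;

    # Remaining lines must be fields or folded continuations (leading space/tab).
    for my $line (@lines[1 .. $#lines]) {
        return (undef, $text) unless $line =~ $FIELD || $line =~ /^[ \t]/;
    }
    return ($head, $body);
}
```

    A file for which split_message returns an undefined header is one that Mail::Internet would most likely treat as all body, so those files can be logged or skipped instead of silently printing nothing.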


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller


      I added that line and, sure enough, it only seems to populate 'mail_inet_body', but that contains everything: headers and body.

      Judging from a comment below, perhaps Exchange doesn't format the e-mail in a way this module can read, or perhaps my code has somehow mangled it when writing it out?


        Basically what you are saying is that an MS product doesn't store its data in the accepted internet standard format (RFC822).

        Um...Whatta surprise:)

        You might want to take a look at Win32::Exchange and Win32::Exchange::Mailbox. They may be more in tune with your needs.

        Best of luck.




Re: Mail::Internet on Win32?
by AssFace (Pilgrim) on Jun 27, 2003 at 14:13 UTC
    In case it wasn't clear above, each file it should be loading is an e-mail from Exchange dumped out to a file (one e-mail per file).

    And if anyone happens to want the EventSink to try out on their Exchange 2000/2003 server, I have it up at http://www.cardboardutopia.com/ExchangeSpamFilter.zip
    The readme explains that it is not an ideal solution (and why), especially for a server with many users receiving e-mail. For small offices like ours it will likely work just fine. It also links to a howto on getting SpamAssassin to "work" in a Win32 environment (spamd doesn't work, and the RBL checks should be disabled since they don't work under Win32).

Re: Mail::Internet on Win32?
by Thelonius (Priest) on Jun 27, 2003 at 15:55 UTC
    I tried it on Windows 2000. It worked fine for a valid email (RFC822 format) file. On an invalid file, it just printed an empty string. There's no real error-checking in Mail::Header.
      So from this, and from my comment above that "print Dumper $mail;" shows only 'mail_inet_body' being populated (with both the headers and the body), I'm guessing that my file is not formatted properly.

      So that leads me to ask: is that down to Exchange, or to my code?

      I have an EventSink setup to save out every incoming message to a temp file, and then run SpamAssassin on that file, outputting a temp file that says whether it is spam or not.
      That output file is then scanned to see whether the mail is spam, and the message is marked accordingly and sent on to the user, while the original copy of the mail is saved to either a SPAM or NONSPAM directory.

      I'm not sure what I might have done to break the format, but it sounds like the issue is that it doesn't like the format the file is in.

      So perhaps the secondary question is whether there is a different/better/easier way to do this.
      I have two folders full of mail, one spam, one ham. I want to iterate over each folder and open each message. Initially I was using Email::Find to scan the entire message for e-mail addresses and pulling out the ones in our domain. That way I can say user XYZ gets 35 spam in a day, and 245 ham.
      But this doesn't work if there is a long chain of responses quoted in the body of the message, because then the same user's address might show up many times in one message. So user XYZ might look like they get 30 ham messages a day, when in reality they got only one message whose back-and-forth replies built up a tree of addresses in the body, and that inflated their count.
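      One way to keep a long reply chain from inflating a user's count is to count each address at most once per message. A minimal core-Perl sketch, assuming you already have the list of matching addresses for each message (however it was extracted); tally_users and the data layout are hypothetical, and example.com stands in for your own domain.

```perl
use strict;
use warnings;

# Hypothetical helper: for one category ('spam' or 'ham'), take a list of
# array refs (one per message, each holding the addresses found in that
# message) and count how many messages each address appeared in.
sub tally_users {
    my ($counts, $category, @messages) = @_;
    for my $addresses (@messages) {
        my %seen;    # reset per message, so repeats within one message count once
        for my $addr (map { lc($_) } @$addresses) {
            next if $seen{$addr}++;
            $counts->{$addr}{$category}++;
        }
    }
    return $counts;
}
```

      With this, a single message that mentions xyz@example.com thirty times adds one to that user's count rather than thirty.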

      So what I want is to look only at the To, Cc, and Bcc header fields and find users there.
      I figured this was the easiest method, but perhaps there is one that is better?
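      Once the header block is separated from the body, only the recipient fields need to be examined. A core-Perl sketch, assuming the header text is already available as a string: it unfolds continuation lines, keeps only To, Cc and Bcc, and pulls out addresses in a given domain with a deliberately simple regex. recipients_in_domain and example.com are placeholders; for real address parsing, MailTools' Mail::Address->parse is more robust. Note also that Bcc is usually removed by the sender, so it may not be present in stored copies of received mail.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct addresses in $domain that appear
# in the To, Cc or Bcc fields of an RFC 822 header block.
sub recipients_in_domain {
    my ($header_text, $domain) = @_;

    # Unfold: a line starting with space/tab continues the previous field.
    $header_text =~ s/\r?\n(?=[ \t])//g;

    my %found;
    for my $line (split /\r?\n/, $header_text) {
        next unless $line =~ /^(?:To|Cc|Bcc):(.*)$/i;
        my $value = $1;
        # Deliberately simple address match; not a full RFC 822 parser.
        while ($value =~ /([\w.+-]+\@\Q$domain\E)\b/gi) {
            $found{ lc $1 } = 1;
        }
    }
    return sort keys %found;
}
```

      Addresses quoted in the Subject or body are ignored, which is exactly what the per-user counts need.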

      Or is there a way I can find out how my message no longer conforms to the RFC822 format? Since I am on a Windows platform, does saving it out (as a .txt file) add a \r before each \n and therefore break it?
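      A quick way to answer the line-ending question is to look at the raw bytes. The sketch below is a hypothetical diagnostic (describe_message is not a library function) that counts CRLF, bare LF and bare CR endings and reports whether the first line looks like a header field. The file should be read with binmode, because Perl on Win32 otherwise translates CRLF to LF on input in text mode, which would hide the real endings.

```perl
use strict;
use warnings;

# Hypothetical diagnostic: describe the line endings and first line of raw
# message text, to see whether it could plausibly be parsed as RFC 822.
sub describe_message {
    my ($raw) = @_;
    my $crlf = () = $raw =~ /\r\n/g;        # Windows-style endings
    my $lf   = () = $raw =~ /(?<!\r)\n/g;   # Unix-style endings
    my $cr   = () = $raw =~ /\r(?!\n)/g;    # stray carriage returns
    my ($first) = $raw =~ /\A([^\r\n]*)/;
    return {
        crlf               => $crlf,
        bare_lf            => $lf,
        bare_cr            => $cr,
        first_line         => $first,
        starts_with_header => ($first =~ /^[\x21-\x39\x3B-\x7E]+:/ ? 1 : 0),
    };
}

# Usage sketch ($path is a placeholder for one of the saved message files):
# open my $fh, '<', $path or die "could not open $path: $!";
# binmode $fh;
# my $raw = do { local $/; <$fh> };
# my $info = describe_message($raw);
```

      If a file shows only CRLF endings, those are probably not the culprit when it is read in text mode; a first line that is not a header field (for example, some leading text added when the message was dumped) would be a more likely reason everything ends up in 'mail_inet_body'. Mail::Internet->new also accepts a reference to an array of lines, so cleaned-up text can be passed in that way.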
