I made reference to this before in this post
and now raise the point again, but with a more direct plea.
I have two directories of e-mail. One is full of spam, the other full of ham.
I would like to iterate over them, open each message, look at only the headers, and get all of the e-mail addresses from them (To, Cc, Bcc - From doesn't really matter... and technically Bcc doesn't matter either, since it is often left entirely empty).
From there I want to look at each address and see if my domain is in there; if so, pull out that user and add it to an array.
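A rough sketch of that last step, assuming the recipient header values are already in hand as plain strings. The sub name `users_at_domain` and the domain `example.com` are placeholders for illustration, and the pattern is deliberately crude: it will not cope with quoted local parts or comments inside an address.

```perl
use strict;
use warnings;

# Hypothetical helper: return the local part (the user) of every address
# at $domain found in the given header values.
sub users_at_domain {
    my ($domain, @values) = @_;
    my @users;
    for my $value (@values) {
        # Grab anything shaped like user@domain. The lookahead rejects
        # longer hosts such as example.com.evil.org.
        while ($value =~ /([A-Za-z0-9._%+-]+)\@\Q$domain\E(?![A-Za-z0-9-]|\.[A-Za-z0-9])/gi) {
            push @users, lc $1;
        }
    }
    return @users;
}

# Sample header values, made up for the demo.
my @found = users_at_domain(
    'example.com',
    'Bob <bob@example.com>, alice@elsewhere.org',
    '"Carol" <Carol@EXAMPLE.COM>',
);
print "$_\n" for @found;
```

The match is case-insensitive and the users are lowercased, so the same person does not end up in the array under two spellings.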
All of this is trivial, or so I thought. I can do the iterating, the arrays, etc. I had figured that I would use the MailTools package to pull out only the headers and save me the work - but apparently that refuses to work for me (perhaps because this is all on an Exchange server and it doesn't format them properly? don't know).
So what I want to know is: how do I get just the headers of the e-mail? Without anything so elegant as using a package to do it - just straightforward, brutish and in your face ugliness of code... or something.
I thought this would be a trivial issue, but after going through hundreds of mail examples, I am seeing that the headers rarely have the same format... which makes it hard to grab specific fields from them.
It is easy to find where to start - look for "To: " and then keep grabbing things until you get to... ahhh, there's the rub. It is never (well, rarely) the same in these e-mails...
So then I think, why not just the headers? I know that they start... at the beginning of the message. Good, I know where to start... then they stop... well, again, fairly arbitrarily from what I can tell looking at these hundreds of messages.
So how do I know where to stop? Seems like many programming questions boil down to this. I know how to do something, but how do I know when to stop doing it?
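For reference, the standard message format (RFC 5322) ends the header block at the first empty line, and a long field may be folded onto continuation lines that begin with a space or tab. A minimal sketch along those lines, assuming the files are plain-text messages in that format, which may not hold for everything an Exchange server produces. The sub name `read_headers` is made up for illustration, and the demo reads an in-memory string rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: read only the header block from an open filehandle
# and return a hash ref of lowercased field name => array ref of values.
sub read_headers {
    my ($fh) = @_;
    my (%headers, $current);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line endings
        last if $line eq '';     # first empty line: the headers are over
        if ($line =~ /^[ \t]/ && defined $current) {
            # Folded continuation line: append it to the previous field.
            $line =~ s/^[ \t]+/ /;
            $headers{$current}[-1] .= $line;
        }
        elsif ($line =~ /^([^:\s]+):[ \t]*(.*)$/) {
            # A new "Name: value" field.
            $current = lc $1;
            push @{ $headers{$current} }, $2;
        }
        # Anything else (such as an mbox "From " line) is skipped.
    }
    return \%headers;
}

# Demo message, made up for illustration. The second "To:" is in the body.
my $message = "To: bob\@example.com,\r\n\talice\@example.com\r\n"
            . "Subject: test\r\n\r\nTo: not-a-header\@example.com\r\n";
open my $fh, '<', \$message or die "Cannot open in-memory file: $!";
my $h = read_headers($fh);
close $fh;
print "$_: @{ $h->{$_} }\n" for sort keys %$h;
```

Stopping at the first empty line means a "To: " that happens to appear in the body is never mistaken for a header, and unfolding the continuation lines means each field comes back as one string however the sender wrapped it.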
I hope that this is brazenly obvious and I'm a total moron for not seeing it - but for now, and perhaps it is the heat, or even the humidity, I am stumped and sweaty.
Thanks to any and all that have a response. Even better if it is helpful.
There are some odd things afoot now, in the Villa Straylight.