in reply to Subsetting text files containing e-mails
Okay, the first thing I would try is to go to http://search.cpan.org, type in "mailbox", and pore through all of the 311 hits that search produces.
I am going to assume that these files are probably in some kind of standard “mailbox” format; certainly, the messages themselves are. Therefore, I am going to act on the assumption that I am dealing with a well-known task that someone else has already thoroughly solved for me, either in part or (more likely) altogether. Thoughts of having to waste my own time niggling with regular expressions simply are not going to enter my set of initial project design assumptions. I am going to plan to spend very little time writing code and a lot of time looking for code that already exists.
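To give a feel for how little writing that can leave, here is a minimal sketch. It assumes the files really are in mbox format and that the Email::Folder module from CPAN is installed; it is only one of several candidates a search like that turns up, not necessarily the best fit. The subroutine name subjects_from and its two arguments are made up for this illustration.

```perl
use strict;
use warnings;
use Email::Folder;    # CPAN module; assumed to be installed

# Hypothetical helper: return the Subject of every message in the
# mailbox file $path whose From header matches $pattern.
sub subjects_from {
    my ($path, $pattern) = @_;

    # Email::Folder does the work of splitting the file into messages.
    my $folder = Email::Folder->new($path);

    my @subjects;
    for my $msg ($folder->messages) {    # each $msg is an Email::Simple object
        my $from    = $msg->header('From');
        my $subject = $msg->header('Subject');
        next unless defined $from && $from =~ $pattern;
        push @subjects, defined $subject ? $subject : '(no subject)';
    }
    return @subjects;
}
```

It could be called as `print "$_\n" for subjects_from('archive.mbox', qr/alice\@example\.com/i);` where the file name and the address are placeholders. The only regular expression left is the one that says which messages you want; none of them is spent on working out where one message ends and the next begins.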