PerlMonks  

Re^4: Subsetting text files containing e-mails

by PeterCap (Initiate)
on Jan 27, 2012 at 08:26 UTC


in reply to Re^3: Subsetting text files containing e-mails
in thread Subsetting text files containing e-mails

Aha! I get it. So essentially when a paragraph is found that contains '^From:' it places a marker at the beginning of that paragraph?

I could not figure out how it was handling all the blank lines within the e-mails until I realized that it wasn't and didn't need to.
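To make the paragraph-mode behaviour concrete, here is a minimal sketch. The sample text and the helper name `read_paragraphs` are made up for illustration, and it reads from an in-memory string rather than a real mail file. It shows that setting `$/` to the empty string makes each read return a whole paragraph, so runs of blank lines never show up as records of their own:

```perl
use strict;
use warnings;

# Hypothetical sample data: two short messages separated by blank lines.
my $sample = <<'END';
From: alice@example.com
Subject: first

Body of the first mail.


From: bob@example.com
Subject: second

Body of the second mail.
END

# Return the paragraphs of $text, read in paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    local $/ = '';    # paragraph mode: a run of blank lines ends a record
    open my $in, '<', \$text or die "Can't open in-memory file: $!\n";
    my @paragraphs = <$in>;
    close $in;
    return @paragraphs;
}

my @paragraphs = read_paragraphs($sample);
for my $i (0 .. $#paragraphs) {
    # A paragraph with a line starting "From:" is taken as the start of a mail.
    my $kind = $paragraphs[$i] =~ /^From:/im ? 'new e-mail' : 'continuation';
    printf "paragraph %d: %s\n", $i + 1, $kind;
}
```

The sample yields four paragraphs, alternating between "new e-mail" and "continuation", even though the first body is followed by two blank lines.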

Just to be clear, in order to actually subset the file I would still need to close and reopen it, right? I'm thinking something like:

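open(MYDATA, '<', $filein) or die "Can't open $filein: $!\n";
# Open the output once, outside the loop; reopening with '>' for every
# line would truncate the file each time.
open(MYOUTPUT, '>', $fileout) or die "Can't create $fileout: $!\n";
while (<MYDATA>) {
    if (/^---- Email 1/ ... /^---- Email 2/) {
        print MYOUTPUT $_;
    }
}
close(MYOUTPUT);
close(MYDATA);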

I suppose I might create a loop so that a new value for the search terms (i.e., /^---- Email 2/ ... /^---- Email 3/ for the second iteration, etc.) is selected as well as a new output file to catch the results...
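One detail of the range test is worth checking before building a loop around it. The sketch below uses made-up sample data with the "---- Email N" markers from this post and an in-memory filehandle in place of a real file. It suggests that the range stays true up to and including the line that matches the second pattern, so the next e-mail's marker line ends up in the selection:

```perl
use strict;
use warnings;

# Hypothetical sample input using the "---- Email N" markers from the post.
my $text = <<'END';
---- Email 1
From: alice@example.com

Hello
---- Email 2
From: bob@example.com

Goodbye
END

open my $in, '<', \$text or die "Can't open in-memory file: $!\n";

my @picked;
while (my $line = <$in>) {
    # True from the line matching the first pattern through the line
    # matching the second one, so the "Email 2" marker is included.
    if ($line =~ /^---- Email 1/ ... $line =~ /^---- Email 2/) {
        push @picked, $line;
    }
}
close $in;

print @picked;
```

If the trailing marker is not wanted in the output, it would have to be skipped explicitly.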


Re^5: Subsetting text files containing e-mails
by GrandFather (Cardinal) on Jan 27, 2012 at 09:17 UTC

    You don't need more than one pass through the source file. Just create the output files as you need them. In sketch you'd have something like:

    use strict;
    use warnings;

    my $emailNum;
    my $outFile;

    $/ = '';    # Set readline to "Paragraph mode"

    while (<DATA>) {
        if (!$emailNum || /^From:/im) {
            close $outFile if $outFile;
            my $fname = sprintf "mails_%06d.txt", ++$emailNum;
            open $outFile, '>', $fname or die "Can't create $fname: $!\n";
        }

        print $outFile $_;
    }
    True laziness is hard work
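For anyone who wants to try the single-pass idea without a `__DATA__` section, here is a self-contained variant of the sketch above. It is only an illustration under stated assumptions: the sub name `split_mails`, the sample text, and writing into a temporary directory are all made up for the example and are not part of the original reply.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Split the paragraphs read from $in into numbered files under $dir and
# return the names of the files created. (split_mails is a hypothetical name.)
sub split_mails {
    my ($in, $dir) = @_;
    local $/ = '';    # paragraph mode
    my $emailNum = 0;
    my $outFile;
    my @created;

    while (my $para = <$in>) {
        # Start a new output file for the first paragraph and for every
        # paragraph containing a line that begins with "From:".
        if (!$emailNum || $para =~ /^From:/im) {
            close $outFile if $outFile;
            my $fname = File::Spec->catfile($dir,
                sprintf("mails_%06d.txt", ++$emailNum));
            open $outFile, '>', $fname or die "Can't create $fname: $!\n";
            push @created, $fname;
        }
        print {$outFile} $para;
    }

    close $outFile if $outFile;
    return @created;
}

# Hypothetical sample input: two mails, each a header and a body paragraph.
my $sample = "From: a\@example.com\n\nfirst body\n\n"
           . "From: b\@example.com\n\nsecond body\n";

open my $in, '<', \$sample or die "Can't open in-memory file: $!\n";
my $dir   = tempdir(CLEANUP => 1);    # removed automatically at exit
my @files = split_mails($in, $dir);
close $in;

print "$_\n" for @files;
```

Because each body paragraph lacks a "From:" line, it is appended to the file opened for the preceding header paragraph, so the input is read exactly once.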
