http://www.perlmonks.org?node_id=950281



Aha! I get it. So essentially when a paragraph is found that contains '^From:' it places a marker at the beginning of that paragraph?

I could not figure out how it was handling all the blank lines within the e-mails until I realized that it wasn't and didn't need to.
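The blank lines inside each message need no special handling because, with `$/` set to the empty string (paragraph mode), each read returns one blank-line-terminated paragraph. Below is a minimal, self-contained sketch of that behaviour. It assumes a made-up two-message sample (the addresses and body text are invented) and reads it through an in-memory filehandle instead of a real file. `paragraphs` is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical sample: two e-mails, the first with a blank line inside
# its body. Paragraph boundaries are runs of blank lines.
my $text = <<'END_TEXT';
From: alice@example.com
Subject: Hello

First paragraph of Alice's body.

Second paragraph of Alice's body.

From: bob@example.com
Subject: Re: Hello

Bob's reply.
END_TEXT

# Read a string in paragraph mode and return the list of records.
sub paragraphs {
    my ($string) = @_;
    open my $fh, '<', \$string or die "Can't open in-memory handle: $!\n";
    local $/ = '';    # paragraph mode: a record ends at one or more blank lines
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @records = paragraphs($text);

# For each record, report whether it would start a new e-mail
# (contains a line beginning "From:") and show its first line.
for my $record (@records) {
    my $starts_email = $record =~ /^From:/m ? 'yes' : 'no';
    my ($first_line) = split /\n/, $record;
    printf "%-3s %s\n", $starts_email, $first_line;
}
```

Only two of the five records contain a line starting with `From:`, so only those mark a message boundary; the body paragraphs are just further records. One caveat: a body line that itself begins with `From:` (for example in a quoted reply) would be taken as the start of a new message.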

Just to be clear, in order to actually subset the file I would still need to close and reopen it, right? I'm thinking something like:

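open (MYDATA, '<', $filein) or die "Can't open $filein: $!\n";
# Open the output once: '>' truncates the file on every open
open (MYOUTPUT, '>', $fileout) or die "Can't create $fileout: $!\n";
while (<MYDATA>) {
    if (/^---- Email 1/ ... /^---- Email 2/) {
        print MYOUTPUT $_;
    }
}
close (MYOUTPUT);
close (MYDATA);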

I suppose I could wrap this in a loop that, on each iteration, selects new search terms (e.g., /^---- Email 2/ ... /^---- Email 3/ for the second iteration, and so on) along with a new output file to catch the results...
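The per-message loop does not require reopening the input for each e-mail number: a single pass can switch to a new output file whenever a marker line appears. Here is a minimal sketch, assuming every message is preceded by a line of the form `---- Email N` (taken from the patterns above) and that one file per message is wanted. The helper `split_on_markers`, the `email_NNNNNN.txt` file names and the sample text are all hypothetical.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Copy each "---- Email N" section of $in into its own file under $out_dir,
# in a single pass. Lines before the first marker are skipped.
# Returns the list of files written.
sub split_on_markers {
    my ($in, $out_dir) = @_;
    my ($out, @written);
    while (my $line = <$in>) {
        if ($line =~ /^---- Email (\d+)/) {
            my $num = $1;
            # Finish the previous e-mail before starting the next one.
            if ($out) { close $out or die "Can't close output: $!\n"; }
            my $fname = File::Spec->catfile($out_dir, sprintf('email_%06d.txt', $num));
            open $out, '>', $fname or die "Can't create $fname: $!\n";
            push @written, $fname;
        }
        # Copy the line (marker included) to the current output, if any.
        print {$out} $line if $out;
    }
    if ($out) { close $out or die "Can't close output: $!\n"; }
    return @written;
}

# Hypothetical input with two marked e-mails and a leading preamble.
my $sample = <<'END_SAMPLE';
Preamble line that precedes any marker.
---- Email 1
From: alice@example.com

Hi Bob.
---- Email 2
From: bob@example.com

Hi Alice.
END_SAMPLE

open my $in, '<', \$sample or die "Can't open in-memory input: $!\n";
my $dir   = tempdir(CLEANUP => 1);    # throwaway directory for the demo
my @files = split_on_markers($in, $dir);
close $in;
print "$_\n" for @files;
```

Because the output handle is only replaced when a new marker is seen, no search terms have to change between iterations and the input is read exactly once.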

Re^5: Subsetting text files containing e-mails
by GrandFather (Saint) on Jan 27, 2012 at 09:17 UTC

    You don't need more than one pass through the source file. Just create the output files as you need them. In outline, you'd have something like:

    use strict;
    use warnings;

    my $emailNum;
    my $outFile;

    $/ = '';    # Set readline to "Paragraph mode"

    while (<DATA>) {    # reads the script's __DATA__ section
        # New output file for the first paragraph and for each "From:" paragraph
        if (!$emailNum || /^From:/im) {
            close $outFile if $outFile;
            my $fname = sprintf "mails_%06d.txt", ++$emailNum;
            open $outFile, '>', $fname or die "Can't create $fname: $!\n";
        }
        print $outFile $_;
    }
    close $outFile if $outFile;
    True laziness is hard work
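If the messages are to be processed in memory rather than written to disk, the same paragraph-mode test can instead group paragraphs into whole e-mails. This is a sketch under the same assumption as the reply, namely that each message begins with a paragraph containing a `From:` line; `group_emails` and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Group paragraph-mode records into whole e-mails: a paragraph containing a
# line that starts with "From:" begins a new e-mail, and every other paragraph
# is appended to the current one. Any text before the first "From:" becomes
# its own leading chunk, like the !$emailNum test in the sketch above.
sub group_emails {
    my ($fh) = @_;
    local $/ = '';    # paragraph mode
    my @emails;
    while (my $para = <$fh>) {
        if (!@emails || $para =~ /^From:/im) {
            push @emails, $para;
        }
        else {
            $emails[-1] .= $para;
        }
    }
    return @emails;
}

# Hypothetical input: two messages, the first with a two-paragraph body.
my $mbox = <<'END_MBOX';
From: alice@example.com
Subject: Hi

Body one.

More body.

From: bob@example.com
Subject: Re: Hi

Body two.
END_MBOX

open my $in, '<', \$mbox or die "Can't open in-memory input: $!\n";
my @emails = group_emails($in);
close $in;
printf "%d e-mails\n", scalar @emails;
```

Note that paragraph mode treats two or more consecutive empty lines as a single empty line, so a message reassembled this way is not guaranteed to be byte-for-byte identical to the original when it contained runs of blank lines.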