
Re^4: Subsetting text files containing e-mails

by PeterCap (Initiate)
on Jan 27, 2012 at 08:26 UTC (#950281)

in reply to Re^3: Subsetting text files containing e-mails
in thread Subsetting text files containing e-mails

Aha! I get it. So essentially when a paragraph is found that contains '^From:' it places a marker at the beginning of that paragraph?

I could not figure out how it was handling all the blank lines within the e-mails until I realized that it wasn't and didn't need to.
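To make the point about blank lines concrete, here is a minimal sketch of what paragraph mode hands back. The sample text and the `paragraphs` helper are made up for illustration and are not from the thread; the only thing taken from it is the `/^From:/im` test.

```perl
use strict;
use warnings;

# Hypothetical sample: two messages, the first with blank lines inside its body.
my $sample = <<'END';
From: alice@example.com
Subject: first

Body line one.

Body line two.

From: bob@example.com
Subject: second

Only body line.
END

# Illustrative helper (not from the thread): split a string into the
# records that readline returns when $/ is the empty string.
sub paragraphs {
    my ($text) = @_;
    local $/ = '';    # paragraph mode: one or more blank lines end a record
    open my $in, '<', \$text or die "Can't read sample text: $!\n";
    my @paras = <$in>;
    close $in;
    return @paras;
}

my @paras = paragraphs($sample);
for my $i (0 .. $#paras) {
    # Same test as in the thread: does any line of the paragraph start with "From:"?
    my $tag = $paras[$i] =~ /^From:/im
        ? 'starts a new mail'
        : 'continues the current mail';
    print "paragraph $i $tag\n";
}
```

Each blank-line-separated chunk arrives as one record, so blank lines inside a body only produce more "continues" paragraphs. One caveat: because of the `/m` flag, a body paragraph containing a line that begins with `From:` (a quoted message, say) would also be treated as the start of a new mail.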

Just to be clear, in order to actually subset the file I would still need to close and reopen it, right? I'm thinking something like:

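    open(MYDATA, '<', $filein) or die "Can't open $filein: $!\n";
    open(MYOUTPUT, '>', $fileout) or die "Can't create $fileout: $!\n";
    while (<MYDATA>) {
        # Copy lines from the "Email 1" marker through the "Email 2" marker
        if (/^---- Email 1/ ... /^---- Email 2/) {
            print MYOUTPUT $_;
        }
    }
    close(MYOUTPUT);
    close(MYDATA);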

I suppose I might create a loop so that a new value for the search terms (i.e., /^---- Email 2/ ... /^---- Email 3/ for the second iteration, etc.) is selected as well as a new output file to catch the results...

Re^5: Subsetting text files containing e-mails
by GrandFather (Sage) on Jan 27, 2012 at 09:17 UTC

    You don't need more than one pass through the source file. Just create the output files as you need them. In sketch you'd have something like:

    use strict;
    use warnings;

    my $emailNum;
    my $outFile;

    $/ = '';    # Set readline to "Paragraph mode"

    while (<DATA>) {
        if (!$emailNum || /^From:/im) {
            close $outFile if $outFile;
            my $fname = sprintf "mails_%06d.txt", ++$emailNum;
            open $outFile, '>', $fname or die "Can't create $fname: $!\n";
        }

        print $outFile $_;
    }
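For anyone who wants to try the single-pass idea without a real mailbox, the following is one possible self-contained variant. It assumes the input is already in a string and writes into a temporary directory; the `split_mails` name, the directory argument, and the sample text are illustrative additions, not part of the sketch above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: write one file per mail found in $text into $dir
# and return the names of the files created, in order.
sub split_mails {
    my ($text, $dir) = @_;
    my $emailNum = 0;
    my $outFile;
    my @created;

    local $/ = '';    # paragraph mode, as in the sketch above
    open my $in, '<', \$text or die "Can't read input text: $!\n";

    while (my $para = <$in>) {
        # Start a new output file at the first paragraph and at every
        # paragraph that has a line beginning with "From:".
        if (!$emailNum || $para =~ /^From:/im) {
            close $outFile if $outFile;
            my $fname = File::Spec->catfile($dir,
                sprintf("mails_%06d.txt", ++$emailNum));
            open $outFile, '>', $fname or die "Can't create $fname: $!\n";
            push @created, $fname;
        }
        print {$outFile} $para;
    }

    close $outFile if $outFile;
    close $in;
    return @created;
}

# Usage with made-up data; the directory is removed when the program exits.
my $dir   = tempdir(CLEANUP => 1);
my $mails = <<'END';
From: alice@example.com

Hello

From: bob@example.com

Bye
END
print "$_\n" for split_mails($mails, $dir);
```

The source is read exactly once, and at most one output handle is open at any time, which is the point of creating the files on demand.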
    True laziness is hard work
