Re^4: Subsetting text files containing e-mails

by PeterCap (Initiate)
on Jan 27, 2012 at 08:26 UTC


in reply to Re^3: Subsetting text files containing e-mails
in thread Subsetting text files containing e-mails

Aha! I get it. So essentially when a paragraph is found that contains '^From:' it places a marker at the beginning of that paragraph?

I could not figure out how it was handling all the blank lines within the e-mails until I realized that it wasn't and didn't need to.
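To check that understanding, here is a minimal sketch of paragraph mode on its own. The sample text, the addresses, and the helper name from_paragraphs are made up for illustration; the only pieces taken from the thread are the empty $/ and the /^From:/im test. It reads from an in-memory string so it can be run without any input file.

```perl
use strict;
use warnings;

# Hypothetical sample data: two e-mails, each with blank lines inside.
my $text = join "\n",
    'From: alice@example.com',
    'Subject: first',
    '',
    'Body of the first mail.',
    '',
    'More body text.',
    '',
    'From: bob@example.com',
    'Subject: second',
    '',
    'Body of the second mail.',
    '';

# Return the 1-based numbers of the paragraphs that contain a From: line.
sub from_paragraphs {
    my ($data) = @_;
    local $/ = '';    # paragraph mode: a record ends at a run of blank lines
    open my $in, '<', \$data or die "Can't read from string: $!\n";
    my @hits;
    my $n = 0;
    while (my $para = <$in>) {
        ++$n;
        # /m lets ^ match at the start of any line inside the paragraph
        push @hits, $n if $para =~ /^From:/im;
    }
    close $in;
    return @hits;
}

print "From: found in paragraph(s): @{[ from_paragraphs($text) ]}\n";
```

The blank lines inside each mail simply end one paragraph and start the next; only the paragraphs holding a From: line are treated as the start of a new mail.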

Just to be clear, in order to actually subset the file I would still need to close and reopen it, right? I'm thinking something like:

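    open(MYDATA, '<', $filein) or die "Can't open $filein: $!\n";
    while (<MYDATA>) {
        if (/^---- Email 1/ ... /^---- Email 2/) {
            # '>>' appends; '>' would truncate the file on every line
            open(MYOUTPUT, '>>', $fileout) or die "Can't open $fileout: $!\n";
            print MYOUTPUT $_;
            close(MYOUTPUT);
        }
    }
    close(MYDATA);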

I suppose I might create a loop so that a new value for the search terms (i.e., /^---- Email 2/ ... /^---- Email 3/ for the second iteration, etc.) is selected as well as a new output file to catch the results...


Re^5: Subsetting text files containing e-mails
by GrandFather (Sage) on Jan 27, 2012 at 09:17 UTC

    You don't need more than one pass through the source file. Just create the output files as you need them. In sketch you'd have something like:

        use strict;
        use warnings;

        my $emailNum;
        my $outFile;

        $/ = ''; # Set readline to "Paragraph mode"

        while (<DATA>) {
            if (!$emailNum || /^From:/im) {
                close $outFile if $outFile;
                my $fname = sprintf "mails_%06d.txt", ++$emailNum;
                open $outFile, '>', $fname or die "Can't create $fname: $!\n";
            }

            print $outFile $_;
        }
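For anyone who wants to try the single-pass idea without preparing a data file, the following is a self-contained variant of the sketch above, not a replacement for it. It assumes the input is already in a string and writes into a temporary directory; the function name split_mails, the sample text, and the use of File::Temp are illustrative choices that do not come from the thread.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Split $data into one file per e-mail inside $dir and return the file names.
# A new file starts at the first paragraph and at every paragraph that
# contains a line beginning with "From:".
sub split_mails {
    my ($data, $dir) = @_;
    local $/ = '';    # paragraph mode
    open my $in, '<', \$data or die "Can't read from string: $!\n";

    my ($emailNum, $out, @files);
    while (my $para = <$in>) {
        if (!$emailNum || $para =~ /^From:/im) {
            if ($out) { close $out or die "Can't close output: $!\n" }
            my $fname = File::Spec->catfile($dir,
                sprintf('mails_%06d.txt', ++$emailNum));
            open $out, '>', $fname or die "Can't create $fname: $!\n";
            push @files, $fname;
        }
        print {$out} $para;
    }
    if ($out) { close $out or die "Can't close output: $!\n" }
    close $in;
    return @files;
}

# Hypothetical sample input: two mails, the first with a blank line in its body.
my $sample = "From: a\@example.com\n\nHello\n\nMore\n\n"
           . "From: b\@example.com\n\nBye\n";
my $dir   = tempdir(CLEANUP => 1);    # removed automatically at exit
my @files = split_mails($sample, $dir);
print scalar(@files), " files written\n";
```

Each output file is opened once, when its mail starts, and closed when the next one begins, so the source is read only one time.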
    True laziness is hard work
