PerlMonks  

Text File Section Extractor

by BJ_Covert_Action (Beadle)
on Apr 09, 2009 at 21:55 UTC (#756715, CUFP)

Hello All,

I've spent the last couple of hours developing a small utility for my personal use at home and work. Basically, it is a program that asks for a file to read from, a file to write to, and two strings containing patterns. Optionally, the file to be read can be taken as the first command line argument, while the file to output to can be taken as the second command line argument. If only one argument is given, it is assumed to be the input file.

Anyways, the program calls a subroutine called pry to move through the input file (file to be read) line by line. It looks for the first pattern specified, and, upon finding it, prints each consecutive line to the output file specified until the second pattern is found.

I intend to upgrade and tweak this utility as life goes on, but for now it's a decent, bare-bones utility. I am probably going to add an option for case-insensitive start and stop patterns, as well as an option to specify how many sections within a text file the program should extract (for instance, whether it should match all the groups it finds, or just some, or just the first, or last, and so on).
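
A minimal sketch of how those two planned options might look. It is not part of PINSS: the helper name extract_sections and the option names ignore_case and max_sections are hypothetical, and it works on an in-memory list of lines rather than on files. Unlike the flag loop in PINSS, this sketch discards a section that never reaches a stop match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of PINSS): returns a list of sections, each an
# array ref holding the lines found between a start match and a stop match.
#   ignore_case  => 1   match both patterns case-insensitively
#   max_sections => N   stop after N completed sections (0 or absent = all)
sub extract_sections {
    my ($lines, $start, $stop, %opt) = @_;

    # Compile the patterns once, adding /i only when requested.
    my $start_re = $opt{ignore_case} ? qr/$start/i : qr/$start/;
    my $stop_re  = $opt{ignore_case} ? qr/$stop/i  : qr/$stop/;

    my @sections;
    my $current;    # undef while we are outside a section

    for my $line (@$lines) {
        if ($current) {
            if ($line =~ $stop_re) {
                push @sections, $current;
                undef $current;
                last if $opt{max_sections} && @sections >= $opt{max_sections};
                next;
            }
            push @$current, $line;
        }
        elsif ($line =~ $start_re) {
            $current = [];    # start collecting on the following line
        }
    }
    return @sections;
}

# Made-up sample data for illustration.
my @text = ('intro', 'BEGIN', 'a', 'b', 'END', 'gap', 'begin', 'c', 'end', 'tail');

my @all   = extract_sections(\@text, 'begin', 'end', ignore_case => 1);
my @first = extract_sections(\@text, 'begin', 'end', ignore_case => 1, max_sections => 1);

print scalar(@all), " sections; first has ", scalar(@{ $first[0] }), " lines\n";
```

With the sample data above this prints "2 sections; first has 2 lines". Returning the sections as data, instead of printing them as they are found, is one possible design choice that would make "just the last" easy to add later.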

I am probably also going to try to turn it into a module so that I can just load it into my other programs and pry chunks out of text files as I see fit. Of course, I don't know much about module development yet (I'm only on chapter 7 of the Camel book and chapter 7 of Simon Cozens' book), so that is for a later time.

I imagine someone already has a whole package and/or utility that does this or something similar. It is probably a lot more professional looking as well. Nonetheless, this is a fun little program I wrote just to see if I could, and to test my skills thus far. If anyone else could/would find it useful, I want them to have access to it as well. I welcome any and all criticisms and thoughts.

Without further ado, I give you the PINSS (PINSS Is Not a Sentient Searcher) program.

#!/usr/bin/perl
# PINSS.plx
# Short for "PINSS Is Not a Sentient Searcher"
use strict;
use warnings;

sub pry;

my $file_in;
my $file_out;

print "Please input the phrase or Perl regular expression you want to use to begin capturing data: ";
chomp(my $start_exp = <STDIN>);
print "Please input the phrase or Perl regular expression you want to use to cease the capture of data: ";
chomp(my $stop_exp = <STDIN>);

if(scalar @ARGV > 1){
    # Both files were given on the command line.
    $file_in = shift @ARGV;
    $file_out = shift @ARGV;
    print "\nINPUT FILE: $file_in\n";
    print "OUTPUT FILE: $file_out\n";
    pry($file_in, $file_out, $start_exp, $stop_exp);
    print "\n\nSee $file_out for results\n\n";
}elsif(scalar @ARGV == 1){
    # Only the input file was given; ask for the output file.
    $file_in = shift;
    print "Please specify an output file to print to (type 'screen' to print to terminal screen): ";
    chomp($file_out = <STDIN>);
    print "\nINPUT FILE: $file_in\n";
    print "OUTPUT FILE: $file_out\n";
    pry($file_in, $file_out, $start_exp, $stop_exp);
    print "\n\nSee $file_out for results\n\n";
}else{
    # No arguments; ask for both files.
    print "Please specify an input file to read from: ";
    chomp($file_in = <STDIN>);
    print "Please specify an output file to print to (type 'screen' to print to terminal screen): ";
    chomp($file_out = <STDIN>);
    print "\nINPUT FILE: $file_in\n";
    print "OUTPUT FILE: $file_out\n";
    pry($file_in, $file_out, $start_exp, $stop_exp);
    print "\n\nSee $file_out for results\n\n";
}

# No "()" prototype here: an empty prototype would declare a sub that takes
# no arguments and would clash with the forward declaration above.
sub pry{
    my $in_file = shift;
    my $out_file = shift;
    my $start = shift;
    my $stop = shift;
    # print "$in_file\n$out_file\n";
    my $flag;    # true while we are inside a section being captured

    open INFILE, $in_file or die "Cannot open input file to read from: $!";
    if($out_file =~ m/screen/i){
        *OUTFILE = *STDOUT;
    }else{
        open OUTFILE, ">$out_file" or die "Cannot open output file to write to: $!";
    }

    while(my $line = <INFILE>){
        chomp $line;
        next if $line =~ m/^\s*$/;    # Skip all blank and whitespace only lines
        if ($flag){
            if($line =~ m/$stop/){
                $flag = 0;
                print OUTFILE "CAPTURE ENDED AT: $line\n";
                next;
            }
            print OUTFILE "$line\n";
        }
        if($line =~ m/$start/){
            $flag = 1;
            print OUTFILE "\nCAPTURE STARTED AT: $line\n"
        }
    }
    close INFILE;
    return;
}

# Top-level statement: this runs after the if/elsif/else block above has finished.
close OUTFILE;

Cheers.

Replies are listed 'Best First'.
Re: Text File Section Extractor
by toolic (Bishop) on Apr 10, 2009 at 00:36 UTC
Re: Text File Section Extractor
by telemachus (Friar) on Apr 10, 2009 at 02:05 UTC

    I second toolic's suggestion of range operators as a way to simplify the extraction itself.
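
For readers who have not met it, here is a minimal sketch of the range (flip-flop) operator used inside an ordinary loop. The sample lines and the START/STOP patterns are made up for illustration and stand in for whatever the user would type.

```perl
use strict;
use warnings;

# Hypothetical sample data and patterns, standing in for the user's input.
my @lines = ("header\n", "START here\n", "keep 1\n", "keep 2\n", "STOP here\n", "footer\n");
my ($start, $stop) = ('START', 'STOP');

my @captured;
for my $line (@lines) {
    # In scalar context, `..` is the flip-flop operator: false until the left
    # test matches, then true through the line where the right test matches.
    if ($line =~ /$start/ .. $line =~ /$stop/) {
        push @captured, $line;
    }
}
print @captured;
```

This keeps the four lines from "START here" through "STOP here". Both boundary lines are included as they are, so any "CAPTURE STARTED AT" style labels would have to be added separately.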

    Two other small things: first, check out the three-argument form of open with lexical filehandles. You get the general benefits of lexical variables, and these filehandles are far easier to pass around. Second, autodie saves you from a lot of boilerplate code with open and close (to name just two):

    use autodie qw/open close/;

    open my $fh, '<', 'filename';
    while (<$fh>) {
        # Do stuff here
    }
    close $fh;
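
To illustrate the point about lexical filehandles being easy to pass around, here is a small self-contained sketch. The helper count_nonblank is hypothetical, and the scratch file is created with File::Temp only so that the example has something to read.

```perl
use strict;
use warnings;
use autodie qw/open close/;
use File::Temp qw/tempfile/;

# Hypothetical helper: counts non-blank lines on any readable filehandle.
sub count_nonblank {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /\S/;
    }
    return $count;
}

# Create a small scratch file so the example is self-contained.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "one\n\ntwo\n";
close $tmp_fh;

open my $in, '<', $tmp_name;          # autodie raises an exception on failure
my $nonblank = count_nonblank($in);   # the handle is passed like any scalar
close $in;

print "$nonblank non-blank lines\n";
```

Because the handle lives in an ordinary lexical variable, the subroutine does not need to know or share a global name such as INFILE.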
Re: Text File Section Extractor
by jdporter (Canon) on Apr 10, 2009 at 14:26 UTC
    perl -ne "/begin expr/ .. /end expr/ and print" infile > outfile

    It should be pointed out that this will extract as many occurrences of a matching block as there are in the file, not just the first, not just the largest.
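
If only the first block is wanted, one possible approach is to use the flip-flop's return value: on the last line of a range its sequence number has "E0" appended, as documented in perlop. The sketch below uses made-up sample lines and the literal patterns begin and end.

```perl
use strict;
use warnings;

# Made-up sample data containing two begin/end blocks.
my @lines = map { "$_\n" } qw(x begin a end y begin b end z);

my @first_block;
for (@lines) {
    # The flip-flop returns a sequence number while inside a range; on the
    # range's last line that number has "E0" appended (see perlop).
    my $seq = /begin/ .. /end/;
    next unless $seq;
    push @first_block, $_;
    last if $seq =~ /E0$/;    # leave the loop after the first complete block
}
print @first_block;
```

Here only the lines begin, a, and end are kept; the second block is never reached.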

Re: Text File Section Extractor
by BJ_Covert_Action (Beadle) on Apr 10, 2009 at 03:13 UTC
    Oh wow, I didn't even realize Range operators could be used for line printing...that makes my life so much easier it's ridiculous. Thank you for pointing that out. I will have to take some time in the next few days to play with those.

    As for the module documentation, and the File Open and File Close shortcuts, I will certainly dive into that documentation when I next get a chance. Thanks for the hints guys, both of you.

    Cheers.
