Text File Section Extractor

Hello All,

I've spent the last couple of hours developing a small utility for my personal use at home and work. Basically, it is a program that asks for a file to read from, a file to write to, and two strings containing patterns. Optionally, the input file can be given as the first command-line argument and the output file as the second. If only one argument is given, it is taken to be the input file.
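The argument-or-prompt logic described above can be condensed so that all three cases share one code path. This is only a sketch: `resolve_files` and `ask_stdin` are hypothetical names, and the prompter is passed in as a code reference so it can be swapped out (for example, in tests).

```perl
use strict;
use warnings;

# Return (input, output) file names. Names are taken from the argument
# list first; anything missing is obtained by calling $ask->($prompt).
# resolve_files and ask_stdin are hypothetical helpers for illustration.
sub resolve_files {
    my ($args, $ask) = @_;
    my @args = @$args;    # work on a copy so the caller's list is untouched
    my $in  = @args ? shift @args
                    : $ask->('Please specify an input file to read from: ');
    my $out = @args ? shift @args
                    : $ask->("Please specify an output file to print to (type 'screen' to print to terminal screen): ");
    return ($in, $out);
}

# Default prompter: print the prompt and read one chomped line from STDIN.
sub ask_stdin {
    my ($prompt) = @_;
    print $prompt;
    chomp(my $answer = <STDIN> // '');
    return $answer;
}
```

In the script, the three branches would then collapse to something like `my ($file_in, $file_out) = resolve_files(\@ARGV, \&ask_stdin);` followed by a single call to `pry`.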

Anyway, the program calls a subroutine named pry to move through the input file line by line. It looks for the first (start) pattern and, upon finding it, prints each following non-blank line to the output file until the second (stop) pattern is found. It then keeps scanning, so every such section in the file is extracted.

I intend to upgrade and tweak this utility as time goes on, but for now it's a decent, bare-bones tool. I will probably add an option to make the start and stop patterns case-insensitive, as well as an option to specify how many sections of a text file the program should extract (for instance, whether it should extract every matching section, only some of them, just the first, just the last, and so on).
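One possible way to add those two options, sketched under my own assumptions: the option names `nocase` and `max` and the function `extract_sections` are hypothetical, and instead of printing, it returns each section as an array reference. Case-insensitivity comes from compiling the patterns with `qr//i`; the section limit is a simple counter.

```perl
use strict;
use warnings;

# Hypothetical options:
#   nocase => 1   match start/stop patterns case-insensitively
#   max    => N   stop after N complete sections (default: extract all)
# Blank-line skipping from the original pry() is omitted for brevity.
sub extract_sections {
    my ($lines, $start, $stop, %opt) = @_;
    my ($start_re, $stop_re) = $opt{nocase}
        ? (qr/$start/i, qr/$stop/i)
        : (qr/$start/,  qr/$stop/);

    my (@sections, $current);
    for my $line (@$lines) {
        if ($current) {
            if ($line =~ $stop_re) {
                push @sections, $current;    # section finished
                undef $current;
                last if defined $opt{max} && @sections >= $opt{max};
                next;
            }
            push @$current, $line;           # inside a section
        }
        elsif ($line =~ $start_re) {
            $current = [];                   # begin a new section
        }
    }
    # As in the original loop, a section with no stop line runs to the end.
    push @sections, $current if $current;
    return @sections;
}
```

For example, `extract_sections(\@lines, 'BEGIN', 'END', nocase => 1, max => 1)` would return only the first section, matching `begin`/`end` in any case.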

I will probably also try to turn it into a module so that I can load it into my other programs and pry chunks out of text files as I see fit. Of course, I don't know much about module development yet (I'm only on chapter 7 of the Camel book and chapter 7 of Simon Cozens' book), so that will have to wait.
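If the module route happens later, a minimal shape might look like the following. The package name `PINSS` and the function `pry_lines` are placeholders of my own; the sketch uses the core Exporter module, and in practice the package would live in its own `PINSS.pm` file.

```perl
# Would normally live in its own file, e.g. PINSS.pm (hypothetical name).
package PINSS;
use strict;
use warnings;
use Exporter 'import';             # gives the package an import() method
our @EXPORT_OK = ('pry_lines');    # exported only when explicitly requested

# Return the lines found between a line matching $start and a line
# matching $stop (the delimiter lines themselves are not returned).
sub pry_lines {
    my ($lines, $start, $stop) = @_;
    my $inside = 0;
    my @out;
    for my $line (@$lines) {
        if ($inside) {
            if ($line =~ $stop) { $inside = 0; next }
            push @out, $line;
        }
        elsif ($line =~ $start) {
            $inside = 1;
        }
    }
    return @out;
}

1;    # a module file must end by returning a true value
```

Another script could then say `use lib '.'; use PINSS 'pry_lines';` and call `pry_lines(\@lines, qr/BEGIN/, qr/END/)`.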

I imagine someone already has a whole package or utility that does this or something similar, and it is probably a lot more polished. Nonetheless, this is a fun little program I wrote just to see if I could, and to test my skills so far. If anyone else finds it useful, I want them to have access to it. I welcome any and all criticism and thoughts.

Without further ado, I give you the PINSS (PINSS Is Not a Sentient Searcher) program.

#!/usr/bin/perl
# PINSS.plx
# Short for "PINSS Is Not a Sentient Searcher"

use strict;
use warnings;

sub pry;
my $file_in;
my $file_out;

print "Please input the phrase or perl regular expression you want to use to begin capturing data: ";
chomp(my $start_exp = <STDIN>);
print "Please input the phrase or perl regular expression you want to use to stop capturing data: ";
chomp(my $stop_exp = <STDIN>);

if(scalar  @ARGV > 1){
    $file_in = shift @ARGV;
    $file_out = shift @ARGV;
    print "\nINPUT FILE: $file_in\n";
    print "OUTPUT FILE: $file_out\n";

    pry($file_in, $file_out, $start_exp, $stop_exp);

    print "\n\nSee $file_out for results\n\n";
}elsif(scalar @ARGV == 1){
    $file_in = shift;
    print "Please specify an output file to print to (type 'screen' to print to terminal screen): ";
    chomp($file_out = <STDIN>);
    print "\nINPUT FILE: $file_in\n";
    print "OUTPUT FILE: $file_out\n";

    pry($file_in, $file_out, $start_exp, $stop_exp);

    print "\n\nSee $file_out for results\n\n";
}else{
    print "Please specify an input file to read from: ";
    chomp($file_in = <STDIN>);
    print "Please specify an output file to print to (type 'screen' to print to terminal screen): ";
    chomp($file_out = <STDIN>);

    print "\nINPUT FILE: $file_in\n";
    print "OUTPUT FILE: $file_out\n";

    pry($file_in, $file_out, $start_exp, $stop_exp);

    print "\n\nSee $file_out for results\n\n";
}


sub pry {    # no () prototype: it conflicted with the forward declaration and the four-argument calls
    my $in_file = shift;
    my $out_file = shift;
    my $start = shift;
    my $stop = shift;

    # print "$in_file\n$out_file\n";

    my $flag;

    open INFILE, $in_file   or die "Cannot open input file to read from: $!";
    
    if(lc $out_file eq 'screen'){    # exact match, so e.g. "screenshots.txt" is still written as a file
        *OUTFILE = *STDOUT;
    }else{
        open OUTFILE, ">$out_file" or die "Cannot open output file to write to: $!";
    }

    while(my $line = <INFILE>){
        chomp $line;
        next if $line =~ m/^\s*$/;   # Skip all blank and whitespace-only lines

        if ($flag){
            if($line =~ m/$stop/){
                $flag = 0;
                print OUTFILE "CAPTURE ENDED AT: $line\n";
                next;
            }
            print OUTFILE "$line\n";
        }

        if($line =~ m/$start/){
            $flag = 1;
            print OUTFILE "\nCAPTURE STARTED AT: $line\n";
        }

    }

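    close INFILE;
    # Close the output file here, inside pry, but leave STDOUT open when
    # OUTFILE is just an alias for the screen.
    close OUTFILE unless fileno(OUTFILE) == fileno(STDOUT);
    return;
}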

Cheers.

Re: Text File Section Extractor by toolic (Bishop) on Apr 10, 2009 at 00:36 UTC
An alternative to your state variable technique is to use Perl's Range Operators. A standard way to process command-line arguments is to use the core module Getopt::Long. And since you offer all these options, a standard way to describe them is to use the core module Pod::Usage. When you're ready to start using modules, the Monastery offers some great reading on the topic: Tutorials -> Creating and Distributing Modules -> Simple Module Tutorial. Enjoy.
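A rough sketch of two of toolic's suggestions (Pod::Usage is left out). The option names `--start`, `--stop`, `--in` and `--out` and the functions `parse_options` and `extract_range` are illustrative, not part of PINSS. `GetOptionsFromArray` is the Getopt::Long variant that parses a given array instead of `@ARGV`. Note that, unlike `pry`, the range version keeps the start and stop lines themselves.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse hypothetical --start/--stop/--in/--out options from a list.
# A leftover bare argument is treated as the input file.
sub parse_options {
    my @args = @_;
    my %opt = (out => 'screen');    # default: print to the terminal
    GetOptionsFromArray(\@args,
        'start=s' => \$opt{start},
        'stop=s'  => \$opt{stop},
        'in=s'    => \$opt{in},
        'out=s'   => \$opt{out},
    ) or die "Bad options\n";
    $opt{in} //= shift @args;
    return %opt;
}

# Range-operator version of the capture loop. In scalar (boolean) context,
# /$start/ .. /$stop/ becomes true on a line matching $start, stays true
# through the line matching $stop, then turns false again.
# Caveat: each .. keeps its state across calls, so an unterminated
# section would carry over into the next call of this sub.
sub extract_range {
    my ($lines, $start, $stop) = @_;
    return grep { /$start/ .. /$stop/ } @$lines;
}
```

Something like `perl pinss.pl --start BEGIN --stop END data.txt` would then need no interactive prompts at all.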
Re: Text File Section Extractor by telemachus (Friar) on Apr 10, 2009 at 02:05 UTC
I second toolic's suggestion of range operators as a way to simplify the extraction itself. Two other small things: first, check out the 3-argument form of open with lexical filehandles. You get the general benefits of lexical variables, and these filehandles are far easier to pass around. Second, autodie saves you from a lot of boilerplate code with `open` and `close` (to name just two):

    use autodie qw/open close/;
    open my $fh, '<', 'filename';
    while (<$fh>) {
        # Do stuff here
    }
    close $fh;
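Here is roughly how `pry` might look with telemachus's suggestions applied: three-argument `open`, lexical filehandles, and `autodie`. It is a sketch that keeps the original loop logic; `pry_file` is a hypothetical name, and 'screen' is matched exactly rather than as a substring.

```perl
use strict;
use warnings;
use autodie qw(open close);    # open/close now die with a descriptive message on failure

# pry() rewritten with 3-argument open and lexical filehandles.
# pry_file is a hypothetical name; the capture logic mirrors the original.
sub pry_file {
    my ($in_file, $out_file, $start, $stop) = @_;

    open my $in, '<', $in_file;
    my $to_screen = lc $out_file eq 'screen';
    my $out;
    if ($to_screen) {
        $out = \*STDOUT;              # print to the terminal
    } else {
        open $out, '>', $out_file;
    }

    my $capturing = 0;
    while (my $line = <$in>) {
        chomp $line;
        next if $line =~ /^\s*$/;     # skip blank and whitespace-only lines
        if ($capturing) {
            if ($line =~ /$stop/) {
                $capturing = 0;
                print {$out} "CAPTURE ENDED AT: $line\n";
                next;
            }
            print {$out} "$line\n";
        }
        if ($line =~ /$start/) {
            $capturing = 1;
            print {$out} "\nCAPTURE STARTED AT: $line\n";
        }
    }

    close $in;
    close $out unless $to_screen;     # never close STDOUT
    return;
}
```

Because the handles are ordinary lexical variables, they go out of scope with the sub and can be passed to other subs like any scalar.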
Re: Text File Section Extractor by jdporter (Paladin) on Apr 10, 2009 at 14:26 UTC
`perl -ne "/begin expr/ .. /end expr/ and print" infile > outfile` It should be pointed out that this will extract as many occurrences of a matching block as there are in the file, not just the first, not just the largest.
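If only the lines between the markers are wanted, without the marker lines themselves, the range operator's return value helps: in scalar context it returns a sequence number that is 1 on the first line of a range and ends in `E0` on the last (see Range Operators in perlop). A sketch, with `inner_lines` as a hypothetical name:

```perl
use strict;
use warnings;

# Return only the lines strictly between start and stop markers.
# $seq is "" outside a range, 1 on the start line, and ends in "E0"
# on the stop line. As with any .., its state persists across calls.
sub inner_lines {
    my ($lines, $start, $stop) = @_;
    my @out;
    for (@$lines) {
        my $seq = /$start/ .. /$stop/;
        push @out, $_ if $seq && $seq != 1 && $seq !~ /E0$/;
    }
    return @out;
}
```

The one-liner equivalent, with Unix-style quoting, would be `perl -ne 'if (my $n = /begin expr/ .. /end expr/) { print unless $n == 1 || $n =~ /E0$/ }' infile > outfile`.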
Re: Text File Section Extractor by BJ_Covert_Action (Beadle) on Apr 10, 2009 at 03:13 UTC
Oh wow, I didn't even realize Range operators could be used for line printing...that makes my life so much easier it's ridiculous. Thank you for pointing that out. I will have to take some time in the next few days to play with those. As for the module documentation, and the File Open and File Close shortcuts, I will certainly dive into that documentation when I next get a chance. Thanks for the hints guys, both of you. Cheers.
