http://www.perlmonks.org?node_id=1017104

newbie1991 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Perl Monks! I have a two-part question. ONE: I'm writing a program that has to scan for all files with a particular extension and continue working with them. For example, all *.pep files are to be processed in some way. Since the file is pretty large, I'd like the piped input to be read into an array and not a scalar. Initially I was using the filename from user input, like so:

print "Enter the name of the files containing all the proteomic inform +ation: \n"; my $filename = <STDIN>; unless( $filename =~ /\.pep?$/) #Checking if the file extension is .pep { die "File is not .pep file. Exiting. \n"; } open (DATA, $filename) or die "Cannot open $filename!\n"; my @arraydata = <DATA>; #all the information in the pep file is input into an array

How do I convert this to using pipes? Is it simply:

open(DATA, '| >*.pep');

TWO: I want to use the output from this program as input for another program. Can this be done by just modifying the 'open' line that reads the data in the other program? This is my first program using pipes, so I'd appreciate your help. Thanks :)

Replies are listed 'Best First'.
Re: Using piped I/O?
by daxim (Curate) on Feb 05, 2013 at 11:56 UTC

    It is almost always a mistake to implement directory searching on your own. Great directory-searching tools already exist, e.g. ack and find, and your hand-rolled one will be buggy, feature-poor, and barely usable.

    ack -ag '\.pep$' | xargs your-program
    find . -iname '*.pep' | xargs your-program

    Separate your concerns. Just write your program to deal with file names as command-line arguments (see @ARGV) and leave the searching to someone else.

    You do not need pipes (in your program). You do not need arrays. Since you say the files are pretty large, go over them line by line to save memory. Stuffing a whole file into an array is harmful because it occupies at least as much memory as the file itself.

    use 5.010;
    use autodie qw(:all);

    for my $file (@ARGV) {
        say "Now processing file '$file'";
        open my $handle, '<:raw', $file;
        while (my $line = readline $handle) {
            # do something with the line here.
        }
        close $handle;
    }

    Likely you want to include Getopt::Long to process additional command-line arguments, and Pod::Usage (see the chapter "Recommended Use") to write some nice documentation.
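    A minimal sketch of how that might look (the --verbose option here is just a hypothetical example, not something your program needs):

    use 5.010;
    use Getopt::Long;
    use Pod::Usage;

    my %opt;
    GetOptions( \%opt, 'verbose!', 'help|?' ) or pod2usage(2);
    pod2usage(1) if $opt{help};    # assumes the script carries a POD SYNOPSIS section

    for my $file (@ARGV) {
        say "Now processing file '$file'" if $opt{verbose};
        # ... open and process $file as above ...
    }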

Re: Using piped I/O?
by BrowserUk (Patriarch) on Feb 05, 2013 at 12:31 UTC
    Since the file is pretty large, I'd like the piped input to be read into an array and not a scalar.

    A 1GB, 16e6-line file loaded into a scalar requires 1GB + ~48 bytes, and loads in a couple of seconds.

    That same file loaded into an array requires just under 2GB (and uses over 4.5GB in the process of building it!), and it takes much longer to load. This is because you still have to load the 1GB of data, but you also have to allocate memory for the 16e6 scalars, the array that holds them, and the intermediate arrays that are discarded along the way.

    If you are trying to conserve memory, don't load the file into an array and then process it line by line. Just process each line as you read it, and discard it before reading the next one. Reading and processing line-by-line this way uses only a few kB.
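    For comparison, a minimal sketch of both approaches (the file name big.pep is just a placeholder):

    # Slurp: the whole file in one scalar -- one big allocation.
    my $data = do {
        open my $fh, '<', 'big.pep' or die "Cannot open big.pep: $!";
        local $/;      # undefine the input record separator
        <$fh>;         # returns the entire file as one string
    };

    # Line-by-line: only the current line is in memory at any time.
    open my $fh, '<', 'big.pep' or die "Cannot open big.pep: $!";
    while ( my $line = <$fh> ) {
        # process $line here; it is discarded on the next iteration
    }
    close $fh;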


Re: Using piped I/O?
by jwkrahn (Abbot) on Feb 05, 2013 at 10:38 UTC
    unless( $filename =~ /\.pep?$/) #Checking if the file extension is .pep

    Actually, that checks whether the file name ends in ".pe", ".pep", ".pe\n", or ".pep\n": the ? makes the final "p" optional, and $ also matches just before a trailing newline (which <STDIN> leaves on the input).
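    A sketch of a stricter version (chomp strips the newline from <STDIN>, and \z anchors at the true end of the string):

    chomp( my $filename = <STDIN> );
    unless ( $filename =~ /\.pep\z/ ) {    # exactly ".pep", at the very end
        die "File is not a .pep file. Exiting.\n";
    }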