PerlMonks  

Open multiple file handles?

by onlyIDleft (Acolyte)
on May 07, 2011 at 02:50 UTC (#903477=perlquestion)
onlyIDleft has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, can anyone help me with how to input multiple files?

Essentially, I want my script to open and close as many input file handles as there are elements in @ARGV on the command line, but without a priori knowledge of this number.

Basically, each input file holds a list of names and the number of occurrences of each name. I then need to make a union of all these lists, compare this BIG list to each of the smaller lists, and output a matrix of this comparison.

For the reason of needing to make these comparisons I cannot open and close the filehandle for INPUT in a while loop, because I can concatenate only after reading in ALL input files!

Once again, my input could very well be 500 files or 2 files, or 10000 files... How do I OPEN and CLOSE as many file handles as the number of arguments on the command line without a priori knowledge of this?

For example, if I have 3 input files as follows:

FILE 1

a 1 b 2 c 3

FILE 2

a 5 d 3

FILE 3

a 1 x 4

Then my final output should look like

NAME FILE1 FILE2 FILE3
a    1     5     1
b    2     0     0
c    3     0     0
d    0     3     0
x    0     0     4

Thanks and have a great weekend.

Dear all,

Thanks to you for all your suggestions. I started teaching myself Perl only 3 weeks ago, and had never come across the need to process multiple input files, hence my continuing confusion. Based on your responses, am I right in understanding the following?

1. I do NOT have to open as many file handles as the number of ARGV elements from command line? Just one file handle is sufficient, and to use it in a while loop to point to each of the input files in turn, yeah?

2. I can build a merge of the lists in each input file into a NEW list held in memory and not an actual list or file that is being written into, correct? This seemed intuitive even to me, but am I right?

3. It is possible to compare this merge list in memory to the individual file lists from input?

Except I still DO NOT understand how to do this comparison between the MERGE list in memory and the individual lists from each input file, especially when opening multiple files with ONLY 1 common file handle!

I am a little lost here because of this...My understanding is that the final merge list can be compiled only after reading the contents of the last input file, at which time the common file handle is pointing to the last file. So how can a comparison of the merge list to ANYTHING but the last file be made?

I know I am being naive, but I am sure Your Holiness' the Monks will be kind to a new initiate :)

Thanks to all those who wrote down partial and even FULL scripts, it is very sweet and kind of you. Since I am learning and very much a newbie, I think I would appreciate it and perhaps benefit much more if you could point out an outline of your algorithm and refer me to the operations/syntax that I should teach myself, so that I may implement the most suitable algorithm for my work.

Also, my example files and file names have been misleading: my input files are not numbered, they have unrelated alphabetical names. And each of the files is not an array, it is more in the format of a 2 column Excel sheet, but as txt file. Entries in each line separated by tab, lines themselves separated by newline

Thanks again, I aspire to be as helpful and patient as all of you, one day! :)

Re: Open multiple file handles?
by Anonymous Monk on May 07, 2011 at 02:54 UTC
Re: Open multiple file handles?
by davido (Archbishop) on May 07, 2011 at 02:57 UTC

    Perl has an operator that does what you're asking for. It's the diamond operator, and when used 'empty', it reads each item listed in @ARGV sequentially. See perlop. Here's a snippet from the documentation:

    The null filehandle <> is special: it can be used to emulate the behavior of sed and awk. Input from <> comes either from standard input, or from each file listed on the command line. Here's how it works: the first time <> is evaluated, the @ARGV array is checked, and if it is empty, $ARGV[0] is set to "-", which when opened gives you standard input. The @ARGV array is then processed as a list of filenames. The loop

    while (<>) {
        ...    # code for each line
    }

    is equivalent to the following Perl-like pseudo code:

    unshift(@ARGV, '-') unless @ARGV;
    while ($ARGV = shift) {
        open(ARGV, $ARGV);
        while (<ARGV>) {
            ...    # code for each line
        }
    }

    Dave

      Thanks for your reply

      Yes, I am familiar with using the <> operator, but my question is more of

      How do I OPEN as many file handles as the number of ARGV elements without a priori knowledge of this number?

        Now you're presenting an X-Y problem: You have something you would like to accomplish, and you think that you have to solve how to open an unknown number of file handles up to possibly 10,000 to accomplish your primary objective. That's not what you need to solve. X is accomplish the objective. Y is how to get there. You're focusing on the wrong 'Y'. The diamond operator will automatically open one file after the other, sequentially, as you read from it. When one file finishes, it moves on to the next.

        If your concern is knowing the filenames ahead of time, they're held in @ARGV before you start reading, and as you start reading, $ARGV will hold the name of the current file you're iterating over. $. will hold the current line number; note that with <> it keeps counting across files unless you explicitly close ARGV at the end of each file (see eof in perlfunc). So if you need to keep track of where you are in the process, you can rely on those special variables as hints.

        So to reiterate, assuming there are no other command line arguments aside from simple filenames:

        • @ARGV will hold the list of filenames.
        • scalar(@ARGV) will tell you how many items @ARGV holds.
        • $ARGV will tell you which file the diamond operator is reading currently.
        • $. will tell you what line number you're on (cumulative across files, unless you close ARGV at the end of each file).
        • <> will read from the first file in @ARGV, followed by the 2nd, and so on until there's nothing more to read.
        • A simple while( <> ) {.... } loop will "do the right thing" if the right thing is reading every line of every file on the command line.
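
As a minimal sketch of the points above, the following reads every named file through <> and tags each line with its file name and line number. The helper name tag_lines is hypothetical, and the `close ARGV if eof` idiom is what makes $. restart at 1 for each file; without it the numbering would continue across files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read every file in the
# argument list via <> and return one "file:line: text" string per line.
sub tag_lines {
    local @ARGV = @_;    # <> reads whatever files @ARGV names
    my @tagged;
    while ( my $line = <> ) {
        chomp $line;
        # $ARGV is the file currently being read, $. its line number
        push @tagged, sprintf '%s:%d: %s', $ARGV, $., $line;
    }
    continue {
        close ARGV if eof;    # reset $. so numbering restarts per file
    }
    return @tagged;
}

# Only run when file names were given, so the sketch never waits on STDIN.
if (@ARGV) {
    print "$_\n" for tag_lines(@ARGV);
}
```

Only one file is open at any moment, yet the loop still knows which file each line came from, which is usually all the bookkeeping a merge needs.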

        Dave

        How does that follow from your question? You have not shown that you need to open a separate file handle (at the same time) for everything in ARGV. We have pointed out that the diamond operator opens each one in turn, automatically. From the problem you presented, there is no reason to open the files all at the same time, and even if you did so, what would you do with them all?

        But a direct answer to that question is: call open for each one. What's the big deal? Call open in a loop, or get fancier with a map statement.

        Do you realize that you can use a variable as the file handle, rather than an old-fashioned GLOB thing? Maybe that's the part you are missing. I hesitate to show a complete example because you should not need to do that at all.

        Just in case someone stumbles on this who actually does for whatever reason need to dynamically open x number of file handles where the programmer has no way to predict what x will be....

        It is easy. You use a lexical scalar in your open, i.e. open(my $fh, ...), then just push it onto an array or store it in a hash.
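
For anyone who really does need every file open at once, a rough sketch of that idea follows. The helper name open_all and the choice of a hash keyed by file name are assumptions made for illustration. Keep in mind that operating systems usually cap the number of files a process may have open simultaneously, so this may fail long before 10,000 files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): open every named file for
# reading and return a hash reference mapping file name => file handle.
sub open_all {
    my @names = @_;
    my %handle_for;
    for my $name (@names) {
        open my $fh, '<', $name
            or die "Cannot open '$name': $!\n";
        $handle_for{$name} = $fh;    # storing the handle keeps the file open
    }
    return \%handle_for;
}

# Demo: print the first line of each file named on the command line.
if (@ARGV) {
    my $handles = open_all(@ARGV);
    for my $name ( sort keys %$handles ) {
        my $first = readline( $handles->{$name} );
        print "$name: ", ( defined($first) ? $first : "(empty)\n" );
        close $handles->{$name};
    }
}
```

The number of handles is simply the number of arguments, so nothing has to be known ahead of time.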

Re: Open multiple file handles?
by John M. Dlugosz (Monsignor) on May 07, 2011 at 04:05 UTC
    As davido indicates, you don't need to know which file you are reading from, just read all the content from all the files. So the built-in <> operator will do the trick. You can copy @ARGV to another array or produce the first line ahead of time, before consuming the arguments.

    Use a hash to hold the data you are building. From the example, you read (key,value) pairs from the input. Then you output, for each key, all the values it had been used with. After reading $key and $value, do a push @{$concordance{$key}},$value;. Then when you are all done, something like

    # say needs: use feature 'say';
    while (my ($key, $vals) = each %concordance) {
        say $key, ' ', join(' ', @$vals);
    }
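
A small self-contained sketch of that push idiom, using in-memory lines instead of files so it runs as is. The helper name build_concordance is hypothetical. Note that this structure records the values seen for each key but not which file they came from, so it cannot by itself fill in zeros for files where a name is absent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): take "key value" lines and
# collect every value seen for each key, in input order.
sub build_concordance {
    my @lines = @_;
    my %concordance;
    for my $line (@lines) {
        my ( $key, $value ) = split ' ', $line;    # splits on any whitespace
        next unless defined $value;                # skip blank or malformed lines
        push @{ $concordance{$key} }, $value;      # array springs into existence on first push
    }
    return \%concordance;
}

my $seen = build_concordance( "a 1", "b 2", "a 5", "a 1" );
for my $key ( sort keys %$seen ) {
    print "$key @{ $seen->{$key} }\n";
}
# prints:
# a 1 5 1
# b 2
```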
Re: Open multiple file handles?
by GrandFather (Cardinal) on May 07, 2011 at 06:53 UTC

    As suggested in the preceding replies, you've grabbed the wrong end of the stick and run with it. A better solution (for modest size data sets anyway) than the one you are struggling to implement is to read the files one at a time and merge the results in memory. Consider:

    #!/usr/bin/perl
    use warnings;
    use strict;

    # First set up the sample files
    my @fileContents = ('a 1 b 2 c 3', 'a 5 d 3', 'a 1 x 4');

    @ARGV = ();
    for (my $fileNum = 1; @fileContents; ++$fileNum) {
        my $fileName = "file$fileNum.txt";
        open my $fileOut, '>', $fileName or die "Failed to create $fileName: $!\n";
        push @ARGV, $fileName;
        print {$fileOut} shift @fileContents;
    }

    # Now for the "real" code
    my %data;
    my $maxFile = @ARGV - 1;

    while (<>) {
        my %newData = split;
        $data{$_}[$maxFile - @ARGV] = $newData{$_} for keys %newData;
    }

    for my $key (sort keys %data) {
        $data{$key}[$maxFile] ||= 0;
        $_ ||= 0 for @{$data{$key}};
        print "$key @{$data{$key}}\n";
    }

    Prints:

    a 1 5 1
    b 2 0 0
    c 3 0 0
    d 0 3 0
    x 0 0 4

    Note that most of the "tricky" code is to deal with getting the output data in the required format accounting for "missing" elements.
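
For input in the format described in the question's update (one tab-separated name and count per line, arbitrary file names), a hash of hashes is one way to sidestep the index arithmetic. The sketch below is an illustration under those assumptions, not the code from this reply; count_matrix is a hypothetical name. Each file is read once and closed, and the comparison happens afterwards, entirely on the data held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read tab-separated
# "name<TAB>count" files and return the matrix as tab-joined rows,
# header row first.
sub count_matrix {
    my @files = @_;
    my %count;    # $count{$name}{$file} = count of $name in $file

    for my $file (@files) {
        open my $in, '<', $file
            or die "Cannot open '$file': $!\n";
        while ( my $line = <$in> ) {
            chomp $line;
            my ( $name, $n ) = split /\t/, $line;
            next unless defined $n;    # skip blank or malformed lines
            $count{$name}{$file} = $n;
        }
        close $in;    # the data stays in %count after the file is closed
    }

    my @rows = ( join "\t", 'NAME', @files );
    for my $name ( sort keys %count ) {
        # A name absent from a file has no entry there; report it as 0.
        my @cells = map { exists $count{$name}{$_} ? $count{$name}{$_} : 0 } @files;
        push @rows, join "\t", $name, @cells;
    }
    return @rows;
}

if (@ARGV) {
    print "$_\n" for count_matrix(@ARGV);
}
```

The keys of %count are the union of all names, and looking up each file under a name is the comparison against the individual lists, so no file handle needs to stay open for it.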

    True laziness is hard work
      Dear all,

      Thanks to you for all your suggestions. I started teaching myself Perl 3 weeks ago, and had never come across the need to process multiple input files. Based on your responses, am I right in understanding the following?

      1. I do NOT have to open as many file handles as the number of ARGV elements from command line?

      2. I can build a merge of the lists in each input file into one held in memory and not an actual file

      3. It is possible to compare this merge list in memory to the individual file lists from input?

      Except I still do not understand how to do this comparison when opening multiple files with ONLY 1 file handle! I am a little lost here... My understanding is that the final merge list can be compiled only after reading the contents of the last input file, at which time the common file handle is pointing to the last file. So how can a comparison of the merge list to ANYTHING but the last file be made?

      I know I am being naive, but I am sure Your Holiness' the Monks will be kind to a new initiate :)

      Thanks to all those who wrote down partial and even FULL scripts, it is very sweet and kind of you. Since I am learning and very much a newbie, I would appreciate and benefit much more if you could point out an outline of logic and refer me to the operations/syntax that I should teach myself to implement the most suitable algorithm for my work.

      Also, my example might have been misleading, my input files are not numbered, they have unrelated alphabetical names. And each of the files is not an array, it is more in the format of a 2 column Excel sheet, but as txt file. Entries in each line separated by tab, lines themselves separated by newline

      Thanks again, I aspire to be as helpful and patient as all of you, one day! :)

        I know I am being naive

        Naivety we are fine with. Foremost this is a place of learning and naivety is to be expected. What makes us grumpy or inclined to ignore supplicants is when we offer advice that seems to be ignored. In this case you have had a number of replies that each address points 1 - 3. Go back and re-read them, and remember that what we tell you three times is true!

        If you have specific questions about some code you have been offered ask them in a reply to the node the code was provided to. That makes it easier for people who come afterwards to follow the different threads of "conversation".

        Oh, and if you simplify your problem to the point of irrelevance, the answers you get will be irrelevant too!

        True laziness is hard work
Re: Open multiple file handles?
by Anonymous Monk on May 07, 2011 at 07:46 UTC
    I doubt this will help, but here you go, it was a fun diversion :)
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Path::Class qw[ file dir ];
    use autodie; # does error checking on open/close...
    use DBI;
    use DBD::SQLite;

    Main( @ARGV );
    exit( 0 );

    sub Main {
        Demo();
    } ## end sub Main

    sub CreateSampleInput { # creates sample data files as input.0.txt...
        my @sample = (
            "a\t1\nb\t2\nc\t3\n",
            "a\t5\nd\t3\n",
            "a\t1\nx\t4\n",
        );
        for my $ix ( 0 .. 2 ){
            open my($fh), '>', "input.$ix.txt";
            print $fh $sample[$ix];
            close $fh;
        }
    } ## end sub CreateSampleInput

    sub Demo {
        chdir file(__FILE__)->absolute->dir; # cd to directory of this file
        CreateSampleInput();
        my $dbh = OpenInitDb();
        FilesIntoDb( $dbh, glob 'input.*.txt' );
        PrintDbReport( $dbh );
        CleanDb( $dbh );
    } ## end sub Demo

    sub FilesIntoDb { # iterates over a list of files and imports into database
        my( $dbh, @files ) = @_;
        for my $file ( @files ){
            eval {
                ReadFileIntoDb( $dbh, $file );
            } or warn $@;
        }
    } ## end sub FilesIntoDb

    sub CleanDb { # closes database handle
        my( $dbh ) = @_;
        $dbh->disconnect;
        unlink 'temp.test.sqlite'; # deletes temporary database file
    } ## end sub CleanDb

    sub OpenInitDb { ## creates temporary sqlite database in current directory
        my $dbh = DBI->connect(
            'dbi:SQLite:dbname=temp.test.sqlite',
            undef,
            undef,
            { RaiseError => 1, PrintError => 1, },
        );
        eval {
            $dbh->do('
                CREATE TABLE fileKeyValue (
                    file TEXT NOT NULL,
                    key TEXT NOT NULL,
                    value TEXT
                );
            ');
        } or warn $@;
        return $dbh;
    } ## end sub OpenInitDb

    sub ReadFileIntoDb { # reads key/value pairs from file into database
        my( $dbh, $file ) = @_;
        open my($fh), '<', $file;
        my $sth = $dbh->prepare_cached('
            INSERT INTO fileKeyValue ( file, key, value ) VALUES ( ?, ?, ?)
        ' );
        while(defined( my $line = <$fh> )){
            chomp $line;
            my( $key, $value ) = split /\s+/, $line, 2;
            $sth->execute($file, $key, $value );
        }
        close $fh;
    } ## end sub ReadFileIntoDb

    sub PrintNice { # kills evil flesh eating zombies with STDOUT awesomeness
        #~ print join " ", @_, "\n";
        my $template = join ' ', map{'%-15s'} @_;
        printf "$template\n", @_;
    } ## end sub PrintNice

    sub PrintDbReport {
        my( $dbh ) = @_;
        #~ http://www.w3schools.com/sql/sql_distinct.asp
        #~ http://search.cpan.org/perldoc?DBI#selectall_arrayref
        my @files = map {@$_} @{
            $dbh->selectall_arrayref('SELECT DISTINCT file FROM fileKeyValue')
        };
        PrintNice( "NAME", @files );
        my $sth = $dbh->prepare_cached('
            SELECT file, key, value FROM fileKeyValue ORDER BY key
        ' );
        $sth->execute;
        #~ http://search.cpan.org/perldoc?DBI#bind_columns
        # each fetchrow puts new values into these variables
        my( $file, $key, $value );
        $sth->bind_columns( \( $file, $key, $value ) );
        my $prevKey = "";
        my %FileValue; # temporarily associate filenames with values
        while( $sth->fetchrow_arrayref ){
            #~ warn " file($file) prevKey($prevKey) key($key) value($value) ";
            if( $key ne $prevKey ){
                if( %FileValue ){
                    PrintNice( $prevKey, map { $FileValue{$_} } @files );
                }
                %FileValue=map { $_ => 0 } @files; # init
                #~ @FileValue{@files} = ( 0 )x@files; #init, same
            }
            $prevKey = $key;
            $FileValue{$file} = $value;
        }
        if( %FileValue ){ # print anything we haven't printed yet
            PrintNice( $key, map { $FileValue{$_} } @files );
            # empty FileValue, redundant since it's the end of sub PrintDbReport
            undef %FileValue;
        }
        $sth->finish;
    } ## end sub PrintDbReport
    __END__
    NAME            input.0.txt     input.1.txt     input.2.txt
    a               1               5               1
    b               2               0               0
    c               3               0               0
    d               0               3               0
    x               0               0               4
