
Naming file handles with variables?

by sinee (Novice)
on Apr 30, 2009 at 03:25 UTC ( #761028=perlquestion )
sinee has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to open multiple files at one time and go through them line by line across every file. That is, read line 1 of file1, then line 1 of file2, then line 1 of file3 (if there are only three files), then repeat for line 2 of each file, and so on. I realize that you can open multiple files at one time simply by doing something like so:

    while ($line1 = <FILE1> || $line2 = <FILE2> || $line3 = <FILE3>) {
        #do stuff with each line
    }

The problem is that I don't know how many files will be given to the program, and I would like to name the file handles as above. Is there a way to use variables in file handles, so that it checks how many files are given and then names each handle accordingly? Thanks in advance for any advice you can offer. =)

Replies are listed 'Best First'.
Re: Naming file handles with variables?
by GrandFather (Sage) on Apr 30, 2009 at 04:41 UTC

    You really don't want to be managing an unknown number of file handles 'retail'. Instead think of using a collection of handles. The easiest way is probably an array, although depending on the nature of the rest of the task a hash keyed by file name may be a better choice. Consider:

    use strict;
    use warnings;

    my @fileNames = ('file1.txt', 'foo.txt', 'wibble.wav');
    my @fileHandles;

    for my $filename (@fileNames) {
        open $fileHandles[@fileHandles], '<', $filename
            or die "Can't open $filename: $!";
    }

    while (@fileHandles) {
        for my $file (@fileHandles) {
            my $line = <$file>;

            if (! defined $line) {
                # Hit end of file
                close $file or die "File close failed: $!";
                $file = undef;
                next;
            }

            # do something with $line
        }

        @fileHandles = grep {defined} @fileHandles;
    }

    which opens a bunch of files then enters a loop that reads a line from each file in turn and does something with each line.

    True laziness is hard work
      Should that be [$filename] here?
      for my $filename (@fileNames) {
          open $fileHandles[$filename], '<', $filename
              or die "Can't open $filename: $!";

      Quantum Mechanics: The dreams stuff is made of

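        I don't think so. Used as an array index, @fileHandles evaluates to the number of elements, which is one past the last index, so open $fileHandles[@fileHandles], ... 'creates' a new array element for the new file handle. You could instead:

        open my $newFileHandle, ...;
        push @fileHandles, $newFileHandle;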

        to achieve the same effect.

        If fileHandles were a hash instead of an array then keying by the file name would be appropriate however.

        True laziness is hard work
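
For the hash variant mentioned above, here is a minimal sketch rather than code from the thread. The helper names open_all and next_lines and the hash name %handle_for are made up for illustration. It assumes the file names arrive in @ARGV.

```perl
use strict;
use warnings;

# Hypothetical helper: open every named file for reading and return
# a hash of file name => file handle.
sub open_all {
    my @names = @_;
    my %handle_for;
    for my $name (@names) {
        open $handle_for{$name}, '<', $name
            or die "Can't open $name: $!";
    }
    return %handle_for;
}

# Hypothetical helper: read one line from each handle that is still open.
# Returns a hash of file name => line. Exhausted files are closed and
# removed from the hash passed in by reference.
sub next_lines {
    my ($handle_for) = @_;
    my %line_for;
    for my $name (sort keys %$handle_for) {
        # readline() is used because <...> only accepts a simple scalar
        my $line = readline($handle_for->{$name});
        if (defined $line) {
            $line_for{$name} = $line;
        }
        else {
            close $handle_for->{$name} or die "File close failed: $!";
            delete $handle_for->{$name};
        }
    }
    return %line_for;
}

my %handle_for = open_all(@ARGV);
while (%handle_for) {
    my %line_for = next_lines(\%handle_for);
    print "$_: $line_for{$_}" for sort keys %line_for;
}
```

Keying by file name makes it easy to report which file a line came from, at the cost of losing the original argument order unless the names are kept in a separate array.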
Re: Naming file handles with variables?
by CountZero (Bishop) on Apr 30, 2009 at 06:42 UTC
    I think this will do what you want. It handles any number of files, and at each iteration the subroutine returns a reference to an array holding the next line of each of these files. When all files are exhausted it returns undef.
    use strict;

    my @filenames = qw/one.txt two.txt three.txt/;
    my @filehandles;

    foreach my $filename (@filenames) {
        open my $fh, '<', $filename or die "Could not open $filename; $!";
        push @filehandles, $fh;
    } ## end foreach my $filename (@filenames)

    while (1) {
        my $lines_ref = read_lines_parallel(@filehandles);
        last unless $lines_ref;
        print join '|', @$lines_ref;
        print '-' x 20, "\n";
    } ## end while (1)

    sub read_lines_parallel {
        my @filehandles = @_;
        my @lines;
        foreach (@filehandles) {
            push @lines, scalar <$_>;
        } ## end foreach (@filehandles)
        if ( join '', @lines ) {
            return \@lines;
        } ## end if ( join '', @lines )
        else {
            return undef;
        } ## end else [ if ( join '', @lines )
    } ## end sub read_lines_parallel
    Output:

    first line file 1
    |line 1 file 2
    |first line file 3
    --------------------
    second line file 1
    |line 2 file 2
    |second line file 3
    --------------------
    third line file 1
    ||third line file 3
    --------------------
    ||fourth line file 3
    --------------------
    Update: I do not know what herbs I put in my tea when I wrote something as ugly as
    while (1) {
        my $lines_ref = read_lines_parallel(@filehandles);
        last unless $lines_ref;
        print join '|', @$lines_ref;
        print '-' x 20, "\n";
    }
    Obviously it should be
    while (my $lines_ref = read_lines_parallel(@filehandles)) {
        # Do something with $lines_ref or @$lines_ref here
        print join '|', @$lines_ref;
        print '-' x 20, "\n";
    }


    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Naming file handles with variables?
by roubi (Hermit) on Apr 30, 2009 at 03:38 UTC
    'open' allows you to specify a lexical variable for the filehandle, like so:

    open(my $fh, "file.txt") or die("Can't open file.txt");
    while (my $line = <$fh>) {
        # stuff here
    }
    close $fh;

    See Indirect Filehandles
Re: Naming file handles with variables?
by citromatik (Curate) on Apr 30, 2009 at 07:09 UTC

    Another (and less efficient) way, using Tie::File:

    use strict;
    use warnings;
    use Tie::File;

    my @fnames = @ARGV;
    my @fhandlers;

    for my $fname (@fnames) {
        tie my @farr, 'Tie::File', $fname or die $!;
        push @fhandlers, \@farr;
    }

    # Traverse the files:
    for my $i (0..$#{$fhandlers[0]}) {
        my @nextvals = map {$_->[$i]} @fhandlers[0..$#fhandlers];
        # @nextvals has the next line of each file
    }

    # Untie the arrays themselves; untie $_ would target the scalar reference
    untie @$_ for (@fhandlers);

    Beware of possible overheads if the files are very big. (0..$#{$fhandlers[0]}) traverses the whole first file just to find out how many lines it has.

    Update: Added note about efficiency


Re: Naming file handles with variables?
by whakka (Hermit) on Apr 30, 2009 at 04:08 UTC
    The standard way to do what you want is to open and read one file at a time, keeping whatever data you care about in variables. The example code you gave (logically) does this - it reads every line from FILE1, then FILE2, then FILE3. This is because || is short-circuited: as long as the first condition is true, the whole expression evaluates true. This keeps happening until the end of the first file, etc. (it's safer to check line existence with defined though).

    Instead you could read in from standard input with a simple while ( <> ) { ... } and pipe input to the program from elsewhere. Or you could take a list of filenames as arguments in @ARGV and process them individually:

    for my $file ( @ARGV ) {
        open my $fh, '<', $file or die "$file: $!";
        while ( <$fh> ) {
            ...
        }
        close $fh;
    }

    Perhaps I should ask: is there any particular reason you need all filehandles open at once?

      If I use,

      while ($line1 = <FILE1> || $line2 = <FILE2> || $line3 = <FILE3>) {
          #do stuff with each line
      }

      It says,

      Can't modify logical or (||) in scalar assignment at line 28, near "<FILE3>) "
      Execution of aborted due to compilation errors.

      We have to use 'or'

        You can also wrap the assignment in parentheses, but I qualified my statement with "logically" to address what the code was conceptually doing.
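
The short-circuit behaviour discussed above can be checked with a small sketch, which is not code from the thread. The helper name read_with_or is made up, and in-memory files stand in for FILE1 and FILE2. Each assignment is wrapped in parentheses and tested with defined, as suggested above.

```perl
use strict;
use warnings;

# Hypothetical helper: drain two handles with the parenthesised '||' loop
# and record which handle supplied each line.
sub read_with_or {
    my ($fh1, $fh2) = @_;
    my @order;
    my ($line1, $line2);

    # The right-hand read only runs once the left-hand read returns undef.
    while (defined($line1 = <$fh1>) || defined($line2 = <$fh2>)) {
        my ($source, $line) = defined($line1) ? (1, $line1) : (2, $line2);
        chomp $line;
        push @order, "$source:$line";
    }
    return @order;
}

# In-memory files stand in for FILE1 and FILE2.
my ($data1, $data2) = ("a1\na2\n", "b1\nb2\n");
open my $fh1, '<', \$data1 or die "Can't open in-memory file: $!";
open my $fh2, '<', \$data2 or die "Can't open in-memory file: $!";

print join(' ', read_with_or($fh1, $fh2)), "\n";    # prints "1:a1 1:a2 2:b1 2:b2"
```

Every line of the first handle is consumed before the second handle is touched, so this form reads the files one after another rather than interleaving them.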
Re: Naming file handles with variables?
by lakshmananindia (Chaplain) on Apr 30, 2009 at 03:42 UTC
    ...I realize that you can open multiple files at one time..
    while ($line1 = <FILE1> || $line2 = <FILE2> || $line3 = <FILE3>) {
        #do stuff with each line
    }

    The above will not open multiple files. The <FILE1> operator just reads from the filehandle FILE1, which has to be opened already using open.

    You can say open $fh,file1 and read from that using while(<$fh>).

    --Lakshmanan G.

    The great pleasure in my life is doing what people say you cannot do.

Re: Naming file handles with variables?
by happy.barney (Pilgrim) on Apr 30, 2009 at 09:15 UTC
    use IO::File;
    @handles = grep defined, map {
       new IO::File ($_, 'r')
       || warn ($_, ': cannot open: ', $!)
          && undef
    } @file_names;
    while (@list = grep defined, map scalar <$_>, @handles) {
      for $line (@list) {
        # do stuff with line
      }
    }
Re: Naming file handles with variables?
by koptons (Initiate) on Apr 30, 2009 at 12:49 UTC
    Perl has a command-line option '-n' that supports multiple file processing at the same time. Here $_ will hold each line. This option is much faster than using a FileHandle in cases where the number of files is not known. Look for the -n option on the command line.
      No, not quite - from perlrun, we see:
      -n causes Perl to assume the following loop around your program, which makes it iterate over filename arguments somewhat like sed -n or awk: ...
      Since it [the -n option] causes perl to iterate over the files in turn, it self-evidently doesn't meet the requirements of the OP.

      A user level that continues to overstate my experience :-))
Re: Naming file handles with variables?
by NiJo (Friar) on Apr 30, 2009 at 17:30 UTC
    Your problem has been solved even before perl existed. 'paste' combines files in a column by column fashion. I'd use bits and pieces like these:
    my $command = 'paste ' . join(' ', @ARGV);
    open FH, $command . '|' or die "Can't run paste: $!";
    while (<FH>) {
        @one_line = split "\t", $_;
        # do something
    }

    But I suspect that your application needs something more efficient, as the solutions presented up to now cause many disk seeks. Ignoring caches, there is a seek for each line of each file!

    I'd 'slurp' all files into an @array[file_no][line_no] one after another. This is done linearly and at once. If the array does not fit into RAM, I'd use Tie::File. See e. g.

      However, 'ignoring caches' and other real-world aspects of programming leads to bad decisions. In this case, for example, caching means that the line-at-a-time technique is likely to scale very well for extremely large files, whereas slurping the files is likely to lead to thrashing the hard disk even for fairly modest (by today's standards) file sizes.

      In general slurping is a bad design choice, and for a line-by-line task such as the one indicated by the OP, Tie::File is likely to be (at best) little more than syntactic sugar, and at worst may impose significantly more overhead than the multiple file handle solutions already offered.

      True laziness is hard work
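
For comparison, the slurp approach debated above might look like the following sketch. It is not code from the thread: the helper name slurp_all is made up, the file names are assumed to come from @ARGV, and every file is assumed to fit in memory, which is exactly the assumption the reply above questions.

```perl
use strict;
use warnings;

# Hypothetical helper: read each named file completely and return a list of
# array references, so that a line is addressed as $lines[file_no][line_no].
sub slurp_all {
    my @names = @_;
    my @lines;
    for my $name (@names) {
        open my $fh, '<', $name or die "Can't open $name: $!";
        push @lines, [ <$fh> ];    # list context reads every remaining line
        close $fh or die "File close failed: $!";
    }
    return @lines;
}

my @lines = slurp_all(@ARGV);

# The longest file decides how many rows to walk.
my $max = 0;
for my $file (@lines) {
    $max = @$file if @$file > $max;
}

for my $line_no (0 .. $max - 1) {
    my @row = map { $_->[$line_no] } @lines;
    # @row holds line $line_no of every file (undef where a file is shorter)
}
```

Memory use grows with the total size of all files, whereas the file handle solutions earlier in the thread hold only one line per file at a time.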
