Re: File handles - there must be a better way

in reply to File handles - there must be a better way

Leaving hundreds of filehandles open is probably a bad idea. I'm assuming that you're leaving them open in order to read them line-by-line. However, there are more scalable ways of doing that.

You can get the current byte offset of a filehandle with:

my $file_pos = tell($fh);

And you can return to that position later (the final 0 means the offset is counted from the start of the file) with:

seek($fh, $file_pos, 0);
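
To see the two calls working together, here is a minimal sketch that reads one line, closes the file, then reopens it and resumes from the saved offset. The temporary file and its three lines are made up purely for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file with three known lines (illustrative data only).
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "first\nsecond\nthird\n";
close($tmp_fh) or die "Can't close $tmp_name: $!";

# Read one line, remember where we stopped, and close the file.
open( my $fh, '<', $tmp_name ) or die "Can't open $tmp_name: $!";
my $first    = <$fh>;
my $file_pos = tell($fh);
close($fh);

# Reopen later and resume from the saved byte offset.
open( $fh, '<', $tmp_name ) or die "Can't open $tmp_name: $!";
seek( $fh, $file_pos, 0 ) or die "Can't seek in $tmp_name: $!";
my $second = <$fh>;
close($fh);

print "Resumed at byte $file_pos and read: $second";
```

The saved offset is just a number, so it can live in a hash, as the longer example does.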

If you keep track of your position in each file, you can open one file at a time and still read through hundreds of files line-by-line. For example, here's a code snippet that reads through and prints out a cross-section of a bunch of different files, but still only opens one file at a time:


#!/usr/bin/env perl

use strict;
use warnings;

our @Files = @ARGV;

MAIN: {

    # A table storing each active filename and its
    # current position
    my %file_table = ();

    # Line number for each file we're reading through
    # (for printout purposes only)
    my $line_num = 0;

    # Set up our file table to point everything to 0
    foreach my $file (@Files) {
        $file_table{$file} = 0;
    }

    # Keep printing each line so long as at least one file
    # has stuff to print
    while ( scalar keys %file_table ) {

        # Keep track of line numbers
        $line_num++;

        # Open each file, seek to last read position,
        # read a line, then note the next position
        foreach my $file ( sort keys %file_table ) {
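            open( my $fh, '<', $file ) or die "Can't open $file: $!";
            seek( $fh, $file_table{$file}, 0 )
                or die "Can't seek in $file: $!";
            my $line = <$fh>;

            # An empty file yields undef; skip printing in that case
            if ( defined $line ) {
                chomp $line;    # avoid printing a doubled newline
                print "$file\t$line_num\t$line\n";
            }

            # Drop finished files; otherwise remember where to resume
            if ( !defined $line or eof $fh ) {
                delete $file_table{$file};
            } else {
                $file_table{$file} = tell($fh);
            }
            close($fh);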
        }

    }

    print "All done\n";

}

stephen


Replies are listed 'Best First'.

Re^2: File handles - there must be a better way
by sundialsvc4 (Abbot) on May 14, 2013 at 02:16 UTC

Great ideas here, Stephen. And the same general line of reasoning certainly could be modified in many ways. For example, one could pre-read and then buffer a few lines from each file, replenishing each buffer on an as-needed basis as the program proceeds. This would give fairly efficient access to “the next few lines in each file” without too much burden, and it would scale. You could introduce the concept of “bookmarking” your present position in any given file, then “reading ahead” in search of what you are looking for, knowing that you can “fall back” to the bookmarked point. And so on. All of which wizardry can be generally concealed from most of the rest of the programming.
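
A rough sketch of the pre-read-and-buffer idea, under some assumptions: the names `next_line`, `%state`, and `$LINES_PER_REFILL` are hypothetical, and the refill size of 3 is arbitrary. Each file is opened only when its buffer runs dry, so the open/seek/close cost is spread over several lines. It does not cover the bookmarking variation.

```perl
use strict;
use warnings;

# Hypothetical tuning knob: lines to pre-read each time a file is opened.
my $LINES_PER_REFILL = 3;

# Per-file state: byte offset to resume from, buffered lines, and EOF flag.
my %state;

# Return the next line of $file (chomped), or undef once it is exhausted.
sub next_line {
    my ($file) = @_;
    my $s = $state{$file} ||= { pos => 0, buf => [], done => 0 };

    # Refill only when the buffer is empty and the file still has data.
    if ( !@{ $s->{buf} } && !$s->{done} ) {
        open( my $fh, '<', $file ) or die "Can't open $file: $!";
        seek( $fh, $s->{pos}, 0 ) or die "Can't seek in $file: $!";
        while ( @{ $s->{buf} } < $LINES_PER_REFILL ) {
            my $line = <$fh>;
            last unless defined $line;
            chomp $line;
            push @{ $s->{buf} }, $line;
        }
        $s->{pos}  = tell($fh);
        $s->{done} = 1 if eof($fh);
        close($fh);
    }
    return shift @{ $s->{buf} };
}

# Usage: print lines round-robin, with at most one file open at a time.
my @active = @ARGV;
while (@active) {
    @active = grep {
        my $line = next_line($_);
        print "$_\t$line\n" if defined $line;
        defined $line;
    } @active;
}
```

The rest of the program only ever calls `next_line`, which keeps the buffering out of sight as suggested above.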

There are definite limits on the number of file-handles that an operating system can be expected to allow any application to have open at one time, and those limits are often rather small ... in theory and/or in practice. I tend to design on the assumption of “maybe, a few dozen.”
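
If you would rather check the limit than guess, the following sketch asks the system for it. It assumes a platform where the core POSIX module can report `_SC_OPEN_MAX`; where it cannot, `sysconf` returns undef and the fallback message applies.

```perl
use strict;
use warnings;
use POSIX ();

# Ask the OS for the per-process limit on open file descriptors.
# sysconf returns undef when the value is not available.
my $max_open = POSIX::sysconf( &POSIX::_SC_OPEN_MAX );

if ( defined $max_open ) {
    # STDIN, STDOUT and STDERR already count against this number.
    print "This process may hold about $max_open open descriptors\n";
} else {
    print "Open-file limit unavailable; assume only a few dozen\n";
}
```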

In Section Seekers of Perl Wisdom