Re: File handles - there must be a better way

by stephen (Priest)
on May 13, 2013 at 19:13 UTC (#1033338)

in reply to File handles - there must be a better way

Leaving hundreds of filehandles open is probably a bad idea. I'm assuming that you're leaving them open in order to read them line-by-line. However, there are more scalable ways of doing that.

You can get the current read position of a filehandle (a byte offset from the start of the file) with:

my $file_pos = tell($fh);

And you can go to that file position with:

seek($fh, $file_pos, 0);
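
As a minimal sketch of how those two calls fit together, the helper below opens a file, jumps to a saved byte offset, reads one line, and hands back the offset to resume from. The name read_line_at is made up for illustration, and the sketch assumes plain files read without an encoding layer, so that the offsets from tell are simple byte counts.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): read one line
# from $path starting at byte offset $pos, then return the line and
# the offset to resume from on the next call.
sub read_line_at {
    my ( $path, $pos ) = @_;

    open( my $fh, '<', $path ) or die "Cannot open $path: $!";
    seek( $fh, $pos, 0 ) or die "Cannot seek in $path: $!";

    my $line     = <$fh>;        # undef once we are at end of file
    my $next_pos = tell($fh);    # byte offset just past what we read

    close($fh);
    return ( $line, $next_pos );
}
```

Feeding each returned offset into the next call walks the file one line at a time without holding a handle open between calls.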

If you keep track of your position in each file, you can open one file at a time and still read through hundreds of files line-by-line. For example, here's a code snippet that reads through and prints out a cross-section of a bunch of different files, but still only opens one file at a time:

#!/usr/bin/env perl
use strict;
use warnings;

our @Files = @ARGV;

MAIN: {
    # A table storing each active filename and its
    # current position
    my %file_table = ();

    # Line number for each file we're reading through
    # (for printout purposes only)
    my $line_num = 0;

    # Set up our file table to point everything to 0
    foreach my $file (@Files) {
        $file_table{$file} = 0;
    }

    # Keep printing each line so long as at least one file
    # has stuff to print
    while ( scalar keys %file_table ) {

        # Keep track of line numbers
        $line_num++;

        # Open each file, seek to last read position,
        # read a line, then note the next position
        foreach my $file ( sort keys %file_table ) {
            open( my $fh, '<', $file ) or die "Oops! Can't open $file: $!";
            seek( $fh, $file_table{$file}, 0 );
            my $line = <$fh>;

            # An empty file yields no line at all; otherwise strip
            # the line's own newline so we don't print it twice
            if ( defined $line ) {
                chomp $line;
                print "$file\t$line_num\t$line\n";
            }

            if ( eof $fh ) {
                delete $file_table{$file};
            }
            else {
                $file_table{$file} = tell($fh);
            }
            close($fh);
        }
    }
    print "All done\n";
}


Replies are listed 'Best First'.
Re^2: File handles - there must be a better way
by sundialsvc4 (Abbot) on May 14, 2013 at 02:16 UTC

    Great ideas here, Stephen. The same general line of reasoning could certainly be modified in many ways. For example, one could pre-read and buffer a few lines from each file, replenishing each buffer on an as-needed basis as the program proceeds. This would give fairly efficient access to “the next few lines in each file” without too much burden, and it would scale. You could also introduce the concept of “bookmarking” your present position in any given file, then “reading ahead” in search of what you are looking for, knowing that you can “fall back” to the bookmarked point. And so on. All of this wizardry can generally be concealed from most of the rest of the program.
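
    One way the buffering idea might look, as a rough sketch rather than a finished design: a closure that keeps a few lines of one file in memory and reopens the file only when that buffer runs dry. The name make_buffered_reader and the $want parameter (how many lines to read ahead, assumed to be at least 1) are invented here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): build a reader that keeps
# up to $want lines of $path buffered. The file is opened only while
# the buffer is being refilled, never between calls.
sub make_buffered_reader {
    my ( $path, $want ) = @_;

    my @buffer;      # lines read ahead but not yet handed out
    my $pos  = 0;    # byte offset where the next refill starts
    my $done = 0;    # true once the file has been exhausted

    my $refill = sub {
        open( my $fh, '<', $path ) or die "Cannot open $path: $!";
        seek( $fh, $pos, 0 ) or die "Cannot seek in $path: $!";
        while ( @buffer < $want ) {
            my $line = <$fh>;
            if ( !defined $line ) { $done = 1; last; }
            push @buffer, $line;
        }
        $pos = tell($fh);    # remember where to pick up next time
        close($fh);
    };

    # Each call returns the next line, or undef at end of file.
    return sub {
        $refill->() if !@buffer && !$done;
        return shift @buffer;
    };
}
```

    With one such reader per file, the rest of the program just calls the reader and never sees a filehandle; a larger $want trades memory for fewer open calls.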

    There are definite limits on the number of file-handles that an operating system can be expected to allow any application to have open at one time, and those limits are often rather small ... in theory and/or in practice. I tend to design on the assumption of “maybe, a few dozen.”
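
    If you would rather ask the system than guess, something along these lines may work on POSIX-like platforms. It is only a sketch: the helper name max_open_files is made up, and it assumes the core POSIX module's sysconf and _SC_OPEN_MAX are available on your system.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Hypothetical helper (illustration only): report the per-process
# limit on open file descriptors, or undef if the lookup fails.
sub max_open_files {
    my $limit = sysconf(_SC_OPEN_MAX);
    return $limit;
}
```

    The figure covers every descriptor the process holds (STDIN, STDOUT, STDERR, sockets and so on), so the number of data files you can really keep open is somewhat lower.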
