PerlMonks
Merging files

by sk (Curate)
on Apr 11, 2005 at 07:20 UTC ( #446518=perlquestion )
sk has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, I am curious to learn how "merging of many files" would be implemented in Perl. The Linux command "paste" lets you merge multiple files line by line. How would one achieve something like that in Perl? Is it possible to write a program that does not require multiple filehandles for this purpose? I am also very interested in the concept of opening filehandles dynamically, i.e. the number of handles is determined at run time. Thanks all!

cheers

SK

PS: I am not looking for an implementation; a general idea will be very helpful.
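For reference, a paste(1)-style line-by-line merge can be sketched in a handful of lines of Perl. This is only an illustrative sketch, not code from the thread; the sub name paste_files and the tab separator are assumptions.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # paste_files: given a list of filenames, return the merged text
    # with line i of every file joined by tabs (roughly what paste(1)
    # does).  The sub name and the tab separator are illustrative.
    sub paste_files {
        my @fhs;
        for my $file (@_) {
            open my $fh, '<', $file or die "Can't open $file: $!";
            push @fhs, $fh;
        }
        my $out = '';
        while (1) {
            my ( @fields, $got_data );
            for my $fh (@fhs) {
                my $line = <$fh>;
                if ( defined $line ) { chomp $line; $got_data = 1 }
                else                 { $line = '' }   # this file is exhausted
                push @fields, $line;
            }
            last unless $got_data;                    # every file hit eof
            $out .= join( "\t", @fields ) . "\n";
        }
        close $_ for @fhs;
        return $out;
    }

    print paste_files(@ARGV) if @ARGV;

Note that the loop keeps going until every file is exhausted, padding shorter files with empty fields, which matches paste's behavior when the inputs have different lengths.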

Re: Merging files
by borisz (Canon) on Apr 11, 2005 at 07:24 UTC
    Maybe IO::File is what you are searching for. In general you need one filehandle for every open file.
    Boris
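    A minimal sketch of what the IO::File approach looks like; IO::File is a core module, and the variable names here are illustrative:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::File;    # core module: object-oriented filehandles

    # The number of handles is decided at run time by @ARGV.
    my @handles = map {
        IO::File->new( $_, '<' ) or die "Can't open $_: $!";
    } @ARGV;

    # Read the first line of every file, then close each handle.
    for my $fh (@handles) {
        my $line = $fh->getline;
        print $line if defined $line;
        $fh->close;
    }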
Re: Merging files
by castaway (Parson) on Apr 11, 2005 at 07:33 UTC
    Sort::Merge maybe? (Where you need to have the files already sorted, and it merges the contents into one output).. But you'll still need a filehandle for each file.

    I'm not sure I understand the "open file handles dynamically" question.. Isn't it always dynamic? If you want to open a list of files, just run in a loop:

    my @filehandles;
    foreach my $file (@files) {
        my $fh;
        open $fh, "<", $file or die "Can't open $file ($!)";
        push @filehandles, $fh;
    }
    C.
Re: Merging files
by mirod (Canon) on Apr 11, 2005 at 07:49 UTC

    If you want to use only one filehandle, you can always use a mixture of open/seek/tell/close to re-use a single filehandle; see the code below. Of course this is going to be really slow, as you will have to perform all four operations on every single line of each file.

    Having a pool of open filehandles would help some: as long as you are merging fewer files than the size of the pool, you merge the "natural" way (just read one line at a time from each filehandle); if you need more, you use the method below for the files that aren't in the pool.

    Does it make sense?

    #!/usr/bin/perl -w
    use strict;
    use Fatal qw(open close);   # so I don't have to bother testing them

    my @files  = @ARGV;
    my %marker = map { $_ => 0 } @files;

    while( keys %marker) {
        foreach my $file (@files) {
            if( exists $marker{$file}) {
                open( my $fh, '<', $file);
                seek( $fh, $marker{$file}, 0);  # the 0 means 'set the new position in bytes to' $marker{$file}
                if( defined( my $line = <$fh>)) {
                    print $line;
                    $marker{$file} = tell $fh;
                }
                else {
                    delete $marker{$file};
                }
                close $fh;
            }
        }
    }
      Wow this is so cool! Thanks Boris/C/Mirod!

      I initially thought I would just learn how to open multiple filehandles, but got lured into implementing a paste-like program (not there yet :)).. Here is my code, but for some reason my $line variable does not get set to undef when *all* filehandles run out of data... Should I be checking for something else to terminate the loop?

      I guess the code would be much faster if I could generate Perl code that reads all the filehandles in one line instead of looping through them...

      #! /usr/local/bin/perl -w
      # Open many file handles
      # C's code
      foreach my $file (@ARGV) {
          my $fh;
          open $fh, "<", $file or die "Can't open $file ($!)";
          push @filehandles, $fh;
      }

      while (1) {
          $line = undef;   # Not sure whether this is even required.
          foreach (@filehandles) {
              $line .= <$_>;
              chomp($line);
          }
          print ($line, "\n");
          last if undef($line);
      }
        The lhs of .= always ends up a defined value after the append, even if you append undef, so $line never stays undef:
        perl -MData::Dumper -e '$x .= undef; print Dumper($x)'
        __OUTPUT__
        $VAR1 = '';
        Boris

        I think you should keep track of which filehandles are still open, and read a filehandle only in that case. And I am always wary of while(1) loops. But it's probably just me ;--)

        #!/usr/bin/perl -w
        use strict;
        use Fatal qw(open close);

        my @files = @ARGV;
        my %fh;                          # file => file handle
        foreach my $file (@files) {
            open( my $fh, '<', $file);
            $fh{$file} = $fh;
        }

        while( keys %fh) {
            foreach my $file (@files) {
                if( exists $fh{$file}) {
                    my $fh = $fh{$file};
                    if( defined( my $line = <$fh>)) { chomp $line; print $line; } # regular line
                    else                            { delete $fh{$file}; }       # eof reached for this file
                }
            }
            print "\n";                  # end of line for all files
        }
Re: Merging files
by BazB (Priest) on Apr 11, 2005 at 12:13 UTC
    File::MergeSort
    (disclaimer: I am the maintainer.)

    If the information in this post is inaccurate, or just plain wrong, don't just downvote - please post explaining what's wrong.
    That way everyone learns.

Node Type: perlquestion [id://446518]
Approved by Corion