PerlMonks  

combine multiple files into one (line by line)

by Anonymous Monk
on Apr 05, 2001 at 22:23 UTC (#70206)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello

I have at least 3 files I need to combine into one. However, I want to take the first line of each file and print it to the new file, then the second line, and so on.

I wrote a script to do this; however, the output is incorrect. Instead of getting something like this:

file1::line1
file2::line1
file3::line1
file1::line2
file2::line2
file3::line2
...

I get this:

file1::line12
file2::line12
file3::line12
file1::line1
file2::line1
file3::line1
file1::line1
file2::line1
file3::line1
...

Please Help Me,

Thanx,

Roger

#!/usr/bin/perl -w

# These files have this format:
# filename::line number (e.g., 1.txt::1)
my @files = (
    "1.txt",
    "2.txt",
    "3.txt",
);

my $total_strings = 9;
my $cur_pos = 0;
my $next_pos = 0;

open(MIXED, ">mixed.txt") || die "(x) Failed: mixed.txt - $!\n";

for (my $cur_string=1; $cur_string <= $total_strings; $cur_string++) {
    print "\nCurrent String: $cur_string of $total_strings\n";
    foreach my $file (@files) {
        print " Opening File: $file\n";
        print " Current Position is: $cur_pos\n";
        open(FILE, $file) || die "(x) Failed: $file - $!\n";
        while(<FILE>) {
            seek(FILE, $cur_pos, 1);
            $string = $_;
            $next_pos = tell(FILE);
        }
        close(FILE);
        print " Read string: $string";
        print MIXED $string;
        print " Next Position is: $next_pos\n\n";
    }
    $cur_pos = $next_pos;
}
close(MIXED);

Re: combine multiple files into one (line by line)
by thabenksta (Pilgrim) on Apr 05, 2001 at 22:39 UTC

    Try this:

    #!/usr/bin/perl -w

    # These files have this format:
    # filename::line number (e.g., 1.txt::1)
    my @files = (
        "1.txt",
        "2.txt",
        "3.txt",
    );

    my $total_strings = 9;
    my $cur_pos = 0;
    my $next_pos = 0;

    open(MIXED, ">mixed.txt") || die "(x) Failed: mixed.txt - $!\n";

    for (my $cur_string=1; $cur_string <= $total_strings; $cur_string++) {
        print "\nCurrent String: $cur_string of $total_strings\n";
        foreach my $file (@files) {
            print " Opening File: $file\n";
            print " Current Position is: $cur_pos\n";
            open(FILE, $file) || die "(x) Failed: $file - $!\n";
            @file = <FILE>;
            $string = $file[$cur_pos];
            close(FILE);
            print " Read string: $string";
            #print MIXED $string;
            print " Next Position is: $next_pos\n\n";
        }
        $next_pos = $cur_pos + 1;
        $cur_pos = $next_pos;
    }
    close(MIXED);
    my $name = 'Ben Kittrell'; $name=~s/^(.+)\s(.).+$/\L$1$2/g; my $nick = 'tha' . $name . 'sta';
Re: combine multiple files into one (line by line)
by ton (Friar) on Apr 05, 2001 at 22:39 UTC
    Why are you opening and closing the same files over and over? This is expensive and error-prone. Just do something like this:
    my @files = ("1.txt", "2.txt", "3.txt");
    my @fhs;
    foreach my $file (@files) {
        open($fh, $file) || die;
        push(@fhs, $fh);
    }
    open(MIXED, "mixed.txt") || die;
    while (1) {
        foreach $fh (@fhs) {
            $line = <$fh>;
            last if (eof($fh));
            print MIXED $line;
        }
    }
    map { close($_); } @fhs;
    close(MIXED);
    Note that this code assumes all your files have the same number of lines. If this is not the case, you have to do some extra trickery to not break out of the loop until all files are empty...
      Thank you for your help; however, I failed to mention that the files range in size from 55MB to 130MB. In addition, I only want to grab the first 2,000 or so lines from each file and distribute them in the above-mentioned fashion.

      For these reasons I don't think that using arrays is feasible. I understand that opening and closing is expensive, but hopefully less so than using huge arrays.

      Thanks for your help,

      Roger

      p.s. thabenksta's code works, but yours does not.

        I don't think he was intending to load the entire file into memory. The trick is to keep an array of open FILEHANDLES and read one line at a time from each...
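        #!/usr/bin/perl -w
        use strict;
        use IO::File;

        # Interleave files: first argument is the line count, the rest are files.
        my $lc = shift;
        my @fhs;
        foreach (@ARGV) {
            my $fh = IO::File->new($_, 'r') or die "Could not open $_: $!";
            push @fhs, $fh;
        }
        while ($lc-- > 0) {    # the posted --$lc stopped one line short
            foreach my $fh (@fhs) {
                my $line = <$fh>;
                print $line if defined $line;    # skip files that have run out
            }
        }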

        Tested minimally but the path is clear from here I hope.

        --
        $you = new YOU;
        honk() if $you->love(perl)

        You really don't need to read all files in. For clarity I will use filehandles.
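        use FileHandle;

        my @files = qw( 1.txt 2.txt 3.txt );
        my $mixed_file = 'mixed.txt';    # not set in the posted code; name taken from the question

        # Open every input once; keep the file name in $_ for the error message.
        my @handles = map {
            my $fh = FileHandle->new($_, 'r');
            die "Could not open $_: $!" unless defined $fh;
            $fh;
        } @files;

        open MIXED, ">$mixed_file" or die "Could not open $mixed_file: $!";
        while (@handles) {
            foreach my $fh (@handles) {
                my $line = $fh->getline;
                if (defined $line) { print MIXED $line }
                else               { undef $fh }    # exhausted: removed by the grep below
            }
            @handles = grep defined, @handles;
        }
        close MIXED;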
        Hope this helps,

        Jeroen
        "We are not alone"(FZ)

        Whoops, that's what happens when you submit code without testing it first :). There were four things wrong with my code:
        1. The open call for the MIXED file did not have a '>'. This is sheer carelessness.
        2. I was opening the files to the same typeglob, which resulted in three references to the same open file being stored in the array. This is a more subtle bug.
        3. I was testing for eof after reading in a line, instead of before. This would result in the last line never being written to the mixed file.
        4. Finally, I was using 'last' within the inner loop, foolishly thinking it would get me out of the outer loop. Another dumb mistake.
        Working code (at least on my Wintel box) follows:
        my @files = ("1.txt", "2.txt", "3.txt");
        my @fhs;
        for ($i = 0; $i < scalar(@files); ++$i) {
            open($fhs[$i], $files[$i]) || die;
        }
        open(MIXED, ">mixed.txt") || die;
        OUTER: while (1) {
            foreach $fh (@fhs) {
                last OUTER if (eof($fh));
                $line = <$fh>;
                print MIXED $line;
            }
        }
        map { close($_); } @fhs;
        close(MIXED);
        Let this be a lesson for me in preliminary testing!

        -ton

Re: combine multiple files into one (line by line)
by danger (Priest) on Apr 05, 2001 at 23:31 UTC

    Here are two versions: the first one intermixes all the lines from the files, the second allows you to decide on a limit of how many lines to read from the files (as per your followup specifications):

    #!/usr/bin/perl -w
    use strict;
    use IO::File;

    my @files = qw/file1 file2 file3/;
    my @fhs = map{IO::File->new($_)||die "$_: $!"} @files;

    print while $_ = join '', map{scalar <$_>||''} @fhs;
    __END__

    #!/usr/bin/perl -w
    use strict;
    use IO::File;

    my $limit = 2000;
    my @files = qw/file1 file2 file3/;
    my @fhs = map{IO::File->new($_)||die "$_: $!"} @files;

    for(1 .. $limit) {
        $_ = join '', map{scalar <$_>||''} @fhs or last;
        print;
    }
    __END__

    Both assume that you'll just redirect output to the new file, but you can open() an output filehandle and print to it if you wish.
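
For readers who would rather write the output file from inside the script, here is a minimal sketch that pulls together the ideas in this thread: one open handle per input, one line per handle per round, a line limit, and dropping handles as they run out. It is not any poster's code. The subroutine name `interleave` and its argument order are made up for illustration, and it assumes every line, including the last, ends with a newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): write up to $limit lines
# from each input file to $out_file, taking one line per file per round.
sub interleave {
    my ($out_file, $limit, @files) = @_;

    # One lexical filehandle per input, so each file is opened only once.
    my @fhs = map {
        open(my $fh, '<', $_) or die "Could not open $_: $!";
        $fh;
    } @files;

    open(my $out, '>', $out_file) or die "Could not open $out_file: $!";

    for my $round (1 .. $limit) {
        last unless @fhs;                 # every input has run out
        my @still_open;
        for my $fh (@fhs) {
            my $line = <$fh>;
            next unless defined $line;    # this file is exhausted; drop it
            print {$out} $line;
            push @still_open, $fh;
        }
        @fhs = @still_open;
    }

    close($out) or die "Could not close $out_file: $!";
    close($_) for @fhs;
    return;
}

# Example invocation: perl interleave.pl 1.txt 2.txt 3.txt
interleave('mixed.txt', 2000, @ARGV) if @ARGV;
```

Only one line per file is held in memory at a time, so the 55MB to 130MB inputs mentioned above should not be a problem. Shorter files simply stop contributing lines once they are exhausted.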
