parallel reading

by azaria (Beadle)
on May 09, 2006 at 12:11 UTC ( #548184=perlquestion )

azaria has asked for the wisdom of the Perl Monks concerning the following question:

I would like to read several files in parallel and generate a file in which each line is the concatenation of the corresponding lines from each of the files. Let's say:
file A:

file B:

file C:
Then the output file will contain:
Please advise how I can do it shortly?

Thanks azaria

Replies are listed 'Best First'.
Re: parallel reading
by Zaxo (Archbishop) on May 09, 2006 at 13:38 UTC

    Here's another,

    open my $out, '>', 'ABC' or die $!;
    {
        local $_;
        open my $A, '<', 'A' or die $!;
        open my $B, '<', 'B' or die $!;
        open my $C, '<', 'C' or die $!;
        no warnings 'uninitialized';
        while ($_ = <$A> . <$B> . <$C>) {
            s/\n//g;
            print $out $_, "\n";
        }
    }
    close $out or warn $!;

    That will let the files have different numbers of lines. Memory use is small, and independent of file size.

    Update: Repaired the thinko blazar++ spotted. Empty lines are not a problem - we don't chomp, so they retain newlines until we s/// them gone. I like blazar's extension to different numbers of files.

    After Compline,

      Nice approach. And it may be merged with mine, e.g.:

      #!/usr/bin/perl -l

      use strict;
      use warnings;

      my @fh = map {
          open my $fh, '<', $_ or die "Can't open `$_': $!\n";
          $fh } @ARGV;

      no warnings 'uninitialized';
      print while $_ = join '', map { chomp(my $line=<$_>); $line } @fh;

      __END__


      • you should s/undefined/uninitialized/;
      • it may not be fully reliable if empty lines are to be expected in the files.

      Update: the second point was a thinko as Zaxo pointed out.
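
The point about empty lines can be checked directly. An empty line read without chomp is the one-character string "\n", which is true in boolean context, so the concatenation only becomes false once every handle has returned undef. The following is a minimal sketch rather than anyone's posted solution: the sample data is made up, and in-memory filehandles stand in for real files.

```perl
use strict;
use warnings;

# Hypothetical sample data: the second line of each "file" is empty.
my ($data_a, $data_b) = ("a1\n\na3\n", "b1\n\nb3\n");
open my $A, '<', \$data_a or die $!;
open my $B, '<', \$data_b or die $!;

my @joined;
{
    no warnings 'uninitialized';
    # Unchomped empty lines concatenate to "\n\n", which is true, so the
    # loop ends only when both handles are exhausted (undef . undef is "").
    while (my $line = <$A> . <$B>) {
        $line =~ s/\n//g;
        push @joined, $line;
    }
}

# Prints three lines: a1b1, an empty line, a3b3
print "$_\n" for @joined;
```

By contrast, chomping each line before joining turns an all-empty row into "", which is false and would end a loop that tests the joined string.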

Re: parallel reading
by blazar (Canon) on May 09, 2006 at 13:49 UTC

    Since others already gave you good general purpose suggestions...

    #!/usr/bin/perl -l

    use strict;
    use warnings;

    my @fh = map {
        open my $fh, '<', $_ or die "Can't open `$_': $!\n";
        $fh } @ARGV;

    while (@fh) {
        @fh = grep !eof $_, @fh;
        print map { chomp(my $line=<$_>); $line } @fh;
    }

    __END__

      Very nice (++)! I've never used map before, but that's an eye opener. It's so much better than my (admittedly terrible) hack, and clear to boot. That example is going on my "cheatsheet" of tips I keep pinned to my cube wall.


        Now that you know... beware! It's easy to get addicted to map & grep. They're good for... the jobs they're good for! Do not abuse them!

Re: parallel reading
by roboticus (Chancellor) on May 09, 2006 at 12:40 UTC

    If you're on a *nix box, you could use the paste command, e.g.:

    paste A B C
    But since you asked on perlmonks, you could try something like this (terrible) program:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    open(A,"<A") or die "Can't open A!";
    open(B,"<B") or die "Can't open B!";
    open(C,"<C") or die "Can't open C!";
    my @a = <A>;
    my @b = <B>;
    my @c = <C>;
    # Keep going while any of the files still has lines left
    while (@a || @b || @c) {
        my $aa = shift @a || "";
        my $bb = shift @b || "";
        my $cc = shift @c || "";
        chomp $aa; chomp $bb; chomp $cc;
        print $aa, $bb, $cc, "\n";
    }

      First, thanks for your reply. The example I gave is very short. The size of the input files might change and might be big, so I guess it might affect memory use? azaria

        In that case I wouldn't slurp in the files all at once. My solution is one that doesn't, copes with a different number of lines per file, and with an arbitrary number of files passed on the cmd line. Shameless self ad terminated! ;-)
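
The non-slurping approach described above can be sketched as a small subroutine. This is an illustration under stated assumptions, not the code posted in the thread: `paste_lines` is a hypothetical helper name, and the sketch assumes that a file which runs out early should contribute nothing to the remaining output lines.

```perl
use strict;
use warnings;

# paste_lines is a hypothetical helper name, not from the thread.
# It reads one line at a time from each input handle and writes the
# concatenation to $out, so memory use does not grow with file size.
sub paste_lines {
    my ($out, @in) = @_;
    # Continue while at least one input still has data
    while (grep { !eof $_ } @in) {
        my $joined = '';
        for my $fh (@in) {
            my $line = <$fh>;
            next unless defined $line;   # this file is exhausted
            chomp $line;
            $joined .= $line;
        }
        print {$out} $joined, "\n";
    }
    return;
}

# Usage: perl script.pl A B C > ABC
my @handles = map {
    open my $fh, '<', $_ or die "Can't open '$_': $!\n";
    $fh;
} @ARGV;
paste_lines(\*STDOUT, @handles);
```

Testing `eof` before each round avoids the extra empty line that would otherwise be printed once every file is exhausted, and checking `defined` rather than truth means empty lines in the input are preserved.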

Re: parallel reading
by graff (Chancellor) on May 09, 2006 at 12:34 UTC
    Try putting <code> and </code> around your data samples, so that we can see what the data really look like.

    What you want is what the unix "paste" command does. Someone has written a perl version of "paste" already (google for "perl power tools").

    (update: in case you have trouble finding it, here's the source for a perl implementation of paste:

Re: parallel reading
by ashokpj (Hermit) on May 09, 2006 at 13:07 UTC

    Try this

    #!/usr/local/bin/perl

    open (INFILE1, "/home/ashokpj/merge1.txt") ||
        die ("Cannot open input file merge1\n");
    open (INFILE2, "/home/ashokpj/merge2.txt") ||
        die ("Cannot open input file merge2\n");
    open (INFILE3, "/home/ashokpj/merge3.txt") ||
        die ("Cannot open input file merge3\n");

    chomp($line1 = <INFILE1>);
    chomp($line2 = <INFILE2>);
    chomp($line3 = <INFILE3>);

    # Note: a file counts as finished once it yields an empty string,
    # so a blank line is treated like end of file.
    while ($line1 ne "" || $line2 ne "" || $line3 ne "" ) {
        print $line1.$line2.$line3."\n";
        if ($line1 ne "") { chomp($line1 = <INFILE1>); }
        if ($line2 ne "") { chomp($line2 = <INFILE2>); }
        if ($line3 ne "") { chomp($line3 = <INFILE3>); }
    }

    close(INFILE1);
    close(INFILE2);
    close(INFILE3);

Re: parallel reading
by McDarren (Abbot) on May 09, 2006 at 13:32 UTC
    If we can make the assumption that each file has the same number of lines, then the following should work:
    #!/usr/bin/perl -w
    use strict;

    my %files;
    my @infiles = qw(fileA fileB fileC);

    for (@infiles) {
        open IN, "<", $_ or die "Cannot open $_:$!\n";
        chomp(@{$files{$_}} = <IN>);
        close IN;
    }

    open OUT, ">", "fileD" or die "Cannot open fileD:$!\n";
    for my $line (0 .. $#{$files{fileA}}) {
        for my $file (@infiles) {
            print OUT $files{$file}[$line];
        }
        print OUT "\n";
    }
    close OUT;

    $ cat fileD
    111AAAaaa
    222BBBbbb
    333CCCccc

    Darren :)
Re: parallel reading
by wfsp (Abbot) on May 09, 2006 at 12:38 UTC
    Hi azaria!

    Please advise how I can do it shortly?
    The very short answer is: write some code. :-)

    I would guess you need to open 3 files for input and 1 for output. Assuming fairly small input files, read the input into arrays, loop over them and build your output. Save your output to a file.

    Try it and let us know how you get on.

Re: parallel reading
by smokemachine (Hermit) on May 10, 2006 at 02:51 UTC
    Could it be this?
    perl -e 'for(@ARGV){open FILE,$_;chomp($a[$.-1].=$_)while<FILE>;close FILE}$,=$/;open FILE,">out";print FILE@a' A B C
Re: parallel reading
by whyxys (Initiate) on May 11, 2006 at 01:52 UTC
    JAPH (just another perl approach), hehe:
    perl -e 'map{chomp;$a[$i<3?$i:($i=0)].=$_;$i++}<>;print"@a\n";' filea fileb filec
    This assumes that each file has the same number of lines (here 3, for simplicity).

Node Type: perlquestion [id://548184]
Approved by Corion
Front-paged by blazar